OpenAI open-sourced two mega models under Apache 2.0, shattering license limits
The 120B-parameter model runs on a $17K GPU; the 20B-parameter version works on high-end gaming cards
Performance rivals o4-mini and o3 on math, coding, and medical benchmarks
On Tuesday, OpenAI released two open-weight AI models that deliver performance comparable to its commercial offerings while running on consumer hardware: gpt-oss-120b needs a single 80 GB GPU, and gpt-oss-20b works on devices with just 16 GB of memory.
The models, available under the Apache 2.0 license, reach near parity with OpenAI's o4-mini on reasoning benchmarks. The 120-billion-parameter version activates only 5.1 billion parameters per token through its mixture-of-experts architecture, while the 20-billion-parameter model activates 3.6 billion. Both handle context lengths of up to 128,000 tokens, the same as GPT-4o.
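To illustrate why the mixture-of-experts design matters, here is a back-of-the-envelope calculation of how small a fraction of each model's weights is actually used per token (figures are the ones quoted above; this is arithmetic, not OpenAI's code):

```python
# Rough illustration: in a mixture-of-experts model, only a subset of
# parameters is active for each token, so per-token compute is a small
# fraction of the full parameter count. Numbers are from the article.
total_120b, active_120b = 120e9, 5.1e9   # gpt-oss-120b
total_20b, active_20b = 20e9, 3.6e9      # gpt-oss-20b

frac_120b = active_120b / total_120b
frac_20b = active_20b / total_20b

print(f"gpt-oss-120b: {frac_120b:.1%} of parameters active per token")
print(f"gpt-oss-20b:  {frac_20b:.1%} of parameters active per token")
```

So the larger model uses roughly 4% of its weights per token, which is a big part of how it fits useful inference onto a single GPU.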
The fact that they are released under that specific license is a big deal. It means anyone can use, modify, and profit from those models without restriction. That includes everyone from you to OpenAI's competitors, such as Chinese startup DeepSeek.
The release comes amid mounting speculation over GPT-5's imminent arrival and intensifying competition in the open-source AI space. The gpt-oss models are OpenAI's first open-weight models since GPT-2 in 2019.
There is no firm release date for GPT-5, but Sam Altman has hinted it could happen sooner rather than later. "We have many new things for you in the coming days," he tweeted earlier, promising "a big upgrade later this week."
The open-source models that dropped today are very powerful. "These models outperform similarly sized open models on reasoning," OpenAI said in its announcement. The company trained them using reinforcement learning and techniques drawn from o3 and its other frontier systems.
On Codeforces competition coding, gpt-oss-120b scored an Elo rating of 2622 with tools and 2463 without, trailing o4-mini's 2719 rating and approaching o3's 2706. The model reached 96.6% accuracy on the AIME 2024 math competition and 57.6% on the HealthBench evaluation, beating o3's 50.1% score.
Image: OpenAI
The smaller gpt-oss-20b matches or surpasses o3-mini on these benchmarks despite its size. It scored 2516 Elo on Codeforces with tools, reached 95.2% on AIME 2024, and hit 42.5% on HealthBench, all while fitting within memory constraints that make it viable for edge deployment.
Both models support three reasoning effort levels (low, medium, and high) that trade latency for performance. Developers can adjust these settings with a single sentence in the system prompt. The models were post-trained using processes similar to o4-mini's, including supervised fine-tuning and what OpenAI described as a "high-compute RL stage."
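As a minimal sketch of that "single sentence in the system prompt" mechanism, assuming a standard chat-completions-style message payload (the model identifier and exact phrasing here are illustrative, not taken from OpenAI's docs):

```python
# Hypothetical helper: select a reasoning effort level with one sentence
# in the system prompt. The payload shape follows the common
# chat-completions convention; model name and phrasing are assumptions.
def build_request(question: str, effort: str = "medium") -> dict:
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-20b",  # placeholder model identifier
        "messages": [
            # A single system-prompt sentence sets the effort level.
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": question},
        ],
    }

req = build_request("What is 17 * 24?", effort="high")
print(req["messages"][0]["content"])  # Reasoning: high
```

Higher effort buys more chain-of-thought tokens (and latency) per answer; lower effort keeps responses fast and cheap.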
But don’t just think because someone can adjust those models as you please, you will have it easy. OpenAI filtered certain harmful data with regard to chemical, biological, radiological and nuclear threats during pre-training. The phase after the training used deliberative alignment and instruction hierarchy to teach refusal of unsafe prompts and defense against fast injections.
In other words, OpenAI claims to have designed its models to be so safe that they cannot generate harmful responses even after fine-tuning.
Eric Wallace, an OpenAI alignment researcher, revealed that the company performed unprecedented safety testing before release. "We fine-tuned the models to deliberately maximize their bio and cyber capabilities," Wallace posted on X. The team curated domain-specific data for biology and trained the models in coding environments to solve capture-the-flag challenges.
Today we are releasing gpt-oss-120b and gpt-oss-20b, two open-weight LLMs that deliver strong performance and agentic tool use.
Before release, we ran a first-of-its-kind safety analysis where we fine-tuned the models to deliberately maximize their bio and cyber capabilities 🧵 pic.twitter.com/err2mbcgxx
The adversarially fine-tuned versions underwent evaluation by three independent expert groups. "On our frontier risk evaluations, our maliciously fine-tuned gpt-oss underperforms OpenAI o3, a model below Preparedness High capability," Wallace said. The tests indicated that even with robust fine-tuning using OpenAI's training stack, the models could not reach dangerous capability levels under the company's Preparedness framework.
That said, the models retain unsupervised chain-of-thought reasoning, which OpenAI said is of the utmost importance for keeping an eye on AI misbehavior. "We did not put any direct supervision on the CoT for either gpt-oss model," the company said. "We believe this is critical for monitoring model misbehavior, deception, and misuse."
OpenAI hides the full chain of thought of its best models to prevent competitors from replicating its results, and to prevent another DeepSeek moment, which can now happen even more easily.
The models are available on Hugging Face. But as we said at the beginning, you need a colossus of a GPU with at least 80 GB of VRAM (such as the $17K Nvidia A100) to run the 120-billion-parameter version. The smaller 20-billion-parameter version requires a minimum of 16 GB of VRAM (such as the $3K Nvidia RTX 4090), which is not so crazy for consumer-grade hardware.
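A rough sanity check on why those memory figures work out, assuming the weights are stored at roughly 4.25 bits per parameter (OpenAI ships the models quantized; the exact bit width and the lack of activation/KV-cache overhead below are simplifying assumptions):

```python
# Back-of-the-envelope VRAM estimate for the raw weights, assuming
# ~4.25 bits per weight (MXFP4-style quantization). Real deployments
# also need headroom for activations and the KV cache.
def approx_vram_gb(params: float, bits_per_weight: float = 4.25) -> float:
    return params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(f"gpt-oss-120b weights: ~{approx_vram_gb(120e9):.0f} GB (fits in 80 GB)")
print(f"gpt-oss-20b weights:  ~{approx_vram_gb(20e9):.0f} GB (fits in 16 GB)")
```

At full 16-bit precision the 120B model would need around 240 GB for weights alone, which is why aggressive quantization is what makes the single-GPU claim plausible.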