Google’s Gemini 2.5 Pro Tops Coding Charts and MENSA Tests in AI ‘IQ’ Battle

by shayaan

In short

  • Google’s new Gemini 2.5 Pro tops the WebDev Arena leaderboard, outperforming competitors such as Claude at coding, which makes it a compelling choice for developers looking for strong coding assistance.
  • The AI model also has a 1 million token context window (expandable to 2 million), so it can process large codebases and complex projects that go far beyond the capacity of models such as ChatGPT and Claude 3.7 Sonnet.
  • It also achieved top scores on reasoning benchmarks, including a Mensa IQ test and Humanity’s Last Exam, demonstrating the advanced problem-solving skills that complex development tasks require.

Google’s recently launched Gemini 2.5 Pro has shot to the top of coding leaderboards, beating Claude in the popular WebDev Arena, an informal ranking related to LLM Arena but aimed specifically at measuring how well AI models code. The result comes amid Google’s push to position its flagship AI model as a leader in both coding and reasoning tasks.

Gemini 2.5 Pro, released earlier this year, ranks first in several categories, including coding, style control, and creative writing. The model’s enormous context window (one million tokens, expanding to two million soon) lets it process large codebases and complex projects that would choke even its nearest competitors. For context, powerful models such as ChatGPT and Claude 3.7 Sonnet can process a maximum of only 128k tokens.


Gemini also has the highest “IQ” of all AI models. TrackingAI has put it through formal Mensa tests, using verbalized questions from Mensa Norway to create a standardized way to compare AI models.

Gemini 2.5 Pro scored higher than its competitors on these tests, even on custom questions that are not publicly available and therefore could not have appeared in training data.

With an IQ score of 115 on offline tests, the new Gemini falls into the “bright” range, just above average human intelligence, which scores roughly 85 to 114 points. But the idea of an AI having an IQ needs unpacking. AI systems do not have intelligence quotients the way people do, so the benchmark is better understood as a metaphor for performance on reasoning tasks.

On benchmarks designed specifically for AI, Gemini 2.5 Pro scored 86.7% on the AIME 2025 math test and 84.0% on the GPQA science assessment. On Humanity’s Last Exam (HLE), a newer and harder benchmark created to avoid test saturation, Gemini 2.5 Pro scored 18.8%, beating OpenAI’s o3-mini (14%) and Claude 3.7 Sonnet (8.9%), a remarkable performance boost.

The new version of Gemini 2.5 Pro is now available for free (with rate limits) to all Gemini users. Google previously described this release as an “experimental version of 2.5 Pro,” part of its family of “thinking models” designed to reason through their responses rather than simply generate them.

Although it did not win every benchmark, Gemini has caught developers’ attention with its versatility. The model can build complex applications from just a few prompts, creating interactive web apps, endless runner games, and visual simulations without detailed instructions.


We tested the model by asking it to fix broken HTML5 code. It generated nearly 1,000 lines of code and delivered results that beat Claude 3.7 Sonnet, the previous leader, in both quality and understanding of the full set of instructions.

For working developers, Gemini 2.5 Pro input costs $2.50 per million tokens and output costs $15.00 per million tokens, positioning it as a cheaper alternative to some competitors while still offering impressive capabilities.
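To make those per-million-token rates concrete, here is a rough sketch of the arithmetic. It assumes the article’s quoted prices apply as flat rates (actual billing may vary by tier or prompt length), and the token counts in the usage line are purely hypothetical.

```python
# Rates quoted in this article, in USD per million tokens (assumed flat).
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: a 500,000-token codebase in, a 20,000-token answer out.
cost = estimate_cost(500_000, 20_000)
print(f"${cost:.2f}")  # prints $1.55
```

Because output is priced six times higher than input in this example, long generated answers dominate the bill faster than large prompts do.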

The AI model handles up to 30,000 lines of code in its Advanced plan, making it suitable for enterprise-scale projects. Its multimodal capabilities, working with text, code, audio, images, and video, offer a flexibility that other coding-focused models cannot match.
