Gemini 3 obliterates GPT-5.1 on every benchmark

Nov 20, 2025
2 min read

Google rewrites AI race rules with multimodal dominance

Google's Gemini 3 scores 37.5% on HLE vs GPT-5.1's 26.5%, doubles screen understanding, hits 91% spatial reasoning. Anti-gravity IDE kills Cursor. New era begins.

Gemini 3 demolishes benchmarks with impossible gains

The benchmark massacre is comprehensive: Gemini 3 Pro scored 31.1% on Arc AGI 2 versus GPT-5.1's 17.6%, crushed VPCT spatial reasoning at 91% versus 66%, and doubled the previous best on Screen Spot Pro from Sonnet's 36.2% to 72.7%.

Matt Schumer declared this "massively accelerated my timeline to full computer-using agents" while noting "the last capability jump of this magnitude was GPT-4 in March 2023."

Gemini now ranks #1 across all Arena leaderboards text, vision, webdev, coding, math, creative writing, and occupational tasks. On academic reasoning, it hits 91.9% on GPQA Diamond versus GPT-5.1's 88.1%.

The Deep Think mode pushes Arc AGI scores to 45.1%, with François Chollet calling it "impressive progress." Artificial Analysis declared simply: "Gemini 3 Pro is the new leader in AI," placing it three points ahead of GPT-5.1 in aggregate scoring.

Anti-gravity IDE makes Cursor obsolete overnight

Google's Anti-gravity isn't just another IDE it's an autonomous coding partner that plans and executes complex tasks across editor, terminal, and browser simultaneously. When asked to convert SVG to PNG without proper tools, it rendered the image in Chrome and saved the pixels directly.

Max Weinbach declared it "outperforming Cursor and Windsurf" after just days of use, while early testers report agents validating their own code and building fully functional Game Boy emulators from text prompts.

The platform transforms developers into architects directing intelligent agents rather than writing code. Pietro Schirano demonstrated Gemini building a 3D Lego editor "nailing UI, complex spatial logic, and functionality" in one shot, plus recreating Ridiculous Fishing complete with sound effects.

Logan Kilpatrick explained agents "operate autonomously across editor, terminal, and browser," communicating via detailed artifacts while handling everything from feature building to bug fixing and report generation.

Google rewrites AI race rules with multimodal dominance

Sundar Pichai's confidence was justified Gemini ships to 650 million monthly users on day one, integrated into search, AI Studio, Vertex AI, and the new generative interfaces that adapt dynamically to user needs.

The model processes requests so fast that Dan Shipper noted "intelligence per second is off the charts," while maintaining quality that makes previous models feel "spiky and inconsistent."

Early testing reveals profound practical advantages: finding and synthesizing information in long documents that stumped other models, respecting user time without "flowery preambles," and finally producing creative writing that "doesn't sound like AI slop anymore."

Demis Hassabis recreated his 1990s game Theme Park "down to adjusting salt on chips" in hours, demonstrating the model's unprecedented understanding of complex requirements. Simon Smith's observation cuts through the noise: "So I guess we haven't hit a wall."