Model Based Problem Solving

23h

Google upgrades Gemini Deep Research’s search and problem-solving capabilities

Google evaluated Gemini Deep Research’s capabilities using two benchmarks called HLE and DeepSearchQA. According to the company, it achieved record performance on both tests.

18h

Nous Research just released Nomos 1, an open-source AI that ranks second on the notoriously brutal Putnam math exam

Nous Research's open-source Nomos 1 AI model scored 87/120 on the notoriously difficult Putnam math competition, ranking ...

The Next Web

Is ChatGPT’s New Shopping Research Solving a Problem, or Creating One?

When OpenAI announced its new shopping search capabilities, I took the news with a grain of salt (perhaps the whole shaker). For the past decade, we have watched the slow evolution of traditional ...

7d on MSN · Opinion

Opinion: Solving homelessness requires housing — and trust in local communities

Homelessness is not a market-incentive problem. It is a challenge produced by intersecting failures in housing, affordability ...

Elon Musk confirms Grok 4.20 AI model coming in weeks: Expected to beat GPT-5.1, Gemini 3 Pro

Elon Musk has confirmed that the next major AI model from xAI, Grok 4.20, will launch within the next 3–4 weeks. The model ...

Analytics India Magazine

OpenAI Launches GPT-5.2, Calls It Most Capable Model for Professional Work

The AI giant has released GPT-5.2, a new frontier model for professional knowledge work, long-running agents, and complex ...

Why You’re Better Than a Computer at Solving Connections

An engineer for New York Times Games has been trying to teach artificial intelligence to understand wordplay more like a human.

CIO · Opinion

The truth problem: Why verifiable AI is the next strategic mandate

As AI takes center stage, the real win is making sure we can actually trust its decisions — and that’s why verifiable AI is ...

Devdiscourse

Generative AI proven to strengthen student reasoning skills, especially problem-solving

According to the study, generative AI produces a moderately strong positive effect on higher-order thinking overall. Across ...

15d

Alibaba's AgentEvolver lifts model performance in tool use by ~30% using synthetic, auto-generated tasks

The new framework from Tongyi Lab enables agents to create their own training data by exploring and interacting with new software environments.

10d

‘Why not?’ At re:Invent, AWS answers with big step into frontier AI model reasoning and agentic services

Amazon Web Services Inc. envisions a world in which billions of AI agents will be working together. That will take a significant advance in frontier model reasoning, and the company made several major ...

10d

New Deepseek 3.2 AI Open Model Outthinks ChatGPT 5 in Tough Reasoning Tests

Deepseek version 3.2 packs 671B parameters with 37B active at inference, giving you faster tool use and lower run costs on ...
