The Best NeurIPS Papers of 2025, Simply Explained

AI, But Simple Issue #89


Hello from the AI, but simple team! If you enjoy our content (with 10+ custom visuals), consider supporting us so we can keep doing what we do.

Our newsletter isn’t free to run, so we rely on reader support to cover operational expenses. Thanks again for reading!


2025 was a defining, eventful year for deep learning (DL), machine learning (ML), and AI, marked by large-scale LLM research, rigorous empirical validation, and new hybrid models. Heading into 2026, there are no signs of research progress slowing down.

As we move forward in 2026, let’s take a look at some award-winning, highly cited, and highly influential NeurIPS papers from 2025, diving into the latest research to uncover insights for the year ahead.

Reinforcement Learning Does Not Increase LLM Reasoning Capacity

Paper: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yue, Chen, et al., Tsinghua University - NeurIPS 2025 - Best Paper Runner-up

Reinforcement Learning with Verifiable Rewards (RLVR) is one of the most popular and “effective” LLM fine-tuning techniques, used to train reasoning models like OpenAI o1 and DeepSeek-R1.
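To make “verifiable rewards” concrete, here is a minimal sketch of the kind of reward function RLVR relies on for math problems: a programmatic check against a known answer rather than a learned reward model. The `\boxed{}` parsing convention and the function name are illustrative assumptions, not the paper’s exact setup.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted final answer matches the known
    ground truth, 0.0 otherwise (a programmatic check, no reward model)."""
    # Hypothetical parsing convention: look for a LaTeX \boxed{...} answer,
    # falling back to the last non-empty line of the output.
    match = re.search(r"\\boxed\{(.+?)\}", model_output)
    if match:
        answer = match.group(1)
    else:
        lines = [ln for ln in model_output.strip().splitlines() if ln.strip()]
        answer = lines[-1] if lines else ""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0
```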

At its core, Yue et al.’s paper is an empirical analysis of RLVR and whether it actually increases a model’s underlying capability.

The authors investigate whether RLVR actually teaches models new reasoning skills or whether it just makes them more likely to output correct answers they could already produce. To test this, the researchers examined the “reasoning boundary” of LLMs.

Usually, models are evaluated with “pass@1,” which measures whether the model gets the answer right on the first try. This paper instead uses “pass@k” with very large values of k, letting the model attempt each problem 256 or more times and counting it as solved if any attempt succeeds.
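In practice, pass@k is usually computed with the standard unbiased estimator from the HumanEval paper rather than by literally drawing k samples: generate n ≥ k samples per problem, count the c correct ones, and estimate the chance that a random size-k subset contains at least one correct sample. A minimal sketch, assuming this standard convention is the one used here:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples drawn per problem, c = correct samples."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# A problem the model solves only 3 times out of 1024 samples still
# scores highly at large k: pass_at_k(1024, 3, 256) ≈ 0.58.
```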

If RLVR teaches new reasoning skills (e.g., how to apply an unfamiliar math theorem), the RL-trained model should be able to solve problems that the base model never could, given a sufficiently large number of attempts.

The authors took a base LLM and an RLVR-trained version of it, then sampled thousands of outputs on challenging math and coding problems to measure the total set of problems each model could solve.
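Conceptually, the comparison boils down to set coverage: a problem counts as “solvable” for a model if any of its sampled outputs is correct, and the question is whether the RL model’s solvable set extends beyond the base model’s. A minimal sketch of this bookkeeping, with illustrative data and variable names that are not from the paper:

```python
# Per-problem correctness flags across many samples (illustrative data).
base_results = {"p1": [False, True], "p2": [False, False], "p3": [True, False]}
rl_results   = {"p1": [True, True],  "p2": [False, False], "p3": [False, False]}

def solvable_set(results: dict[str, list[bool]]) -> set[str]:
    """Problems for which at least one sampled solution was correct."""
    return {pid for pid, samples in results.items() if any(samples)}

base_set, rl_set = solvable_set(base_results), solvable_set(rl_results)
print(rl_set - base_set)   # problems only the RL model solves (empty here)
print(base_set - rl_set)   # problems only the base model solves ({"p3"})
```

Per the paper’s headline claim, at large k the RL-trained model’s solvable set tends not to extend beyond the base model’s, which is exactly what the first set difference above would reveal.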
