Reasoning LLMs, Simply Explained
AI, But Simple Issue #55

Hello from the AI, but simple team! If you enjoy our content, consider supporting us so we can keep doing what we do.
Our newsletter is no longer sustainable to run at no cost, so we're relying on different measures to cover operational expenses. Thanks again for reading!
Reasoning large language models (LLMs) are transformer-based LLMs that appear to "think" by breaking a complex question down into smaller steps, producing intermediate reasoning before a final output.
They are a hot topic in recent research, and you can find them just about anywhere: DeepSeek-R1, OpenAI o1, and Google Gemini 2.0 Flash Thinking are all examples of LLMs with "reasoning" capabilities.

But what do we mean by reasoning? We define "reasoning" as the process of answering questions that require complex, multi-step generation with intermediate steps. These steps are often called reasoning steps or thought processes.
For instance, factual question answering, such as "Which ocean is the largest in the world?", does not involve reasoning.
However, a question like "If a car speeds by at 40 mph and travels 160 miles, how much time did it take?" requires some simple reasoning. The model would need to recognize the relationship between speed, distance, and time before arriving at the answer, and it can do this much better with explicit reasoning steps.
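For this toy question, the reasoning steps amount to recalling the distance-speed-time relationship and rearranging it. A minimal sketch of that chain of steps (variable names are illustrative, not from the article):

```python
# Values from the example question in the text.
speed_mph = 40        # car's speed
distance_miles = 160  # distance travelled

# Step 1: recall the relationship: distance = speed * time
# Step 2: rearrange for the unknown: time = distance / speed
time_hours = distance_miles / speed_mph

print(time_hours)  # -> 4.0
```

A reasoning model effectively writes out steps 1 and 2 in natural language before stating the final answer, rather than jumping straight to "4 hours".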

But when should we use a reasoning model? We use them for complex tasks such as solving puzzles, solving advanced math problems, and working out challenging coding problems.
Reasoning models are typically not necessary for simpler tasks such as summarization, translation, or knowledge-based tasks. In fact, using reasoning models for everything can be inefficient and expensive.
Reasoning models mark a fundamental change in how LLMs are trained and used. Instead of relying solely on scaling train-time compute (bigger models, more training data), reasoning models allocate more resources to scaling test-time compute, spending extra computation at inference time to generate and refine intermediate steps.
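One simple way to picture test-time compute scaling is self-consistency style voting: sample several independent reasoning chains for the same question and keep the most common final answer. The sketch below is a hedged illustration with made-up sampled answers, not a description of how any of the models named above actually work:

```python
from collections import Counter

# Pretend final answers from several independently sampled reasoning
# chains for the same question (values are made up for illustration).
sampled_answers = [4, 4, 3, 4, 5, 4, 4]

def majority_vote(answers):
    """Self-consistency: return the most common final answer.
    Spending more test-time compute (more samples) tends to make
    this vote more reliable."""
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(sampled_answers))  # -> 4
```

Here the cost of answering scales with the number of chains sampled at inference time, which is exactly the train-time vs test-time trade-off the paragraph above describes.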