Neural Scaling Laws, Simply Explained
AI, But Simple Issue #50

Hello from the AI, but simple team! If you enjoy our content, consider supporting us so we can keep doing what we do.
Our newsletter is no longer sustainable to run at no cost, so we’re relying on different measures to cover operational expenses. Thanks again for reading!
Neural scaling laws are mathematical relationships that describe how a neural network’s performance, typically measured by its test loss or error rate, changes as you increase the number of parameters (model size), the amount of training data, or the computational resources used for training.
These laws have become a blueprint for building modern deep learning models—primarily in the development of LLMs in natural language processing (NLP), but also in other areas like deep convolutional neural networks (CNNs) in computer vision.
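In their simplest form, these relationships are power laws: when a single resource is the bottleneck, the test loss falls off as a power of that resource. Writing N for model size, D for dataset size, and C for compute (each defined below), a minimal sketch of this functional form is:

```latex
% Power-law scaling of test loss when one resource is the limiting factor.
% N_c, D_c, C_c and the exponents \alpha_N, \alpha_D, \alpha_C are constants
% fitted empirically; Kaplan et al. (2020) reported exponents of roughly
% 0.05 to 0.1 for language models, but they vary by domain and setup.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```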

The neural scaling laws show that it is possible to approximate the loss of an ML system from two main variables (see the fitting sketch after this list):
The model size (N), measured in the number of parameters. Parameters are the weights and biases of a model, which are adjusted during training. The more parameters there are, the more the model can “learn” and “remember” from a dataset.
The dataset size (D), typically measured in tokens, pixels, or other fundamental units.
Additionally, the amount of computational resources, known as compute (C), is included in the analysis, as it is the resource that enables both more parameters and more data.
Compute is measured as the total number of floating-point operations (FLOPs) performed during training; hardware speed, by contrast, is quoted in FLOP/s, the number of floating-point operations a machine performs per second.
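For a concrete picture of what “approximating the loss” looks like, below is a minimal Python sketch that fits the single-variable power law L(N) = (N_c / N)^α to a handful of (model size, loss) points. The data, the fitted constants, and the 100B-parameter extrapolation are all made up for illustration; they are not measurements from a real training run.

```python
import numpy as np

# Hypothetical (model size, test loss) pairs -- illustrative only.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])    # parameters
loss = np.array([5.2, 4.1, 3.3, 2.6, 2.1])  # test loss (e.g., cross-entropy)

# A power law L(N) = (N_c / N)^alpha is a straight line in log-log space:
# log L = alpha * log N_c - alpha * log N, so fit a line to the logs.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)

alpha = -slope                   # scaling exponent
N_c = np.exp(intercept / alpha)  # fitted "critical size" constant

print(f"alpha ~ {alpha:.3f}, N_c ~ {N_c:.2e}")

# Extrapolate the fit to a hypothetical 100B-parameter model.
predicted_loss = (N_c / 1e11) ** alpha
print(f"predicted loss at 1e11 params ~ {predicted_loss:.2f}")
```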
In practice, compute is typically the fixed quantity: an organization has a given budget, and scaling laws can determine the ideal model size and dataset size for that budget.
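As a rough sketch of that budgeting exercise, the code below combines two common approximations from the scaling-law literature: total training compute C ≈ 6·N·D FLOPs for a dense transformer, and a Chinchilla-style heuristic of roughly 20 training tokens per parameter at the compute-optimal point. The budget value, the helper name, and the resulting numbers are illustrative assumptions, not a definitive recipe.

```python
import math

def compute_optimal_split(budget_flops: float, tokens_per_param: float = 20.0):
    """Estimate a compute-optimal (parameters, tokens) split for a FLOP budget.

    Assumes C ~ 6 * N * D (dense transformer training) and the heuristic
    D ~ tokens_per_param * N. Solving 6 * N * (tokens_per_param * N) = C
    for N gives the closed-form estimate below.
    """
    n_params = math.sqrt(budget_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example with a made-up budget of 1e23 FLOPs.
params, tokens = compute_optimal_split(1e23)
print(f"~{params:.2e} parameters trained on ~{tokens:.2e} tokens")
```

Under these assumptions, a budget of 1e23 FLOPs points to a model of roughly 30 billion parameters trained on just under 600 billion tokens; real compute-optimal frontiers are fit empirically and shift with architecture, data quality, and training setup.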