Training Large Models
How a general-purpose model is actually built: pre-training at scale, the scaling laws that predict what bigger gets you, and the fine-tuning and RL steps that turn raw next-token prediction into a useful assistant.
Pre-training
Learning the world from a next-token-prediction objective at scale.
Scaling Laws
The clean power laws that predict what bigger models, more data, and more compute will buy you.
Training Data
Where the trillions of tokens come from, and why curation matters as much as quantity.
Fine-Tuning
Specializing a pre-trained model on a downstream task or domain.
PEFT & LoRA
Tuning a fraction of the parameters and getting most of the gain.
RLHF
Turning human preferences into a reward signal, and then into a better model.
DPO & Preference Optimization
Skipping the reward model and optimizing on preferences directly.
RLVR & Verifiable Rewards
When you can grade the answer, reinforcement learning gets a lot simpler — and more powerful.
Constitutional AI
Replacing much of the human feedback with a model critiquing itself against a written constitution.