DeepSeek-R1: Teaching a Model to Reason With Almost No Human Labels
Reinforcement learning alone, with no supervised reasoning traces, can teach a base language model strong step-by-step reasoning that rivals top closed models.
Institution
A Chinese AI lab known for strong open-weight language and reasoning models.