Language Models · Transformers
T5: Turning Every NLP Task Into Text-to-Text
T5 unified NLP transfer learning by casting every task as text input to text output, then systematically studying objectives, data, scale, and fine-tuning choices.
What problem it solves
Transfer learning in NLP had become powerful but fragmented. Different tasks used different output formats, objectives, datasets, and fine-tuning conventions. T5 asks whether the field can be simplified by treating every text task the same way: feed text in, produce text out.
The core method
T5 uses a unified text-to-text framework. Translation, summarization, classification, question answering, and other tasks are all represented as text strings: the input carries a short task prefix, and the target is plain text even when the answer is a class label. The paper then systematically compares pretraining objectives, architectures, unlabeled datasets, transfer approaches, and scale. It also introduces the Colossal Clean Crawled Corpus (C4), a cleaned web dataset for pretraining.
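To make the framing concrete, here is a minimal sketch of how three different tasks collapse into the same (input text, target text) shape. The prefixes follow the style of the examples in the T5 paper, but the `Example` class, the helper names, and the exact label strings are assumptions made for illustration, not the released preprocessing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source: str  # text fed to the encoder
    target: str  # text the decoder is trained to produce


def translate_en_de(english: str, german: str) -> Example:
    return Example(f"translate English to German: {english}", german)


def summarize(document: str, summary: str) -> Example:
    return Example(f"summarize: {document}", summary)


def cola(sentence: str, acceptable: bool) -> Example:
    # Classification: the label is emitted as words, not as a class index.
    label = "acceptable" if acceptable else "not acceptable"
    return Example(f"cola sentence: {sentence}", label)


# Hypothetical inputs; every task yields the same two string fields.
examples = [
    translate_en_de("That is good.", "Das ist gut."),
    summarize("A long news article about a storm ...", "A storm hit the coast."),
    cola("The course is jumping well.", acceptable=False),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Because every example has the same two string fields, one model, one loss, and one decoding procedure can in principle serve all three tasks.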
Key results
By combining the text-to-text framework, the best-performing choices from its comparison study, scale (up to 11 billion parameters), and C4, T5 achieved state-of-the-art results at the time of publication on many benchmarks covering summarization, question answering, classification, and language understanding. The paper also released its data, models, and code, making it a practical reference for transfer learning.
Why it matters
T5 made NLP workflows cleaner. Instead of designing separate heads and formats for every task, builders could express tasks as text transformations. That influenced later instruction tuning and sequence-to-sequence systems, where task formatting and prompt wording became part of model design.
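With no task-specific head, the only per-task code left on the output side is a mapping from generated strings back to labels. The sketch below assumes a three-way label set and hypothetical helper names. It counts any output that matches no label as wrong, which is how the paper describes handling such outputs.

```python
from typing import Optional, Sequence

# Illustrative label set for a three-way inference task.
LABELS = ("entailment", "neutral", "contradiction")


def text_to_label(generated: str, labels: Sequence[str] = LABELS) -> Optional[int]:
    """Map generated text to a class index, or None if it matches no label."""
    cleaned = generated.strip().lower()
    return labels.index(cleaned) if cleaned in labels else None


def accuracy(predictions: Sequence[str], gold: Sequence[int]) -> float:
    # A non-matching output maps to None, which never equals a gold index,
    # so it is counted as an error rather than raising.
    hits = sum(text_to_label(p) == g for p, g in zip(predictions, gold))
    return hits / len(gold)


# Hypothetical model outputs: the third one is not a valid label.
outputs = ["entailment", "neutral", "hamburger", " Contradiction "]
print(accuracy(outputs, [0, 1, 2, 2]))
```

Swapping in another classification task means changing `LABELS`, not the model.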
Limits and open questions
Text-to-text unification is elegant, but it can hide task-specific structure that might help. C4 is cleaner than raw web data, but still reflects web biases and filtering choices. T5’s lasting value is not that one format is always optimal, but that a unified interface makes comparison and scaling much easier.
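To see why filtering choices matter, consider a toy page filter loosely modeled on the kind of heuristics described for C4: keep lines that end in terminal punctuation, drop very short lines, and discard pages that are too short or contain placeholder text or curly braces. The function name, the thresholds, and the use of line count as a stand-in for sentence count are assumptions for this sketch, not the actual C4 pipeline.

```python
from typing import Optional

TERMINAL = (".", "!", "?", '"')


def clean_page(text: str, min_words: int = 5, min_lines: int = 3) -> Optional[str]:
    """Return the filtered page text, or None if the whole page is rejected."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(TERMINAL):
            continue  # menus, buttons, and headings rarely end in punctuation
        if len(line.split()) < min_words:
            continue  # very short lines are usually not running prose
        kept.append(line)
    if len(kept) < min_lines:
        return None  # too little prose left on the page
    joined = "\n".join(kept)
    if "lorem ipsum" in joined.lower() or "{" in joined:
        return None  # placeholder text, or a page that probably contains code
    return joined


page = (
    "Home\n"
    "T5 casts every task as text to text.\n"
    "The model reads text and writes text.\n"
    "Click here\n"
    "That makes comparisons across tasks easier."
)
print(clean_page(page))
```

Each rule removes noise but also some legitimate text. The curly-brace rule, for instance, rejects any page that quotes source code, so the resulting corpus underrepresents that kind of writing.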
One line: T5 made NLP tasks speak one interface.