NITP: Predict the Next Token's Meaning, Not Just Its ID
NITP adds a dense target to next-token prediction: alongside the next token's ID, the model forecasts that token's shallow-layer embedding. On a 9B MoE model, it lifts MMLU-Pro by 5.71 points for about 2% extra training FLOPs and no added inference cost.
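To make the idea concrete, here is a minimal NumPy sketch of what such a combined objective could look like. It is an illustration under assumptions, not the method's actual implementation: the cosine-distance auxiliary loss, the linear projection `proj`, the weight `aux_weight`, and the function name `combined_loss` are all hypothetical choices made for this example.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def combined_loss(logits, token_ids, final_hidden, shallow_hidden, proj,
                  aux_weight=0.1):
    """Hypothetical combined objective (all names are illustrative).

    logits:         (T, V) output logits at each position
    token_ids:      (T,)   input token IDs
    final_hidden:   (T, D) last-layer hidden states
    shallow_hidden: (T, D) hidden states from an early layer
    proj:           (D, D) assumed linear head mapping final -> shallow space
    """
    T = len(token_ids)

    # Sparse target: position t predicts the ID of token t+1.
    logp = log_softmax(logits[:-1])
    ce = -logp[np.arange(T - 1), token_ids[1:]].mean()

    # Dense target: the shallow-layer embedding of token t+1.
    # In a real framework this target would be wrapped in a stop-gradient.
    target = shallow_hidden[1:]
    pred = final_hidden[:-1] @ proj

    # Assumed distance: 1 - cosine similarity, averaged over positions.
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1)
    cos = (pred * target).sum(axis=-1) / (norms + 1e-8)
    aux = (1.0 - cos).mean()

    return ce + aux_weight * aux, ce, aux

# Toy usage with random tensors.
rng = np.random.default_rng(0)
T, V, D = 6, 11, 4
total, ce, aux = combined_loss(
    logits=rng.normal(size=(T, V)),
    token_ids=rng.integers(0, V, size=T),
    final_hidden=rng.normal(size=(T, D)),
    shallow_hidden=rng.normal(size=(T, D)),
    proj=rng.normal(size=(D, D)),
)
```

Because the auxiliary term only touches the training loss, the projection head in this sketch can be discarded after training, which is consistent with the claim of no inference cost.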