Small Language Models

Compact, on-device, and edge-deployable models — strong capability per parameter for local and low-cost inference.

DistilBERT: A Smaller and Faster Version of BERT

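DistilBERT applies knowledge distillation during pre-training to produce a model that is 40% smaller and 60% faster than BERT while retaining 97% of its language-understanding capability. The write-up covers the supporting evidence, method tradeoffs, and limits for practical use.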

Small Language Models · Google Research

MobileBERT: Compact BERT for Resource-Limited Devices

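MobileBERT compresses BERT for mobile devices with bottleneck structures and knowledge transfer from a specially designed teacher, giving a task-agnostic model that is 4.3x smaller and 5.5x faster than BERT-base. The write-up covers the supporting evidence, method tradeoffs, and limits for practical use.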

Small Language Models · Independent Researcher

TinyLlama: An Open Small Language Model Recipe

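TinyLlama is an open 1.1B-parameter model that reuses the Llama 2 architecture and tokenizer and is pretrained on roughly 1 trillion tokens for about three epochs. The write-up covers the supporting evidence, method tradeoffs, and limits for practical use.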

Small Language Models · Meta AI

MobileLLM: Better Sub-Billion Models for Devices

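MobileLLM argues that architecture matters especially at sub-billion scale: deep-and-thin designs with embedding sharing and grouped-query attention beat prior state-of-the-art 125M/350M models by 2.7%/4.3% accuracy, and block-wise weight sharing (MobileLLM-LS) adds a further 0.7%/0.8%.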

Small Language Models · Hugging Face

SmolLM2: A Fully Open 1.7B Model Built on a Public Data Recipe

SmolLM2 is a 1.7B model overtrained on ~11T tokens through four data stages. It scores 68.7 on HellaSwag and 19.4 on MMLU-Pro, beating Llama3.2-1B — and ships every dataset, not just the weights.

Efficient AI · Microsoft Research

Phi-3-mini: A 3.8B Model That Rivals GPT-3.5 on Your Phone

Phi-3-mini is a 3.8B-parameter model trained on 3.3T heavily filtered and synthetic tokens that hits 69% on MMLU and 8.38 on MT-bench — matching Mixtral 8x7B and GPT-3.5 while small enough to run on a phone.
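
As a rough illustration of what "overtrained" and "small enough to run on a phone" mean in the SmolLM2 and Phi-3-mini summaries above, the sketch below computes training tokens per parameter and the memory needed for the weights alone. The parameter and token counts are taken from the summaries; the bit widths (16, 8, 4) and the helper names are assumptions chosen for illustration, and the memory figure ignores activations and the KV cache.

```python
# Back-of-envelope figures for the two largest models listed above.
# Parameter and token counts come from the summaries; the bit widths
# are illustrative assumptions, not settings reported by the authors.

MODELS = {
    "SmolLM2-1.7B": {"params": 1.7e9, "tokens": 11e12},
    "Phi-3-mini": {"params": 3.8e9, "tokens": 3.3e12},
}


def tokens_per_param(params, tokens):
    """Training tokens seen per model parameter."""
    return tokens / params


def weight_gib(params, bits):
    """Memory for the weights only, in GiB, at a given bit width."""
    return params * bits / 8 / 2**30


for name, m in MODELS.items():
    ratio = tokens_per_param(m["params"], m["tokens"])
    sizes = ", ".join(
        f"{bits}-bit: {weight_gib(m['params'], bits):.2f} GiB"
        for bits in (16, 8, 4)
    )
    print(f"{name}: {ratio:,.0f} tokens/param; weights {sizes}")
```

Both ratios come out far above the roughly 20 tokens per parameter that is often quoted as a compute-optimal rule of thumb (an outside reference point, not a figure from the summaries), which is the sense in which these models are trained well past that budget.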