Long Context · University of Maryland
End-to-End Context Compression at Scale with LCLMs
LCLMs jointly train a 0.6B encoder and a 4B decoder to compress long context into soft tokens at ratios of 1:4, 1:8, and 1:16, cutting prefill memory and time-to-first-token while staying close to the uncompressed baseline.
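
As a rough illustration of what the 1:4, 1:8 and 1:16 ratios mean for the length of the decoder's input, the sketch below assumes that a 1:k ratio maps every k context tokens to one soft token, rounding up for a partial final group. The function names, the rounding rule, and the 32,768-token example length are hypothetical and do not come from the source.

```python
RATIOS = (4, 8, 16)  # the k in the 1:k ratios named in the summary


def soft_token_count(context_tokens: int, k: int) -> int:
    """Soft tokens for a context at ratio 1:k.

    Assumption: every k input tokens become one soft token, and a
    partial final group still costs one soft token (ceiling division).
    """
    if context_tokens < 0 or k < 1:
        raise ValueError("context_tokens must be >= 0 and k must be >= 1")
    return (context_tokens + k - 1) // k


def compression_table(context_tokens: int) -> dict[str, int]:
    """Soft-token counts for each ratio, keyed by a '1:k' label."""
    return {f"1:{k}": soft_token_count(context_tokens, k) for k in RATIOS}


if __name__ == "__main__":
    # Hypothetical context length, chosen only for illustration.
    for label, count in compression_table(32768).items():
        print(f"{label} -> {count} soft tokens")
```

Under this assumption the decoder's prefill length shrinks in proportion to k, which is the mechanism behind the memory and time-to-first-token savings the summary describes. The actual savings depend on details the summary does not give.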