Institution

University of Illinois Urbana-Champaign

A leading public research university with deep strength in computer science, AI, and systems.

DRPO: Rethinking Divergence Regularization in LLM RL

DRPO swaps DPPO's hard divergence mask for a smooth advantage-weighted quadratic regularizer, keeping the Binary-TV trust region but with bounded gradient weights, and trains Qwen3 LLMs more stably under FP8.

AI Agents · University of Illinois Urbana-Champaign

Harness-1: Move Search-Agent Bookkeeping Out of the Policy

Harness-1 is a 20B RL search agent that hands working memory to the environment, hitting 0.730 average curated recall and beating the next-best open subagent by 11.4 points.

AI Agents · Shanghai Jiao Tong University

SWE-Explore: Can Coding Agents Find the Right Code?

SWE-Explore isolates the repo-exploration stage of coding agents over 848 issues. Agentic explorers crush BM25 (HitFile 0.65 vs 0.08), but line-level recall stalls at 0.15-0.20, and that gap is what limits repairs.

AI Agents · University of Illinois Urbana-Champaign

Code as Agent Harness: Reframing Code as the Runtime of AI Agents

This survey reframes code not as a thing agents generate but as the executable substrate they run on, mapping 40-plus systems across three layers (interface, mechanisms, multi-agent scaling) plus seven open problems.

Multimodal Models · University of Illinois Urbana-Champaign

Crafter: A Multi-Agent Harness for Editable Scientific Figures

Crafter wraps an image model in five cooperating agents and scores 50.34 on PaperBanana-Bench vs 11.13 for the raw backbone. CraftEditor then turns the raster output into an editable SVG you can actually fix.

Long Context · University of Illinois Urbana-Champaign

From Context to Skills: Ctx2Skill Self-Evolves Context Learning

Ctx2Skill is a self-play framework that discovers natural-language skills from a long context with no human labels or external rewards, lifting GPT-4.1 from 11.1% to 16.5% and GPT-5.1 from 21.2% to 25.8% on CL-bench.

AI Agents · University of Illinois Urbana-Champaign

Eywa: Letting LLM Agents Call Scientific Foundation Models

Eywa lets an LLM agent invoke domain models like Chronos and TabPFN through a learned interface instead of serializing data into text. On EywaBench it lifts utility from 0.6154 to 0.6558 while cutting token usage by about 30%.