Institution

Peking University

A leading Chinese research university in Beijing, prominent in natural language processing, multimodal learning, and AI research.

Robotics · Peking University

DragMesh-2: Dexterous Articulated Manipulation Through Contact

DragMesh-2 opens doors and drawers with a 51-DoF hand and no actuator on the object joint, so motion comes only from contact. PICA training reaches a 0.89 success rate at nominal damping and 0.56 at 4x damping, with no tactile sensing.

World Models · HKUST

WorldCraft: Object Manipulation for Camera-Controlled Video World Models

WorldCraft lets you click an object and drag its path inside a camera-controlled video world model. It reaches a trajectory error of 38.90 px versus DragAnything's 39.86 px and keeps camera RPE at 0.131 by using world-space paths and a LoRA.

Video Generation · Peking University

LoomVideo: A 5B Unified Video Generator That Edits Without Concatenation

LoomVideo runs text-to-video, editing, and multi-image-to-video in one 5B model, matching 13B baselines on VBench (63.15 vs 63.01) and editing 5.41x faster by adding the source latent instead of concatenating it.

AI Agents · The Chinese University of Hong Kong

Orchestra-o1: Omnimodal Agent Orchestration

Orchestra-o1 orchestrates text, image, audio, and video sub-agents and hits 72.8% on OmniGAIA with a GPT-5 brain (+10.3 over Gemini-3-Pro). Its trained 8B orchestrator reaches 30.0%, best among open omnimodal agents.

AI Agents · Ant Group

SearchSwarm: Delegation Intelligence for Deep Research

SearchSwarm fine-tunes Tongyi DeepResearch-30B-A3B on harness-generated delegation trajectories, lifting BrowseComp from 43.4 to 68.1 and topping every 30B-A3B model on four deep-research benchmarks.

Theorem Proving · MiniMax AI

MaxProof: How MiniMax M3 Reaches Gold-Level Proof Scores

MaxProof turns MiniMax-M3 into a generator, verifier, fixer, and ranker; with population-level test-time scaling it reports 35/42 on IMO 2025 and 36/42 on USAMO 2026.

Long Context · MiniMax AI

MiniMax Sparse Attention: 1M Context Without Dense Attention

MiniMax Sparse Attention keeps only 2,048 selected KV tokens per query group and reports 28.4x lower attention FLOPs and a 14.2x prefill speedup at 1M context.

AI Agents · TokenRhythm Technologies

Claw-SWE-Bench: Why Coding Agent Harnesses Matter

Claw-SWE-Bench evaluates OpenClaw-style coding-agent harnesses on 350 GitHub issue tasks. OpenClaw jumps from 19.1% to 73.4% Pass@1 with a full adapter.

Multimodal Models · Peking University

Watch, Remember, Reason: A Human-View Map of Video MLLMs

A survey that reframes long-video MLLMs as three abilities (watch, remember, reason), comparing against 11 prior surveys and organizing 100+ methods plus 5 application domains.

LLM Reasoning · Samsung Research

TrOPD: Trust-Region On-Policy Distillation for Small LLMs

TrOPD restricts on-policy distillation to the tokens where the teacher is actually trustworthy, gaining 3.06 to 3.52 average points over standard OPD on math, code, and STEM benchmarks with 1.5B-1.7B students.

AI Agents · Peking University

Video2GUI: Mining 12M GUI Agent Trajectories From Internet Videos

Video2GUI turns 500M unlabeled tutorial videos into WildGUI, a set of 12M grounded GUI interaction trajectories across 1,500+ apps and sites. Pretraining Qwen2.5-VL and Mimo-VL on it lifts GUI benchmark scores by 5-20%.