Vision-Language-Action · Zhejiang University
LabVLA: A VLA Model for Scientific Lab Robots
LabVLA trains a Qwen3-VL-4B backbone plus a DiT action expert on laboratory workflows and reports 71.1% in-distribution (ID) and 70.0% out-of-distribution (OOD) success on LabUtopia.
Institution
A Chinese AI research institute working on large models, agents, and AI safety.
Multimodal Models · Shanghai AI Laboratory
OVO-S-Bench: Streaming Spatial Intelligence in MLLMs turns streaming spatial intelligence into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Shanghai AI Laboratory
ResearchClawBench: Testing Autonomous Research Agents turns end-to-end scientific research agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
LLM Reasoning · Shanghai AI Laboratory
ThoughtFold trims the redundant reasoning of DeepSeek-R1-Distill-Qwen-7B by about 56% of tokens while keeping accuracy on AIME, MATH-500, and GPQA-Diamond intact, using a masked preference objective.
Efficient AI · Shanghai AI Laboratory
Draft-OPD trains speculative draft models on the states their own drafting induces, not just on target transcripts. On Qwen3 thinking models it reaches 4.86x to 4.89x speedup, beating EAGLE-3 by 23 percent and DFlash by 13 percent.
LLM Reasoning · Shanghai AI Laboratory
SU-01, a 30B-A3B open model from Shanghai AI Lab, hits 35 points on IMO 2025 and clears gold lines at IPhO 2024/2025 using only ~338K short SFT trajectories plus a 200-step two-stage RL pipeline.
AI Agents · Shanghai AI Laboratory
AgentDoG 1.5 trains 0.8B-8B agent-safety guard models on only ~1k samples, hits 92.2% accuracy on R-Judge with the 4B variant, rivals GPT-5.4, and cuts agentic-RL deployment overhead by two orders of magnitude.
AI Agents · Shanghai AI Laboratory
Pi-Bench scores agents on proactivity, not just task completion, across 100 long-horizon tasks. The best model, GPT-5.4, hits only 67.0% proactivity, and removing prior sessions drops it 9.5 points.
Multimodal Models · Shanghai AI Laboratory
CiteVQA makes document QA models return bounding-box citations with every answer. The top model scores 76.0 Strict Attributed Accuracy; the best open model just 22.5 — most answer right but cite the wrong region.
AI Agents · Shanghai AI Laboratory
COLLEAGUE.SKILL distills one person's work traces into a versioned skill package with two tracks — capability and bounded behavior — that any agent can install, correct, and roll back. The open repo reports ~18.5k stars.
Speech Recognition · Shanghai AI Laboratory
Mega-ASR fights ASR's noise-robustness gap by synthesizing 2.4M clips across 54 compound acoustic scenarios, then training Qwen3-ASR-1.7B in two stages — cutting WER to 45.69% vs 54.01% on VOiCES R4-B-F.
Long Context · Shanghai AI Laboratory
δ-mem bolts a tiny 8×8 delta-rule memory onto a frozen LLM and lifts average long-memory scores 1.10× over the backbone and 1.15× over other memory methods — no fine-tuning, no context extension.
Vision-Language-Action · Shanghai AI Laboratory
PhysBrain 1.0 compiles human egocentric video into physics QA to pretrain a VLM, then adapts it to robot control — lifting Franka grasping from 47.1% to 63.3% over 50 trials versus a pi0.5 baseline.