Institution

CASIA

The Institute of Automation, Chinese Academy of Sciences, with research groups in computer vision, multimodal learning, and robotics.

AI Agents · The Chinese University of Hong Kong

Orchestra-o1: Omnimodal Agent Orchestration

Orchestra-o1 orchestrates text, image, audio, and video sub-agents and reaches 72.8% on OmniGAIA with GPT-5 as its orchestrator (+10.3 points over Gemini-3-Pro). Its trained 8B orchestrator reaches 30.0%, the best among open omnimodal agents.

AI Agents · CASIA

Agentic Environment Engineering for LLMs: A Survey of the Field

A CASIA survey maps agentic environments for LLM agents along eight attribute axes and eight domains, unifying synthesis, evaluation, and co-evolution. Its sharpest finding: existing environments are poorly suited to multi-agent settings.

Multimodal Models · Nanjing University

HYDRA-X: One Visual Tokenizer for Images and Video

HYDRA-X unifies image and video tokenization in a single ViT. Tubelet attention and hierarchical temporal patchify improve DAVIS rFVD to 11.19 and the overall editing score to 4.34.