Institution
CASIA
The Institute of Automation, Chinese Academy of Sciences, with research groups in computer vision, multimodal learning, and robotics.
AI Agents · The Chinese University of Hong Kong
Orchestra-o1 orchestrates text, image, audio, and video sub-agents and hits 72.8% on OmniGAIA with a GPT-5 brain (+10.3 points over Gemini-3-Pro). Its trained 8B orchestrator reaches 30.0%, the best among open omnimodal agents.
AI Agents · CASIA
A CASIA survey maps agentic environments for LLM agents along eight attribute axes and eight domains, unifying synthesis, evaluation, and co-evolution. Sharpest finding: existing environments barely fit multi-agent settings.
Multimodal Models · Nanjing University
HYDRA-X unifies image and video tokenization in one ViT; tubelet attention and hierarchical temporal patchify improve DAVIS rFVD to 11.19 and editing overall to 4.34.