Agent Memory

How AI agents store, retrieve, and update long-term memory across tasks and sessions — beyond the context window.

From Chatbot to Digital Colleague: A Survey of Persistent Autonomous AI

A Tencent YouTu Lab survey maps the chatbot-to-agent shift on two axes: cognitive core (Chatbot then Thinking LLM) and task execution (Agent then Workspace plus Skill), arguing persistent state is the real leap.

AI Agents · City University of Hong Kong

RHO: Retrospective Harness Optimization via Self-Preference

RHO tunes an LLM agent harness from past unlabeled trajectories using self-consistency and pairwise self-preference, lifting SWE-Bench Pro from 59% to 78% in one round with no external grading.

Agent Memory · National University of Singapore

MRAgent: Graph Memory That Reconstructs Instead of Retrieves

MRAgent gives LLM agents a Cue-Tag-Content memory graph and lets the model reason while it traverses it, lifting LoCoMo LLM-Judge from 68.3 to 84.2 while cutting tokens to 118k per sample.

Agent Memory · National University of Singapore

EvoArena: Why Agent Memory Must Track Environment Changes

EvoArena turns static agent tasks into evolving chains and finds current agents average only 39.6% accuracy; EvoMem adds patch memory and improves chain-level accuracy by 3.7 points.

World Models · JD.com (Joy Future Academy)

Echo-Memory: Which Memory Lets a World Model Remember a Room?

When a camera revisits an old spot, block-wise state-space recurrence scored 69.0 on open-domain VLM consistency vs 12.25 for the no-memory baseline; aggressive compression and spatial summaries mostly collapsed.

Multimodal Models · Peking University

Watch, Remember, Reason: A Human-View Map of Video MLLMs

This survey reframes long-video MLLMs around three abilities (watch, remember, reason), compares itself against 11 prior surveys, and organizes 100+ methods and 5 application domains.

AI Agents · Shanghai Jiao Tong University

LatentSkill: Bake Agent Skills Into LoRA Weights, Not the Prompt

A hypernetwork compiles a textual skill into a LoRA adapter in one forward pass. On ALFWorld, LatentSkill lifts success on the seen split by 21.4 points while using 64.1% fewer prefill tokens.

AI Agents · Lehigh University

OpenSkill: Self-Evolving LLM Agents With No Task Supervision

OpenSkill lets agents build skills and their own verifiers from the open web, hitting 43.6% on SkillsBench (+8.9 over the best baseline) with zero target-task answers.

AI Agents · Ant Group

SkillAdaptor: How LLM Agents Rewrite Their Own Skills

SkillAdaptor edits an agent's skill library from failed trajectories without touching model weights, lifting the WebShop score by 2.3 and the PinchBench score by 1.5 over the frozen backbone.

Agent Memory · ByteDance

TaskMem: Teaching a Video Agent What Is Worth Remembering

TaskMem trains a multimodal agent to write its own memory with RL, lifting streaming-video QA accuracy to 67.9% on VideoMME and 45.4% on EgoLife, gains of 6.3 and 7.0 points over the Qwen3-VL-30B baseline.

Agent Memory · UC Berkeley

MemGPT: Treating the LLM Context Window Like an Operating System

MemGPT borrows OS virtual memory — it lets the LLM page data in and out of its own context with function calls, lifting deep memory retrieval to 93.4% with GPT-4 vs 35.3% for recursive summarization.

Long Context · Shanghai AI Laboratory

δ-mem: An 8×8 Online Memory That Boosts Frozen LLMs

δ-mem bolts a tiny 8×8 delta-rule memory onto a frozen LLM and lifts average long-memory scores 1.10× over the backbone and 1.15× over other memory methods — no fine-tuning, no context extension.

AI Agents · MemTensor

MemPrivacy: Private Edge-Cloud Agent Memory via Reversible Placeholders

MemPrivacy swaps sensitive spans for type-aware placeholders on-device, processes memory in the cloud over them, then restores them locally — utility loss stays within 1.6% and 0.6B-4B models beat GPT-5.2 at detection.
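
The mask, process, restore loop described for MemPrivacy can be illustrated with a minimal sketch. This is not the paper's implementation: the regex detectors stand in for the on-device detection model, and the names `PATTERNS`, `mask`, and `restore` are hypothetical, chosen only for illustration.

```python
import re

# Assumption: simple regexes stand in for the on-device detection model.
# The type names (EMAIL, PHONE) and patterns are illustrative only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask(text):
    """Swap sensitive spans for type-aware placeholders.

    Returns the masked text plus the placeholder-to-value mapping,
    which is meant to stay on the device.
    """
    mapping = {}
    counters = {}
    for kind, pattern in PATTERNS.items():
        def substitute(match, kind=kind):
            value = match.group(0)
            # Reuse the placeholder if this exact value was seen before.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            counters[kind] = counters.get(kind, 0) + 1
            placeholder = f"<{kind}_{counters[kind]}>"
            mapping[placeholder] = value
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back, using the locally stored mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


note = "Email alice@example.com or call +1 415-555-0100; alice@example.com prefers email."
masked, mapping = mask(note)

# Only `masked` would leave the device; this string stands in for cloud-side processing.
cloud_result = f"Stored memory: {masked}"

print(masked)
print(restore(cloud_result, mapping))
```

Because the placeholders keep their type, a cloud-side memory system can still reason about "an email address" or "a phone number" without seeing the value, and repeated mentions of the same value map to the same placeholder.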