Agents' Last Exam: Why AI Agents Still Fail at Work
Agents' Last Exam tests AI agents on 1,490 expert-built professional tasks across 55 digital industries; the hardest tier averages only 2.6% full pass.
Institution
University of California, Berkeley — academic source of influential work including denoising diffusion models.
MemGPT borrows the idea of virtual memory from operating systems: the LLM pages data in and out of its own context through function calls, lifting deep memory retrieval to 93.4% with GPT-4, versus 35.3% for recursive summarization.
Diffusion Models · UC Berkeley
Denoising Diffusion Probabilistic Models trains a network to undo gradual Gaussian noise step by step, hitting FID 3.17 on CIFAR-10 — and laying the groundwork that Stable Diffusion and DALL-E 2 later built on.
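To make the "gradual Gaussian noise" in the summary above concrete, here is a minimal sketch of the forward (noising) process only, in plain Python. The linear schedule from 1e-4 to 0.02 over 1,000 steps is an assumption taken from the commonly cited DDPM setup, not from this page, and the function names (`linear_betas`, `cumulative_alphas`, `add_noise`, `recover_x0`) are hypothetical. No network is trained here; the sketch only shows what the network is asked to undo.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances on a linear schedule (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t): how much signal survives up to step t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps, t, alpha_bars):
    """Invert add_noise given the noise; a trained model would predict eps instead."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise * e) / signal for x, e in zip(xt, eps)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_betas())
    x0 = [0.5, -1.0, 0.25]  # stand-in for a few pixel values
    xt, eps = add_noise(x0, 500, alpha_bars, rng)
    print(recover_x0(xt, eps, 500, alpha_bars))  # approximately x0
```

In training, a network would see `xt` and `t` and be asked to predict `eps`; `recover_x0` shows why a good noise estimate is enough to move back toward the clean sample.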