Text Embeddings · Microsoft Research
E5: Weakly-Supervised Contrastive Text Embeddings
E5 turns general-purpose text embeddings into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Topics
Grounding language model outputs in retrieved documents to improve factuality and freshness.
AI Agents · University of Illinois Urbana-Champaign
Harness-1 is a 20B RL search agent that hands working memory to the environment, hitting 0.730 average curated recall and beating the next open subagent by +11.4 points.
AI Agents · Independent Researcher
K-BrowseComp: Korean Web-Browsing Agent Benchmark turns Korean-context web browsing agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Theorem Proving · Princeton University
LeanDojo turns retrieval-augmented theorem proving in Lean into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
AI Agents · Independent Researcher
When Masking Stale Observations Helps Search Agents turns context management for search agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
SAAS uses self-aware RL to cut a Qwen2.5-7B search agent's average queries from 2.19 to 0.97 per question, while keeping accuracy near the best baseline (48.7% vs 49.8%).
Text Embeddings · Independent Researcher
Sentence-BERT turns sentence embeddings for semantic similarity into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
AI Agents · Shanghai Jiao Tong University
SWE-Explore isolates the repo-exploration stage of coding agents over 848 issues. Agentic explorers crush BM25 (HitFile 0.65 vs 0.08), but line-level recall stalls at 0.15-0.20, and that gap is what limits repairs.
Retrieval-Augmented Generation · Universidad de San Andres
Treating pairwise LLM reranking as active learning, a tournament selector hits 68.00 NDCG@10 on TREC DL while cutting LLM calls 3-5x versus sorting-based PRP, plus a randomized-direction oracle that debiases in one call.
AI Agents · University of Waterloo
Direct Corpus Interaction (DCI) lets a search agent grep the raw corpus instead of calling a retriever. On BrowseComp-Plus it lifts accuracy from 69.0% to 80.0% while cutting cost 29.4%.
Multimodal Models · Shanghai AI Laboratory
CiteVQA makes document QA models return bounding-box citations with every answer. The top model scores 76.0 Strict Attributed Accuracy, while the best open model scores just 22.5: most answer right but cite the wrong region.
Retrieval-Augmented Generation · University of Massachusetts Amherst
GrepSeek trains an LLM to answer questions by issuing shell commands like grep against the raw corpus — no embedding index — and posts the best F1 and Exact Match across seven open-domain QA benchmarks.
Retrieval-Augmented Generation · AIRI
OCC-RAG is a pair of 0.6B and 1.7B reasoning models trained to answer strictly from the given context and refuse when the answer isn't there — matching or beating general models 2-6x their size on multi-hop QA.
Retrieval-Augmented Generation · Meta AI
The original RAG paper bolts a Wikipedia dense retriever (DPR) onto a BART seq2seq generator, sets a new state of the art on three open-domain QA tasks, and updates knowledge by swapping the index, with no retraining.