COLLEAGUE.SKILL: Turning One Person's Expertise Into a Portable AI Skill

COLLEAGUE.SKILL distills one person's work traces into a versioned skill package with two tracks, capability and bounded behavior, that can be installed in any agent host, corrected, and rolled back. The open-source repository reports about 18.5k GitHub stars.

Quick answer

COLLEAGUE.SKILL is an automated pipeline that turns a person’s heterogeneous work traces (documents, decisions, message threads) into a versioned skill package an AI agent can install. Each package carries two coordinated tracks: a capability track (practices, mental models, decision heuristics) and a bounded behavior track (communication style, interaction rules, and a correction history). The package can be inspected, edited through plain-language feedback, rolled back to an earlier version, and moved across agent hosts. At the time of writing the open-source system reports roughly 18.5k GitHub stars and a gallery of 215 skills from 165 contributors.

The gap it targets

Most “make this agent act like person X” attempts pick one of two broken halves. Memory systems hoard fragments of what a person did but never compile them into something reusable. Persona prompts hand-write a personality but have no grounding in real work and no way to be corrected when they drift. Meanwhile, skill-packaging formats (the kind that ship as folders of instructions) define a clean container but say nothing about how to fill it from messy human evidence. COLLEAGUE.SKILL is an end-to-end answer to the missing middle: the trace-to-skill distillation step that takes raw expert material in and emits a structured, correctable artifact.

How the two-track package works

The design choice that matters is splitting what someone can do from how they do it. The capability track captures transferable know-how — the heuristics an expert applies, the mental models behind a decision, the standard practices they follow. The bounded behavior track captures the interaction layer — tone, the rules they hold to, and crucially a running log of corrections. “Bounded” is the operative word: the behavior is meant to stay inside explicit, auditable limits rather than free-associate from a vibe.
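To make the split concrete, here is a minimal sketch of how a two-track package could be laid out as data. The class and field names (SkillPackage, CapabilityTrack, BehaviorTrack) are hypothetical and only mirror the categories named above; the project's actual file format may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityTrack:
    """What the person can do: transferable know-how."""
    practices: list[str] = field(default_factory=list)
    mental_models: list[str] = field(default_factory=list)
    decision_heuristics: list[str] = field(default_factory=list)


@dataclass
class BehaviorTrack:
    """How the person does it, kept inside explicit limits."""
    communication_style: str = ""
    interaction_rules: list[str] = field(default_factory=list)
    correction_history: list[str] = field(default_factory=list)


@dataclass
class SkillPackage:
    """Hypothetical container pairing the two tracks under one version."""
    name: str
    version: int
    capability: CapabilityTrack
    behavior: BehaviorTrack


# Illustrative content only; not taken from any real skill.
pkg = SkillPackage(
    name="example-colleague",
    version=1,
    capability=CapabilityTrack(
        practices=["write a rollback plan before every deploy"],
        decision_heuristics=["prefer the reversible option when unsure"],
    ),
    behavior=BehaviorTrack(
        communication_style="terse, direct",
        interaction_rules=["never approve a change without a test"],
    ),
)
```

Keeping the tracks as separate objects means a correction to tone touches only the behavior track and leaves the captured know-how unchanged.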

Around the artifact sits a lifecycle, not a one-shot generation. You can inspect the package to see why it behaves as it does, invoke it inside an agent, update it by typing natural-language feedback (“you over-explain — be terser”), roll back to a previous version when an edit makes things worse, install it across different agent hosts, and optionally prepare it for controlled distribution. The paper frames this as an artifact contract plus a correction lifecycle, with domain presets for common roles.
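The inspect, update, and roll back steps can be sketched as an append-only version history. This is an illustration under stated assumptions, not the project's implementation: Version, VersionedSkill, and their methods are hypothetical names, and the caller supplies the new rule directly where a real system would presumably derive it from the feedback text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of the behavior track (hypothetical shape)."""
    number: int
    rules: tuple[str, ...]
    corrections: tuple[str, ...]


@dataclass
class VersionedSkill:
    """Append-only history: updates add a snapshot, rollback re-selects one."""
    history: list[Version] = field(default_factory=list)
    current: int = 0  # index into history of the active snapshot

    def inspect(self) -> Version:
        return self.history[self.current]

    def apply_feedback(self, feedback: str, new_rule: str) -> Version:
        # Assumption: the rule is passed in explicitly so the sketch
        # stays deterministic; no model call is made here.
        head = self.inspect()
        nxt = Version(
            number=len(self.history) + 1,
            rules=head.rules + (new_rule,),
            corrections=head.corrections + (feedback,),
        )
        self.history.append(nxt)
        self.current = len(self.history) - 1
        return nxt

    def rollback(self, number: int) -> Version:
        # Nothing is deleted, so a rollback can itself be undone.
        for i, v in enumerate(self.history):
            if v.number == number:
                self.current = i
                return v
        raise ValueError(f"no version {number}")


skill = VersionedSkill(history=[Version(1, ("cite sources",), ())])
skill.apply_feedback("you over-explain, be terser",
                     "keep answers under five sentences")
skill.rollback(1)  # the edit made things worse, so return to version 1
```

Storing each correction next to the rule it produced is what makes the history auditable: the log shows both what changed and why.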

Why this is timely

The agent-skills format went mainstream in 2025–2026, and the obvious next problem is authoring: hand-writing a good skill is slow, and there’s no principled way to derive one from how a real person actually works. COLLEAGUE.SKILL’s bet is that the valuable unit isn’t a frozen prompt or a hidden memory store but a correctable package you can read, diff, and revert like code. That reframes person-grounded agents as a version-controlled artifact problem, which is arguably a more maintainable footing than prompt-tweaking.
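As a small illustration of what "diff like code" can mean, the sketch below compares two versions of a behavior track with Python's standard difflib module. The track text and the labels behavior@v1 and behavior@v2 are invented for the example; the real package layout is not specified here.

```python
import difflib

# Two hypothetical versions of a plain-text behavior track.
v1 = """\
style: thorough, explains every step
rule: cite sources
"""
v2 = """\
style: terse, answers first
rule: cite sources
rule: keep answers under five sentences
"""

# unified_diff yields the same line-oriented format used in code review.
diff = list(difflib.unified_diff(
    v1.splitlines(), v2.splitlines(),
    fromfile="behavior@v1", tofile="behavior@v2", lineterm="",
))
print("\n".join(diff))

# With plain-text versions, reverting is just selecting the earlier text.
reverted = v1
```

Because the artifact is text, ordinary review tooling applies to it, which is not true of a hidden memory store.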

Key results

  • The public repository reports approximately 18.5k GitHub stars at the time of writing, a sign of real-world interest rather than a lab-only demo.
  • The gallery lists 215 skills from 165 contributors, evidence the format is being authored by many hands, not just the original team.
  • Listed skill cards report over 100k cumulative stars, indicating the distributed skills draw their own attention.
  • The artifact is two-track and versioned: every package separates capability from bounded behavior and keeps a correction history that supports rollback.

Limits and open questions

The headline numbers are adoption metrics, not quality measurements. 18.5k stars and 215 community skills tell you people find the tool worth installing; they do not tell you whether a distilled skill actually reproduces an expert’s judgment, how often corrections are needed, or how the packages perform against a baseline. There is no reported benchmark, controlled human study, or task-success comparison in the abstract — so the central claim (“agents can carry bounded human expertise”) is demonstrated by a working system and community uptake, not by measured fidelity. Two harder questions also stay open: distilling a real person’s traces raises consent and likeness concerns the system’s “controlled distribution” only gestures at, and “bounded behavior” is only as safe as the bounds someone remembered to write. Treat this as a well-adopted engineering artifact and a useful design pattern, not as proof that expertise transfers cleanly.
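To make the distinction concrete, a fidelity measurement would compare a skill's decisions with the expert's own on the same cases. The sketch below shows one simple form such a number could take, an agreement rate. It is purely illustrative: the function name and the labels are made up, and no measurement of this kind is reported for COLLEAGUE.SKILL.

```python
def agreement_rate(skill_choices: list[str], expert_choices: list[str]) -> float:
    """Fraction of cases where the skill picked what the expert picked."""
    if len(skill_choices) != len(expert_choices):
        raise ValueError("need one expert label per skill output")
    if not expert_choices:
        raise ValueError("need at least one case")
    matches = sum(s == e for s, e in zip(skill_choices, expert_choices))
    return matches / len(expert_choices)


# Made-up labels for illustration only; not data from the paper.
expert = ["approve", "reject", "approve", "escalate"]
skill = ["approve", "reject", "reject", "escalate"]
rate = agreement_rate(skill, expert)  # 3 of the 4 cases match
```

A star count cannot stand in for a number like this, because it never compares skill output with the person the skill was distilled from.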

FAQ

What does COLLEAGUE.SKILL actually generate?

A versioned skill package with two tracks — a capability track for practices, mental models, and decision heuristics, and a bounded behavior track for communication style, interaction rules, and a correction history. It is built to be inspected, invoked, corrected via natural language, rolled back, and installed across agent hosts.

How is COLLEAGUE.SKILL different from a persona prompt or a memory system?

A persona prompt is hand-written with no grounding in real work and no correction path; a memory system stores fragments but never compiles them into a reusable unit. COLLEAGUE.SKILL distills real expert traces into a structured, version-controlled package you can read, edit, and revert like code.

Is COLLEAGUE.SKILL open source?

Yes. It is an open-source system that, at the time of writing, reports roughly 18.5k GitHub stars, with a gallery of 215 skills from 165 contributors and over 100k cumulative stars across listed skill cards.

Does COLLEAGUE.SKILL prove an AI can replicate a human expert?

No. The reported figures are adoption metrics, not fidelity measurements. The paper shows a working trace-to-skill pipeline and strong community uptake, but provides no benchmark or human study measuring how well a distilled skill reproduces the original person’s judgment.

One line: it treats a person’s expertise as a correctable, version-controlled package rather than an opaque prompt or hidden memory. Read the original paper on arXiv.