AgentPatterns: Product Playbook for an Engineering Knowledge Base

You want a knowledge base that engineers actually reuse: patterns, runbooks, and evaluable playbooks for agent and LLM systems, so teams ship faster without repeating the same failures.

What this is (and what it is not)

AgentPatterns is a productized playbook repository for building and operating LLM/agent systems.

It is designed to be:

Actionable: copy-pasteable procedures and checklists
Operational: includes failure modes and rollback/diagnosis notes
Evaluable: “what good looks like” + minimal regression tests

It is not a blog. If it can’t be executed or verified, it doesn’t belong here.

Content contract: every playbook must answer

1. When should I use this? (scope, prerequisites, constraints)
2. What decision does it help me make? (tradeoffs, default recommendation)
3. How do I implement it? (steps)
4. How does it fail in production? (failure modes + observability)
5. How do I know it works? (minimal eval + regression cases)

If a doc is missing any of (2) the decision, (4) the failure modes, or (5) the eval, it is likely “advice”, not a playbook.
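To make the contract checkable rather than aspirational, an admin could lint drafts before publishing. The sketch below assumes each playbook is stored as named sections; the section keys (`scope`, `decision`, `steps`, `failure_modes`, `eval`) and the `Doc`, `missing_questions`, and `classify` names are hypothetical, not an existing AgentPatterns format.

```python
from dataclasses import dataclass, field

# Contract question number -> section key. The keys are hypothetical; the
# numbering follows the five contract questions in order.
CONTRACT: dict[int, str] = {
    1: "scope",          # When should I use this?
    2: "decision",       # What decision does it help me make?
    3: "steps",          # How do I implement it?
    4: "failure_modes",  # How does it fail in production?
    5: "eval",           # How do I know it works?
}
REQUIRED = {2, 4, 5}  # without these, the doc is advice, not a playbook


@dataclass
class Doc:
    title: str
    sections: dict[str, str] = field(default_factory=dict)


def missing_questions(doc: Doc) -> list[int]:
    """Contract questions whose section is absent or blank."""
    return [n for n, key in CONTRACT.items()
            if not doc.sections.get(key, "").strip()]


def classify(doc: Doc) -> str:
    """'playbook' if (2), (4) and (5) are all answered, else 'advice'."""
    return "advice" if REQUIRED & set(missing_questions(doc)) else "playbook"


draft = Doc("Retry/backoff", {"scope": "...", "decision": "...",
                              "failure_modes": "...", "eval": "..."})
print(missing_questions(draft), classify(draft))  # prints: [3] playbook
```

Note that a missing (3) only shows up as a gap to fill; it does not demote the doc, matching the rule that (2), (4), and (5) are the minimum.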

Publishing model

This site intentionally uses an admin-only publishing flow.

It keeps content consistent (no drive‑by edits).
It allows stronger opinions and clearer defaults.
It avoids building a full auth product before there’s traction.

The public site only lists playbooks with status = PUBLISHED.
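A minimal sketch of that rule, assuming a two-state workflow. Only `PUBLISHED` comes from the text above; the `DRAFT` state, the `Playbook` record, and the `publish` and `public_listing` helpers are hypothetical. The admin check is reduced to a boolean for illustration; a real site would derive it from its own session or role data.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    DRAFT = "DRAFT"          # hypothetical: any non-public state
    PUBLISHED = "PUBLISHED"  # the only status the public site lists


@dataclass(frozen=True)
class Playbook:
    slug: str
    status: Status = Status.DRAFT


def publish(pb: Playbook, *, actor_is_admin: bool) -> Playbook:
    """Admin-only transition to PUBLISHED (hypothetical helper)."""
    if not actor_is_admin:
        raise PermissionError("only admins can publish playbooks")
    return replace(pb, status=Status.PUBLISHED)


def public_listing(playbooks: list[Playbook]) -> list[Playbook]:
    """What the public site shows: PUBLISHED playbooks only."""
    return [p for p in playbooks if p.status is Status.PUBLISHED]
```

Keeping the filter in one function means the public site cannot accidentally render a draft through some other query path.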

Recommended first 10 playbooks (high leverage)

If you want this repository to become useful quickly, start with these:

Tool‑calling judgment: when to call tools vs. answer directly
Tool‑call parsing strategy: safe fallbacks + audit logging
RAG debugging: retrieval failures, wrong retrieval, context pollution
Minimal eval harness: 20 cases, scoring rubric, regression runner
Prompt+policy structure for agents (separation of concerns)
Retry/backoff/circuit breaking for flaky tools
Idempotency + dedupe for side-effect tools
Observability: tracing, tool-call logs, error taxonomy
Caching strategy: what to cache and what not to cache
Red‑team prompts for your top 3 failure modes
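The minimal eval harness is the one the others lean on, since every playbook needs a minimal eval. A sketch of its shape, assuming a deliberately simple rubric: half credit for the tool-calling decision and half for required substrings in the answer. `Case`, `Output`, `score`, `run`, and `regressions` are hypothetical names, and the system under test is any callable from prompt to `Output`. A real suite would hold about 20 cases and persist the baseline from the last known-good run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    must_contain: tuple[str, ...]  # rubric: substrings a good answer includes
    should_call_tool: bool         # rubric: the correct tool-calling decision


@dataclass(frozen=True)
class Output:
    text: str
    called_tool: bool


def score(case: Case, out: Output) -> float:
    """Half credit for the tool decision, half for required content (assumed rubric)."""
    tool_ok = 1.0 if out.called_tool == case.should_call_tool else 0.0
    if case.must_contain:
        hits = sum(s.lower() in out.text.lower() for s in case.must_contain)
        content = hits / len(case.must_contain)
    else:
        content = 1.0
    return 0.5 * tool_ok + 0.5 * content


def run(cases: list[Case], system: Callable[[str], Output],
        threshold: float = 1.0) -> dict[str, bool]:
    """Pass/fail per case; a case passes only if its score reaches the threshold."""
    return {c.case_id: score(c, system(c.prompt)) >= threshold for c in cases}


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Cases that passed in the baseline run but fail (or are missing) now."""
    return sorted(cid for cid, ok in baseline.items()
                  if ok and not current.get(cid, False))
```

Comparing against a stored baseline, rather than an absolute pass rate, is what makes it a regression runner: a change is flagged when a previously passing case breaks, even if the overall score still looks acceptable.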

References

Google SRE Workbook (runbooks, operational checklists): https://sre.google/workbook/
OpenAI Function Calling guide (canonical interface patterns): https://platform.openai.com/docs/guides/function-calling
LocalLLaMA discussion on tool-calling judgment + pitfalls (community evidence): https://www.reddit.com/r/LocalLLaMA/comments/1r4ie8z/i_tested_21_small_llms_on_toolcalling_judgment/