Engineering Playbook — Internal

Agentic AI: What Works Today

A practical guide to shipping reliable AI agents — architecture patterns, eval frameworks, and the complexity traps to avoid.

Recommended: Thin Agent, Fat Tooling

Instead of complex multi-agent orchestration with self-evaluation loops, use a single LLM orchestrator surrounded by deterministic scaffolding. The LLM decides what to do. Deterministic code does it. Structured validation checks outputs.

Pipeline Architecture
1. INPUT (Structured Request): schema-validated input; reject malformed requests before touching the LLM.
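A minimal sketch of input validation in Python, assuming a hypothetical request shape with `task`, `text`, and `max_words` fields; a production system might use pydantic or jsonschema instead, but the principle is the same: no LLM call until the payload is known-good.

```python
# Hypothetical request schema: field name -> required type.
REQUIRED_FIELDS = {"task": str, "text": str, "max_words": int}

def validate_request(payload: dict) -> dict:
    """Reject malformed requests before any LLM call."""
    for field, typ in REQUIRED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(payload[field], typ):
            raise ValueError(f"field {field!r} must be {typ.__name__}")
    if payload["max_words"] <= 0:
        raise ValueError("max_words must be positive")
    return payload
```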
2. LLM CALL (Single Orchestrator): one well-prompted call with structured output (JSON schema / tool use); no chaining.
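A sketch of the single orchestrator call. `call_llm` is a stand-in for your provider SDK's structured-output call (the stub below returns a canned response); the schema and field names are illustrative, not a fixed contract.

```python
import json

# Illustrative JSON schema sent as the structured-output constraint.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": ["summarize", "escalate"]},
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["action", "summary", "confidence"],
}

def call_llm(prompt: str, schema: dict) -> str:
    # Stub: a real implementation would send `schema` as the provider's
    # structured-output constraint and return the model's raw JSON string.
    return '{"action": "summarize", "summary": "ok", "confidence": 0.9}'

def orchestrate(request: dict) -> dict:
    """One LLM call in, one parsed dict out. No chaining."""
    prompt = f"Task: {request['task']}\nInput: {request['text']}"
    raw = call_llm(prompt, OUTPUT_SCHEMA)
    return json.loads(raw)  # downstream deterministic checks verify shape
```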
3. VALIDATION (Deterministic Checks): schema conformity, required fields, length bounds, entity verification; no LLM needed.
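The deterministic checks can be plain functions. This sketch assumes an output dict with `action`, `summary`, and `confidence` fields (illustrative names) and returns a list of problems, so the caller can feed concrete failures back into a retry prompt.

```python
def check_output(out: dict, max_summary_chars: int = 2000) -> list[str]:
    """Pure-Python validation of an LLM output dict; returns problems found."""
    errors = []
    for field in ("action", "summary", "confidence"):
        if field not in out:
            errors.append(f"missing field: {field}")
    if "action" in out and out["action"] not in {"summarize", "escalate"}:
        errors.append(f"unknown action: {out['action']}")
    if "summary" in out and len(out["summary"]) > max_summary_chars:
        errors.append("summary exceeds length bound")
    if "confidence" in out and not (0.0 <= out["confidence"] <= 1.0):
        errors.append("confidence out of range")
    return errors
```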
4. FALLBACK (Retry / Escalate): if validation fails, retry with a modified prompt (max 2 attempts); if it still fails, route to a human queue.
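The bounded retry loop can be sketched as follows. `generate`, `validate`, and `escalate` are hypothetical hooks supplied by the caller; the hard cap on attempts is the point, since it converts "unbounded autonomous retries" into a fixed, auditable budget.

```python
MAX_RETRIES = 2  # hard cap: initial attempt + 2 retries, then escalate

def run_with_fallback(request, generate, validate, escalate):
    """Try, retry with concrete failure feedback, then hand off to a human."""
    prompt_suffix = ""
    for attempt in range(1 + MAX_RETRIES):
        output = generate(request, prompt_suffix)
        errors = validate(output)
        if not errors:
            return output
        # Modify the prompt with the concrete failures before retrying.
        prompt_suffix = "Previous output was invalid: " + "; ".join(errors)
    escalate(request, errors)  # still failing: route to human queue
    return None
```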
5. OUTPUT (Deterministic Action): API call, DB write, or notification, executed by code rather than the LLM; fully auditable.
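A dispatch-table sketch of deterministic execution: the LLM only chose an action name, and whitelisted code performs the side effect. The handlers here are hypothetical stand-ins for real API/DB calls, appending to an in-memory audit log for illustration.

```python
AUDIT_LOG = []  # stand-in for a real, append-only audit trail

def do_summarize(out: dict) -> None:
    # Hypothetical handler: a real one would write to the DB / call an API.
    AUDIT_LOG.append(("db_write", out["summary"]))

def do_escalate(out: dict) -> None:
    AUDIT_LOG.append(("notify_human", out.get("summary", "")))

# Only actions in this whitelist can ever reach real side effects.
DISPATCH = {"summarize": do_summarize, "escalate": do_escalate}

def execute(out: dict) -> None:
    """Run the validated action chosen by the LLM via deterministic code."""
    DISPATCH[out["action"]](out)
```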
✗ Anti-patterns
LLM-as-judge evaluating its own output
Multi-agent debate / consensus loops
Unbounded autonomous retries
Self-modifying prompts at runtime
Chains of 5+ sequential LLM calls
✓ What works in production
Single orchestrator + deterministic tools
Schema validation on every LLM output
Human-in-the-loop for low-confidence paths
Max 2 retries then escalate
Separate stronger model for eval (if needed)
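If a judge is needed at all, the points above suggest routing outputs through a different, stronger model and applying a deterministic threshold rather than letting the generator grade itself. In this sketch, `call_eval_model` is a placeholder stub for a call to that separate model, not a real API, and the score format is assumed.

```python
import json

def call_eval_model(candidate: str) -> str:
    # Stub: a real call would send the candidate to a distinct, stronger
    # model with a rubric prompt and a JSON-constrained output.
    return '{"score": 0.4, "reason": "unsupported claim"}'

def needs_human(candidate: str, threshold: float = 0.7) -> bool:
    """Deterministic routing: low eval score sends the output to a human."""
    verdict = json.loads(call_eval_model(candidate))
    return verdict["score"] < threshold
```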