Engineering Playbook — Internal

Agentic AI: What Works Today

A practical guide to shipping reliable AI agents — architecture patterns, eval frameworks, and the complexity traps to avoid.

Recommended: Thin Agent, Fat Tooling

Instead of complex multi-agent orchestration with self-evaluation loops, use a single LLM orchestrator surrounded by deterministic scaffolding. The LLM decides what to do. Deterministic code does it. Structured validation checks outputs.

Pipeline Architecture
1. INPUT (Structured Request): schema-validated input; reject malformed requests before touching the LLM.
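A minimal sketch of input validation in Python, assuming a hypothetical request shape with `task`, `text`, and `max_words` fields; a production system might use pydantic or jsonschema instead, but the principle is the same: no LLM call until the payload is known-good.

```python
# Hypothetical request schema: field name -> required type.
REQUIRED_FIELDS = {"task": str, "text": str, "max_words": int}

def validate_request(payload: dict) -> dict:
    """Reject malformed requests before any LLM call."""
    for field, typ in REQUIRED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(payload[field], typ):
            raise ValueError(f"field {field!r} must be {typ.__name__}")
    if payload["max_words"] <= 0:
        raise ValueError("max_words must be positive")
    return payload
```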
2. LLM CALL (Single Orchestrator): one well-prompted call with structured output (JSON schema / tool use); no chaining.
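A sketch of the single orchestrator call. `call_llm` is a stand-in for your provider SDK's structured-output call (the stub below returns a canned response); the schema and field names are illustrative, not a fixed contract.

```python
import json

# Illustrative JSON schema sent as the structured-output constraint.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": ["summarize", "escalate"]},
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["action", "summary", "confidence"],
}

def call_llm(prompt: str, schema: dict) -> str:
    # Stub: a real implementation would send `schema` as the provider's
    # structured-output constraint and return the model's raw JSON string.
    return '{"action": "summarize", "summary": "ok", "confidence": 0.9}'

def orchestrate(request: dict) -> dict:
    """One LLM call in, one parsed dict out. No chaining."""
    prompt = f"Task: {request['task']}\nInput: {request['text']}"
    raw = call_llm(prompt, OUTPUT_SCHEMA)
    return json.loads(raw)  # downstream deterministic checks verify shape
```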
3. VALIDATION (Deterministic Checks): schema conformity, required fields, length bounds, entity verification; no LLM needed.
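The deterministic checks can be plain functions. This sketch assumes an output dict with `action`, `summary`, and `confidence` fields (illustrative names) and returns a list of problems, so the caller can feed concrete failures back into a retry prompt.

```python
def check_output(out: dict, max_summary_chars: int = 2000) -> list[str]:
    """Pure-Python validation of an LLM output dict; returns problems found."""
    errors = []
    for field in ("action", "summary", "confidence"):
        if field not in out:
            errors.append(f"missing field: {field}")
    if "action" in out and out["action"] not in {"summarize", "escalate"}:
        errors.append(f"unknown action: {out['action']}")
    if "summary" in out and len(out["summary"]) > max_summary_chars:
        errors.append("summary exceeds length bound")
    if "confidence" in out and not (0.0 <= out["confidence"] <= 1.0):
        errors.append("confidence out of range")
    return errors
```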
4. FALLBACK (Retry / Escalate): if validation fails, retry with a modified prompt (max 2 attempts); if it still fails, route to a human queue.
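The bounded retry loop can be sketched as follows. `generate`, `validate`, and `escalate` are hypothetical hooks supplied by the caller; the hard cap on attempts is the point, since it converts "unbounded autonomous retries" into a fixed, auditable budget.

```python
MAX_RETRIES = 2  # hard cap: initial attempt + 2 retries, then escalate

def run_with_fallback(request, generate, validate, escalate):
    """Try, retry with concrete failure feedback, then hand off to a human."""
    prompt_suffix = ""
    for attempt in range(1 + MAX_RETRIES):
        output = generate(request, prompt_suffix)
        errors = validate(output)
        if not errors:
            return output
        # Modify the prompt with the concrete failures before retrying.
        prompt_suffix = "Previous output was invalid: " + "; ".join(errors)
    escalate(request, errors)  # still failing: route to human queue
    return None
```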
5. OUTPUT (Deterministic Action): API call, DB write, or notification, executed by code rather than the LLM; fully auditable.
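A dispatch-table sketch of deterministic execution: the LLM only chose an action name, and whitelisted code performs the side effect. The handlers here are hypothetical stand-ins for real API/DB calls, appending to an in-memory audit log for illustration.

```python
AUDIT_LOG = []  # stand-in for a real, append-only audit trail

def do_summarize(out: dict) -> None:
    # Hypothetical handler: a real one would write to the DB / call an API.
    AUDIT_LOG.append(("db_write", out["summary"]))

def do_escalate(out: dict) -> None:
    AUDIT_LOG.append(("notify_human", out.get("summary", "")))

# Only actions in this whitelist can ever reach real side effects.
DISPATCH = {"summarize": do_summarize, "escalate": do_escalate}

def execute(out: dict) -> None:
    """Run the validated action chosen by the LLM via deterministic code."""
    DISPATCH[out["action"]](out)
```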
✗ Anti-patterns
LLM-as-judge evaluating its own output
Multi-agent debate / consensus loops
Unbounded autonomous retries
Self-modifying prompts at runtime
Chains of 5+ sequential LLM calls
✓ What works in production
Single orchestrator + deterministic tools
Schema validation on every LLM output
Human-in-the-loop for low-confidence paths
Max 2 retries then escalate
Separate stronger model for eval (if needed)
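If a judge is needed at all, the points above suggest routing outputs through a different, stronger model and applying a deterministic threshold rather than letting the generator grade itself. In this sketch, `call_eval_model` is a placeholder stub for a call to that separate model, not a real API, and the score format is assumed.

```python
import json

def call_eval_model(candidate: str) -> str:
    # Stub: a real call would send the candidate to a distinct, stronger
    # model with a rubric prompt and a JSON-constrained output.
    return '{"score": 0.4, "reason": "unsupported claim"}'

def needs_human(candidate: str, threshold: float = 0.7) -> bool:
    """Deterministic routing: low eval score sends the output to a human."""
    verdict = json.loads(call_eval_model(candidate))
    return verdict["score"] < threshold
```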