Claude Prompt for Agent Architectures (ReAct, Plan-Execute, Multi-agent)
Refactor an existing single-loop tool-calling agent for code PR review into a Plan-and-Execute architecture using AutoGen. Focus: what to split, what to keep, what to evaluate.
You are reviewing a working but brittle code PR review agent. It's a single-loop tool-calling agent (one LLM in a while-loop with N tools) and it's hitting a ceiling. Your job: refactor it to Plan-and-Execute using AutoGen without regressing what works.

**Model:** Claude Sonnet 4.5

**Runtime:** Deno 2

## Part 1: Honest baseline

Before touching the code, run the existing agent on 50 held-out code PR review inputs and record:

- Success rate
- Avg steps to completion
- Cost per run
- The 10 most common failure modes (categorized)

You will grade the refactor against this baseline. **If the refactor is worse on cost or latency without clearly better success, you roll back.**

## Part 2: Decide what to split

Not every problem needs multi-agent. For code PR review, decide:

- Is the failure mode "tool confusion" (too many tools → router)?
- Is it "shallow reasoning" (→ Plan-and-Execute or Tree-of-Thoughts)?
- Is it "confidently wrong" (→ Reflexion / critic)?
- Is it "context bloat" (→ worker agents with scoped context)?

Match the pathology to Plan-and-Execute. If Plan-and-Execute doesn't match the actual pathology, refuse the refactor and propose the right one.

## Part 3: Migration plan

Write a phased plan:

1. **Phase 0:** freeze the old agent, lock baseline metrics
2. **Phase 1:** extract shared state + tool layer (no behavior change)
3. **Phase 2:** introduce the new Plan-and-Execute scaffolding alongside the old, behind a feature flag
4. **Phase 3:** dual-run on a shadow traffic slice, compare
5. **Phase 4:** promote if wins on success AND no worse on cost/latency
6. **Phase 5:** delete the old path

## Part 4: Refactor in AutoGen

Write the new code. Show the diff-style structure:

- What moved from a single system prompt into specialized agent prompts
- How the old tool list was partitioned by role
- How state replaces previous ad-hoc scratchpads
- Where the new control edges are in the graph

## Part 5: Compatibility surface

The agent is called from somewhere. Preserve its input/output contract so callers don't break. Document:

- Input schema (unchanged)
- Output schema (unchanged)
- New trace structure (may change; callers of logs care)

## Part 6: New failure modes

Plan-and-Execute introduces new failure modes the single-loop didn't have (e.g. agents getting stuck in a loop handing off to each other). List them and add guards.

## Part 7: Eval

Re-run the 50 held-out inputs. Compare head-to-head:

- Success rate
- Cost
- Latency p50/p95
- New qualitative failure modes

Decision rule: ship only if success improves by ≥10% OR cost drops by ≥20% with equal success.

## Part 8: Rollback

- How to flip the flag back
- Data migration concerns (trace format, state serialization)
- Communication plan

Produce the new agent code, the migration plan, the eval report template, and the rollback runbook.
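
For reference, the Part 7 decision rule could be encoded roughly as below. This is a minimal sketch, not a required implementation: `EvalSummary` and `should_ship` are hypothetical names, "improves by ≥10%" is assumed to mean absolute percentage points, the cost drop is assumed to be relative to the baseline, and "equal success" is assumed to mean the candidate's success rate is not below the baseline's. Adjust these readings if your eval report defines them differently.

```python
from dataclasses import dataclass

# Small tolerance so values sitting exactly on a threshold are not
# rejected because of floating-point rounding.
EPS = 1e-9


@dataclass(frozen=True)
class EvalSummary:
    """Aggregate metrics for one agent over the held-out set (hypothetical shape)."""
    success_rate: float  # fraction of successful runs, 0.0 to 1.0
    cost_per_run: float  # average cost per run, in any consistent unit


def should_ship(
    baseline: EvalSummary,
    candidate: EvalSummary,
    min_success_gain: float = 0.10,  # assumed: absolute percentage points
    min_cost_drop: float = 0.20,     # assumed: relative to baseline cost
) -> bool:
    """Return True if the candidate meets the Part 7 ship criteria."""
    success_gain = candidate.success_rate - baseline.success_rate

    # Branch 1: success improves by at least the required margin.
    if success_gain >= min_success_gain - EPS:
        return True

    # Branch 2: cost drops enough while success does not regress.
    if baseline.cost_per_run <= 0:
        return False  # a relative cost drop is undefined without a positive baseline
    cost_drop = (baseline.cost_per_run - candidate.cost_per_run) / baseline.cost_per_run
    return success_gain >= -EPS and cost_drop >= min_cost_drop - EPS
```

A call such as `should_ship(EvalSummary(0.60, 1.00), EvalSummary(0.75, 1.10))` returns `True` under these assumptions, because the success branch alone is sufficient. The stricter Phase 4 condition (no worse on cost or latency) would need an extra check that this sketch does not include.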