ChatGPT Prompt for Memory & Tool Use
Managed context window for long-running agents doing code PR review. Covers rolling summarization, reference-and-expand, budget allocation, and eval of context loss.
You maintain a long-running Claude Agent SDK agent that handles code PR review across sessions lasting hundreds of turns. The context window is the bottleneck: too full, and cost and latency spike; prune too aggressively, and the agent forgets what it was doing. Design the context window manager.

**Model:** o1 (effective context: 200k tokens)
**Runtime:** TypeScript + Node 20

## Part 1 — Context budget

Divide the effective window into regions:

- **System/instructions** (fixed)
- **Memory retrievals** (dynamic, capped)
- **Rolling summary** (compressed history)
- **Recent turns** (raw)
- **Current turn tool outputs** (dynamic)
- **Response headroom** (reserved)

Give a concrete token split for code PR review, and justify it.

## Part 2 — Rolling summarization

- Trigger: turn count OR tokens over threshold
- What to summarize: all of history, or only stable/older chunks
- Summary prompt: must preserve entities, decisions, open questions, and open tool results
- Cadence: every N turns vs. on demand

Write the summarizer prompt. It must output structured sections: `decisions_made`, `facts_established`, `open_loops`, `rejected_approaches`.

## Part 3 — Reference-and-expand

Instead of keeping large tool outputs inline, store them externally and keep only a reference:

- Assign each tool result an ID and a two-line summary
- Store the full result in Pinecone or blob storage
- Provide a `fetch_result(id)` tool so the agent can re-hydrate on demand
- Define an eviction policy for full results

Write the pattern plus code.

## Part 4 — Adaptive pruning

When over budget, drop in this order:

1. Old tool outputs (→ reference form)
2. Old raw turns (→ roll into summary)
3. Low-salience memory retrievals
4. Never: the system prompt or the currently open plan

Implement this as middleware that runs before every model call.

## Part 5 — Plan persistence

Across summarization cycles, the agent's current plan must survive intact. Store it as a structured object in state (not prose) and always inject the latest version near the end of the prompt. Schema:

- `goal`
- `steps` (each with status: todo/doing/done/blocked)
- `blockers`
- `next_action`

## Part 6 — Detecting context loss

Instrument:

- **Self-contradiction rate**: the agent says something that contradicts an earlier decision
- **Re-asking rate**: the agent re-asks for information it already had
- **Plan drift**: steps change meaning across summarization boundaries

Build an eval that stress-tests long sessions (200+ turns) and measures these.

## Part 7 — Cost model

Per session, track:

- Cumulative input tokens
- Average input tokens per turn (rolling)
- Input tokens spent on summary vs. raw turns vs. tool output vs. memory
- Cost per completed code PR review

Dashboard these.

## Part 8 — Claude Agent SDK integration

Implement the manager as a Claude Agent SDK-native middleware/hook/node. It must:

- Run before every LLM call
- Read and write the session state
- Be testable in isolation (given state X + new turn Y → prompt Z)

Write the code.

## Part 9 — Fail-safes

- Hard cap on prompt size (never send more than N tokens)
- Circuit breaker: if summarization fails twice, fall back to truncation and alert
- Graceful degradation: what the agent does when it must drop important state

Deliver the design, the summarizer prompt, the plan schema, the middleware code, and the long-session eval.
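For Part 1, one plausible concrete split of a 200k effective window could look like the constant below. The percentages are illustrative assumptions, not numbers given by the prompt; a real answer would justify them against measured PR-review traffic.

```typescript
// Illustrative token budget for a 200k-token effective window.
// The numbers are assumptions; tune them against the Part 7 cost dashboard.
const CONTEXT_BUDGET = {
  system: 8_000,            // fixed instructions + tool definitions
  memoryRetrievals: 24_000, // dynamic, hard-capped
  rollingSummary: 16_000,   // compressed history
  recentTurns: 60_000,      // raw recent conversation
  toolOutputs: 60_000,      // current-turn diffs, CI logs, file contents
  responseHeadroom: 32_000, // reserved for the model's output
} as const;

const totalBudget = Object.values(CONTEXT_BUDGET).reduce((a, b) => a + b, 0);
console.log(totalBudget); // should equal 200_000
```

The large shares for raw turns and tool outputs reflect that PR review is diff-heavy; a chattier task would shift budget toward the rolling summary.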
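Part 5's plan schema might be typed as follows. The field names come from the prompt; the `PlanStep` shape, `StepStatus` union spelling, and `renderPlan` helper are illustrative assumptions.

```typescript
// Structured plan object that survives summarization cycles (Part 5 schema).
type StepStatus = "todo" | "doing" | "done" | "blocked";

interface PlanStep {
  id: string;
  description: string;
  status: StepStatus;
}

interface Plan {
  goal: string;
  steps: PlanStep[];
  blockers: string[];
  next_action: string;
}

// Render the plan as text so it can be injected near the end of the prompt,
// where it is never summarized away.
function renderPlan(plan: Plan): string {
  const steps = plan.steps
    .map((s) => `- [${s.status}] ${s.description}`)
    .join("\n");
  return `## Current plan\nGoal: ${plan.goal}\n${steps}\nNext: ${plan.next_action}`;
}

const examplePlan: Plan = {
  goal: "Review the open PR",
  steps: [
    { id: "s1", description: "Read the diff", status: "done" },
    { id: "s2", description: "Check test coverage", status: "doing" },
  ],
  blockers: [],
  next_action: "Run the test suite",
};
```

Keeping the plan as data rather than prose is what makes the Part 6 plan-drift metric measurable: two snapshots of `steps` can be diffed structurally.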
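A minimal sketch of Part 3's reference-and-expand pattern, using an in-memory `Map` as a stand-in for Pinecone or blob storage. The `ToolResultRef` shape, function names, and eviction policy are assumptions, not SDK APIs.

```typescript
// Stand-in for the external store (Pinecone / blob storage in production).
const resultStore = new Map<string, string>();
let nextId = 0;

interface ToolResultRef {
  id: string;
  summary: string; // the short summary that stays inline in the prompt
}

// Store the full result externally; keep only an ID + summary inline.
function storeToolResult(fullOutput: string, summary: string): ToolResultRef {
  const id = `res_${nextId++}`;
  resultStore.set(id, fullOutput);
  return { id, summary };
}

// The fetch_result tool: re-hydrate the full output on demand.
function fetchResult(id: string): string {
  const full = resultStore.get(id);
  if (full === undefined) throw new Error(`result ${id} was evicted`);
  return full;
}

// Simple eviction policy: drop the oldest results past a cap
// (Map iteration order is insertion order, so the first keys are oldest).
function evict(maxResults: number): void {
  const excess = resultStore.size - maxResults;
  const oldest = [...resultStore.keys()].slice(0, Math.max(0, excess));
  for (const key of oldest) resultStore.delete(key);
}
```

An LRU or salience-weighted policy would be a natural upgrade over oldest-first, but the eviction interface stays the same.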
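Part 4's drop order can be sketched as a pure function over the prompt regions, which also satisfies Part 8's "testable in isolation" requirement. The token counts here are rough character-based estimates (a real implementation would use the model's tokenizer), and the region names are assumptions mirroring Part 1.

```typescript
interface PromptRegions {
  system: string;
  memoryRetrievals: string[]; // ordered by salience, lowest last
  rollingSummary: string;
  rawTurns: string[]; // oldest first
  toolOutputs: { id: string; full: string; summary: string }[]; // oldest first
  plan: string;
}

// Rough estimate: ~4 characters per token.
const estTokens = (s: string): number => Math.ceil(s.length / 4);

function totalTokens(r: PromptRegions): number {
  return (
    estTokens(r.system) +
    r.memoryRetrievals.reduce((n, m) => n + estTokens(m), 0) +
    estTokens(r.rollingSummary) +
    r.rawTurns.reduce((n, t) => n + estTokens(t), 0) +
    r.toolOutputs.reduce((n, t) => n + estTokens(t.full), 0) +
    estTokens(r.plan)
  );
}

// Prune in the prompt's stated order; never touch system prompt or plan.
function prune(r: PromptRegions, budget: number): PromptRegions {
  const out: PromptRegions = structuredClone(r);
  // 1. Old tool outputs -> reference form (oldest first).
  for (let i = 0; i < out.toolOutputs.length && totalTokens(out) > budget; i++) {
    const t = out.toolOutputs[i];
    out.toolOutputs[i] = { ...t, full: `[ref ${t.id}] ${t.summary}` };
  }
  // 2. Old raw turns -> fold into the rolling summary (placeholder fold here;
  //    the real middleware would call the Part 2 summarizer).
  while (totalTokens(out) > budget && out.rawTurns.length > 1) {
    const oldest = out.rawTurns.shift()!;
    out.rollingSummary += `\n[summarized turn: ${oldest.slice(0, 40)}]`;
  }
  // 3. Drop the lowest-salience memory retrievals from the tail.
  while (totalTokens(out) > budget && out.memoryRetrievals.length > 0) {
    out.memoryRetrievals.pop();
  }
  // 4. System prompt and plan are never dropped.
  return out;
}
```

Because `prune` takes state in and returns state out, the Part 8 test shape "given state X + new turn Y → prompt Z" reduces to plain assertions on its output.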