Claude Prompt for Structured Output & Tool Schemas
Design tool definitions and function schemas for an LLM agent doing calendar scheduling, with strict validation and error handling.
You are an AI engineer designing the tool interface for an LLM agent. Clear, unambiguous tool schemas are the single biggest lever for agent reliability.
## Agent Task
The agent will perform calendar scheduling. Expected tool calls per session: 10-20. Target task completion rate: 80%.
## Tool Design Principles
1. **Each tool does one thing.** Avoid "do_anything" tools.
2. **Parameter names are unambiguous.** `user_id` not `id`. `order_total_usd` not `amount`.
3. **Rich descriptions.** Every tool and every parameter has a description an LLM can reason from.
4. **Types, not stringly-typed.** Use number, boolean, enum. Never "string that looks like a number".
5. **Explicit enums** for controlled vocabularies. List every allowed value.
6. **Idempotency.** Where possible, tools accept an idempotency key.
7. **Errors are data.** Tools return `{ok: bool, result?, error?: {code, message, retryable}}` — not exceptions that crash the agent loop.
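The "errors are data" envelope can be sketched as a small Python helper. Names here (`ToolResult`, `failure`) are illustrative, not part of any library:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToolError:
    code: str        # e.g. "VALIDATION_ERROR", "RATE_LIMITED"
    message: str
    retryable: bool

@dataclass
class ToolResult:
    ok: bool
    result: Optional[Any] = None
    error: Optional[ToolError] = None

def failure(code: str, message: str, retryable: bool) -> ToolResult:
    """Return a failure as data instead of raising into the agent loop."""
    return ToolResult(ok=False, error=ToolError(code, message, retryable))
```

Because failures are ordinary return values, the agent loop can inspect `error.retryable` and decide what to do instead of crashing on an exception.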
## Tool Catalog for calendar scheduling
Define 8 tools with this structure. For each:
### Tool: `tool_name`
- **Description:** 1-3 sentences an LLM can use to decide when to call this tool
- **When to use:** 2-3 concrete trigger conditions
- **When NOT to use:** anti-patterns to steer the model away
- **Parameters:**
  - `param_name` (type, required/optional): description, constraints, example values
- **Returns:** schema of the success payload
- **Errors:** error codes this tool can return and their meaning
- **Side effects:** does calling this tool mutate state? irreversibly?
Produce 8 fully-specified tools covering the workflow for calendar scheduling. Include at least:
- 1 read tool (fetch data)
- 1 search tool (find something)
- 1 write tool (mutate state)
- 1 escalation tool (hand off to human)
## JSON Schema Format (Anthropic tool use)
Output all tools as an Anthropic-tool-use compatible array:
```json
[
{
"name": "string",
"description": "string",
"input_schema": {
"type": "object",
"properties": { ... },
"required": [ ... ],
"additionalProperties": false
}
}
]
```
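As a concrete sketch, one hypothetical calendar tool in that format might look like the following (the tool name, fields, and bounds are illustrative, not prescribed):

```python
# One entry of the Anthropic-tool-use compatible array, as a Python dict.
create_event_tool = {
    "name": "create_calendar_event",
    "description": (
        "Create a new event on the user's primary calendar. "
        "Use only after availability has been confirmed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Human-readable event title"},
            "start_iso": {
                "type": "string",
                "description": "Start time, ISO 8601, e.g. 2025-03-01T10:00:00Z",
            },
            "duration_minutes": {"type": "integer", "minimum": 5, "maximum": 480},
            "attendee_emails": {"type": "array", "items": {"type": "string"}},
            "idempotency_key": {
                "type": "string",
                "description": "Client-generated key so retries don't double-book",
            },
        },
        "required": ["title", "start_iso", "duration_minutes", "idempotency_key"],
        "additionalProperties": False,
    },
}
```

Note the typed fields (`duration_minutes` as a bounded integer, not a string), the unambiguous names, and the idempotency key, all following the design principles above.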
Set `additionalProperties: false` everywhere; with strict schemas the model cannot smuggle in undeclared fields, which is one of the cheapest reliability wins available.
## System Prompt for the Agent
Craft a system prompt that covers:
```
You are an agent specialized in calendar scheduling.
You have access to these tools: {tool_list}.
PROTOCOL:
1. When asked to do something, first THINK about what tools you need and in what order.
2. Before making any irreversible change (create, update, delete), confirm with the user if they haven't explicitly asked for it.
3. If a tool returns an error with retryable=true, retry up to 2 times.
4. If a tool returns an error with retryable=false, explain to the user and offer alternatives.
5. Never call more than 20 tools in response to a single user message.
6. Always cite the tool result IDs when summarizing (e.g., "Order o_123 was updated").
SAFETY:
- Never call a write tool based on ambiguous user input. Confirm first.
- Never chain-call tools in ways that exceed per-minute rate limits.
- If you detect a likely loop (same tool with same args ≥ 3 times), stop and report.
```
## Validation Layer
Every tool call the LLM emits is validated BEFORE execution:
1. Parse LLM tool call → JSON object
2. Validate against the tool's `input_schema` (Pydantic / Zod / ajv)
3. Apply custom business validators (e.g., user_id must belong to the calling session)
4. On validation failure, return a structured error to the LLM:
```json
{
"ok": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Field 'amount' must be a positive number. Got: -5",
"retryable": true,
"suggested_fix": "Pass a positive number for 'amount'."
}
}
```
This teaches the model to self-correct in-loop, which beats retry-after-full-failure.
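Steps 1–4 can be sketched as below. This is a minimal hand-rolled check standing in for Pydantic/Zod/ajv (which you should use in production); the function name and envelope fields mirror the examples above but are otherwise assumptions:

```python
import json

def validate_tool_call(raw: str, schema: dict) -> dict:
    """Parse an LLM tool call and validate it against the tool's input_schema.

    Returns a structured error envelope on failure so the model can self-correct.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as e:
        return {"ok": False, "error": {
            "code": "VALIDATION_ERROR",
            "message": f"Malformed JSON: {e}",
            "retryable": True,
            "suggested_fix": "Emit a single valid JSON object."}}
    # Required-field check.
    for name in schema.get("required", []):
        if name not in args:
            return {"ok": False, "error": {
                "code": "VALIDATION_ERROR",
                "message": f"Missing required field '{name}'",
                "retryable": True,
                "suggested_fix": f"Include the '{name}' field."}}
    # Enforce additionalProperties: false.
    if schema.get("additionalProperties") is False:
        extra = set(args) - set(schema.get("properties", {}))
        if extra:
            return {"ok": False, "error": {
                "code": "VALIDATION_ERROR",
                "message": f"Undeclared fields: {sorted(extra)}",
                "retryable": True,
                "suggested_fix": "Remove fields not declared in the schema."}}
    return {"ok": True, "result": args}
```

A real validator would also check types, enums, and numeric bounds; the structure of the returned envelope is what matters for in-loop self-correction.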
## Error Handling Per Tool
For each tool, enumerate error codes:
- `NOT_FOUND` — entity doesn't exist (retryable=false)
- `PERMISSION_DENIED` — agent session doesn't have rights (retryable=false)
- `RATE_LIMITED` — back off and retry (retryable=true, with `retry_after_ms`)
- `VALIDATION_ERROR` — input shape wrong (retryable=true after fix)
- `CONFLICT` — e.g., version mismatch (retryable=true after re-fetch)
- `DEPENDENCY_FAILURE` — external API down (retryable=true with backoff)
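A retry helper that honours the `retryable` flag and the `retry_after_ms` hint might look like this (a sketch under the envelope shape above; `call_with_retries` and the default backoff are assumptions, and the 2-retry cap matches the protocol in the system prompt):

```python
import time

def call_with_retries(tool_fn, args, max_retries=2):
    """Call a tool, retrying only when the error says retryable=true.

    Backs off for retry_after_ms when provided, else exponentially.
    """
    for attempt in range(max_retries + 1):
        resp = tool_fn(args)
        if resp["ok"] or not resp["error"].get("retryable"):
            return resp  # success, or a permanent error: stop immediately
        wait_ms = resp["error"].get("retry_after_ms", 250 * (2 ** attempt))
        time.sleep(wait_ms / 1000)
    return resp  # retries exhausted; surface the last error to the agent
```

Non-retryable codes like `NOT_FOUND` and `PERMISSION_DENIED` return on the first attempt, so the agent can explain and offer alternatives rather than burning its retry budget.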
## Rate Limits & Quotas
- Per-session tool call cap: 25
- Per-tool per-minute: 120/min
- Total tokens per session: 50k
- Hard stop if exceeded; return "QUOTA_EXCEEDED" as a final assistant message.
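Quota enforcement can be a small counter checked before each tool call. This sketch uses the caps above; the class and method names are illustrative:

```python
class SessionQuota:
    """Track per-session caps: tool calls and total tokens."""

    def __init__(self, max_tool_calls=25, max_tokens=50_000):
        self.max_tool_calls = max_tool_calls
        self.max_tokens = max_tokens
        self.tool_calls = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> bool:
        """Record one tool call; return False once any quota is exhausted,
        at which point the loop hard-stops and emits QUOTA_EXCEEDED."""
        self.tool_calls += 1
        self.tokens += tokens_used
        return (self.tool_calls <= self.max_tool_calls
                and self.tokens <= self.max_tokens)
```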
## Evaluation
Build an agent eval harness with 50 multi-turn scenarios covering:
- Happy path: user goal achieved in ≤ N turns
- Error recovery: tool fails, agent retries appropriately
- Ambiguous input: agent asks clarifying question instead of guessing
- Out-of-scope request: agent refuses or hands off
- Adversarial: prompt injection via tool result content
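One record in `eval_scenarios.jsonl` could look like the following; every field name here is an assumption about how you might structure the harness, not a fixed format:

```python
import json

# One eval scenario: the agent should ask a clarifying question, not guess.
scenario = {
    "id": "ambiguous_001",
    "category": "ambiguous_input",
    "turns": [
        {"role": "user",
         "content": "Set up a meeting with Sam sometime next week."}
    ],
    "expected_behavior": "agent asks which day/time before calling any write tool",
    "max_turns": 6,
}

line = json.dumps(scenario)  # one JSON object per line in the .jsonl file
```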
Metrics:
- Task completion rate (human-graded, or auto-scored against a custom rubric with chain-of-thought)
- Avg tool calls per task
- Tool-call validity (schema pass rate)
- User satisfaction proxy (scored against a custom rubric with chain-of-thought)
## Observability
Traces in Galileo:
- Parent span: agent session
- Child span: each tool call with input, output, latency, error
- Attach LLM thinking/reasoning where available
- Track tool-call frequency, tool-error rate, avg session length
## Deliverables
1. `tools.json` — the 8 tool definitions
2. `system_prompt.md` — the agent system prompt
3. `validators.py` / `.ts` — per-tool input validators
4. `errors.md` — error code reference
5. `agent_loop.py` / `.ts` — reference implementation with retries, rate limits, quota enforcement
6. `eval_scenarios.jsonl` + eval harness
7. Galileo dashboard
Organize your output using a clear framework with labeled sections. Each section should build on the previous one.

Replace the bracketed placeholders with your own context before running the prompt:
- `[...]` — fill in your specific ....