# AI Prompt for Reasoning Patterns (CoT, ReAct, ToT)
Scratchpad-style Self-Consistency prompt for an evaluation engineer working on multi-hop QA, tuned for GPT-4o.
You are designing a scratchpad-based Self-Consistency prompt to be used by an evaluation engineer working on multi-hop QA through an LLM copilot backed by GPT-4o.
## Goal
The evaluation engineer pastes a problem in and gets back a structured reasoning trace plus a final answer. The scratchpad must be good enough that the evaluation engineer would forward it to a peer without being embarrassed.
## Deliverable — a single markdown document with these sections:
### 1. System prompt
Role: a careful, domain-fluent evaluation engineer working on multi-hop QA. The assistant writes its reasoning in a scratchpad using Self-Consistency and never skips to answers.
### 2. Scratchpad grammar
Define the exact tags or headings the model uses. For Self-Consistency, pick the natural shape:
- Chain-of-Thought → numbered steps
- ReAct → Thought / Action / Observation loops
- Tree-of-Thoughts → branches with evaluate() + prune() comments
- Self-Consistency → N independent chains, then "vote" (see the voting sketch after this list)
- Reflexion → attempt / critique / revised-attempt
- Plan-and-Solve → Plan then Execute(step i)
- Least-to-Most → Subproblems[] then Solve(sub_i) then Combine
- Self-Refine → draft → critique → refine (repeat up to 3x)
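Since Self-Consistency is the pattern in play here, the voting step deserves to be pinned down. Below is a minimal Python sketch of the aggregation, under stated assumptions: `run_chain` is a hypothetical stub standing in for one sampled scratchpad, and `N_CHAINS` is an assumed budget, not a recommendation.

```python
from collections import Counter

N_CHAINS = 5  # assumed budget: number of independent reasoning chains

def run_chain(problem: str) -> str:
    """Hypothetical stub: sample one full scratchpad at temperature > 0
    and return only the extracted final-answer string."""
    raise NotImplementedError

def self_consistency(problem: str) -> str:
    # Sample N independent chains, then take a majority vote over answers.
    answers = [run_chain(problem) for _ in range(N_CHAINS)]
    winner, votes = Counter(answers).most_common(1)[0]
    # Low agreement is a useful eval signal in its own right.
    print(f"{votes}/{N_CHAINS} chains agree on {winner!r}")
    return winner
```

Surfacing the vote margin, rather than just the winner, lets the evaluation engineer flag low-agreement cases for manual review.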
### 3. Worked example
A realistic multi-hop QA problem an evaluation engineer would actually hit. Show the full scratchpad.
### 4. Final answer contract
The scratchpad ends with a clean, copy-pasteable block tagged `FINAL ANSWER`, formatted as a JSONL stream. Nothing may follow it.
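For concreteness, the closing block might look like the following; the field names are placeholders, not a schema recommendation — the actual schema is whatever this section's contract defines.

```
FINAL ANSWER
{"answer": "1969", "hops": ["doc_3", "doc_17"], "confidence": "high"}
```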
### 5. Anti-patterns
List 6 anti-patterns specific to evaluation engineers misusing Self-Consistency on multi-hop QA (e.g., faux-reasoning, hindsight bias, plagiarizing the input, answer-first-then-justify).
### 6. Model tuning
- Recommended temperature for GPT-4o (illustrated in the call sketch after this list)
- Whether to enable extended thinking / reasoning tokens
- Stop sequences
- Max-tokens budget for scratchpad vs. final answer
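To make those knobs concrete, here is a minimal sketch of how they map onto a GPT-4o call, assuming the official OpenAI Python SDK; every numeric value and stop string below is a placeholder, not a recommendation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = "..."  # the section-1 system prompt
problem = "..."        # the evaluation engineer's pasted problem

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,       # placeholder; must be > 0 or the N chains won't diverge
    max_tokens=1500,       # placeholder; covers scratchpad budget plus final answer
    stop=["END OF FILE"],  # placeholder stop sequence; match your grammar
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem},
    ],
)
print(response.choices[0].message.content)
```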
### 7. Safety layer
- Always refuse PII extraction requests.
- Always refuse if the problem requires expertise the evaluation engineer would not have (escalate instead).
- Never include real PII in the scratchpad even if present in the input (see the redaction sketch below).
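As a belt-and-braces measure, the copilot backend can also scrub the scratchpad before it is stored or displayed. A deliberately incomplete sketch — real deployments should use a dedicated PII-detection service, not hand-rolled regexes:

```python
import re

# Hypothetical, deliberately incomplete patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(scratchpad: str) -> str:
    """Replace matched spans with a typed placeholder before logging or display."""
    for label, pattern in PII_PATTERNS.items():
        scratchpad = pattern.sub(f"[REDACTED {label.upper()}]", scratchpad)
    return scratchpad
```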
Keep the whole file under 1,200 tokens. Write in second person ("You are…"). Do not add front-matter prose explaining what the file is — the file is self-evidently a prompt.