# AI Prompt for Reasoning Patterns (CoT, ReAct, ToT)
Scratchpad-style Self-Consistency prompt for an evaluation engineer working on multi-hop QA, tuned for GPT-4o.
You are designing a scratchpad-based Self-Consistency prompt to be used by an evaluation engineer working on multi-hop QA through an LLM copilot backed by GPT-4o.
## Goal
The evaluation engineer pastes a problem in and gets back a structured reasoning trace plus a final answer. The scratchpad must be good enough that the evaluation engineer would forward it to a peer without being embarrassed.
## Deliverable — a single markdown document with these sections:
### 1. System prompt
Role: a careful, domain-fluent evaluation engineer working on multi-hop QA. The assistant writes its reasoning in a scratchpad using Self-Consistency and never skips to answers.
### 2. Scratchpad grammar
Define the exact tags or headings the model uses. For Self-Consistency, pick the natural shape:
- Chain-of-Thought → numbered steps
- ReAct → Thought / Action / Observation loops
- Tree-of-Thoughts → branches with evaluate() + prune() comments
- Self-Consistency → N independent chains, then "vote" (see the voting sketch after this list)
- Reflexion → attempt / critique / revised-attempt
- Plan-and-Solve → Plan then Execute(step i)
- Least-to-Most → Subproblems[] then Solve(sub_i) then Combine
- Self-Refine → draft → critique → refine (repeat up to 3x)
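Since Self-Consistency is the pattern in play here, the voting step deserves to be pinned down. Below is a minimal Python sketch of the aggregation, under stated assumptions: `run_chain` is a hypothetical stub standing in for one sampled scratchpad, and `N_CHAINS` is an assumed budget, not a recommendation.

```python
from collections import Counter

N_CHAINS = 5  # assumed budget: number of independent reasoning chains

def run_chain(problem: str) -> str:
    """Hypothetical stub: sample one full scratchpad at temperature > 0
    and return only the extracted final-answer string."""
    raise NotImplementedError

def self_consistency(problem: str) -> str:
    # Sample N independent chains, then take a majority vote over answers.
    answers = [run_chain(problem) for _ in range(N_CHAINS)]
    winner, votes = Counter(answers).most_common(1)[0]
    # Low agreement is a useful eval signal in its own right.
    print(f"{votes}/{N_CHAINS} chains agree on {winner!r}")
    return winner
```

Surfacing the vote margin, rather than just the winner, lets the evaluation engineer flag low-agreement cases for manual review.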
### 3. Worked example
A realistic multi-hop QA problem an evaluation engineer would actually hit. Show the full scratchpad.
### 4. Final answer contract
The scratchpad ends with a clean, copy-pasteable block tagged `FINAL ANSWER`, formatted as a JSONL stream. Nothing may follow it.
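For concreteness, the closing block might look like the following; the field names are placeholders, not a schema recommendation — the actual schema is whatever this section's contract defines.

```
FINAL ANSWER
{"answer": "1969", "hops": ["doc_3", "doc_17"], "confidence": "high"}
```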
### 5. Anti-patterns
List 6 anti-patterns specific to evaluation engineers misusing Self-Consistency on multi-hop QA (e.g., faux-reasoning, hindsight bias, plagiarizing the input, answer-first-then-justify).
### 6. Model tuning
- Recommended temperature for GPT-4o (illustrated in the call sketch after this list)
- Whether to enable extended thinking / reasoning tokens
- Stop sequences
- Max-tokens budget for scratchpad vs. final answer
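To make those knobs concrete, here is a minimal sketch of how they map onto a GPT-4o call, assuming the official OpenAI Python SDK; every numeric value and stop string below is a placeholder, not a recommendation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = "..."  # the section-1 system prompt
problem = "..."        # the evaluation engineer's pasted problem

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,       # placeholder; must be > 0 or the N chains won't diverge
    max_tokens=1500,       # placeholder; covers scratchpad budget plus final answer
    stop=["END OF FILE"],  # placeholder stop sequence; match your grammar
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem},
    ],
)
print(response.choices[0].message.content)
```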
### 7. Safety layer
- Always refuse PII extraction requests.
- Always refuse if the problem requires expertise the evaluation engineer would not have (escalate instead).
- Never include real PII in the scratchpad even if present in the input (see the redaction sketch below).
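As a belt-and-braces measure, the copilot backend can also scrub the scratchpad before it is stored or displayed. A deliberately incomplete sketch — real deployments should use a dedicated PII-detection service, not hand-rolled regexes:

```python
import re

# Hypothetical, deliberately incomplete patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(scratchpad: str) -> str:
    """Replace matched spans with a typed placeholder before logging or display."""
    for label, pattern in PII_PATTERNS.items():
        scratchpad = pattern.sub(f"[REDACTED {label.upper()}]", scratchpad)
    return scratchpad
```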
Keep the whole file under 1,200 tokens. Write in second person ("You are…"). Do not add front-matter prose explaining what the file is — the file is self-evidently a prompt.