Refactor a baseline SQL query writing prompt into a Maieutic Prompting version and compare quality on o1.
Refactor a baseline API design decisions prompt into a Maieutic Prompting version and compare quality on GPT-4.1.
Refactor a baseline A/B test interpretation prompt into a Maieutic Prompting version and compare quality on Claude Opus 4.5.
Refactor a baseline research synthesis prompt into a Contrastive Chain-of-Thought version and compare quality on Qwen 2.5 72B.
Refactor a baseline legal brief summarization prompt into a Contrastive Chain-of-Thought version and compare quality on Gemini 2.0 Flash.
Refactor a baseline API design decisions prompt into a Contrastive Chain-of-Thought version and compare quality on GPT-4o-mini.
Refactor a baseline incident post-mortem prompt into a Contrastive Chain-of-Thought version and compare quality on o1-mini.
Refactor a baseline technical spec writing prompt into a Contrastive Chain-of-Thought version and compare quality on DeepSeek-V3.
Refactor a baseline data pipeline debugging prompt into a Contrastive Chain-of-Thought version and compare quality on Claude 3.7 Sonnet.
Refactor a baseline A/B test interpretation prompt into a Contrastive Chain-of-Thought version and compare quality on o3.
Refactor a baseline resume screening prompt into a Contrastive Chain-of-Thought version and compare quality on DeepSeek-R1.
Refactor a baseline math word problems prompt into a Contrastive Chain-of-Thought version and compare quality on Claude Sonnet 4.
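
One way the tasks above could be set up is sketched below, under stated assumptions. All function names are hypothetical. `call_model` and `score` are placeholders for whichever model client and quality metric you choose; no real vendor API is used. The Contrastive Chain-of-Thought builder pairs one valid and one flawed reasoning demonstration with the baseline. The Maieutic builder covers only the first step of that technique (eliciting explanations for both truth values of a claim) and omits the recursive explanation tree and the consistency resolution.

```python
from typing import Callable, Dict, List


def contrastive_cot_prompt(baseline: str, good_example: str, bad_example: str) -> str:
    """Wrap a baseline prompt with one valid and one flawed reasoning demo."""
    return (
        f"{baseline}\n\n"
        "Example of correct reasoning:\n"
        f"{good_example}\n\n"
        "Example of flawed reasoning (do not imitate):\n"
        f"{bad_example}\n\n"
        "Now reason step by step, avoiding the flaw shown above."
    )


def maieutic_prompts(baseline: str, claim: str) -> List[str]:
    """First Maieutic step only: ask for explanations of both truth values."""
    return [
        f"{baseline}\n\nClaim: {claim}\nExplain why this claim is true.",
        f"{baseline}\n\nClaim: {claim}\nExplain why this claim is false.",
    ]


def compare(
    variants: Dict[str, str],
    inputs: List[str],
    call_model: Callable[[str], str],      # hypothetical: prompt -> model output
    score: Callable[[str, str], float],    # hypothetical: (input, output) -> quality
) -> Dict[str, float]:
    """Return the mean score of each prompt variant over the same inputs."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    results: Dict[str, float] = {}
    for name, prompt in variants.items():
        scores = [
            score(item, call_model(f"{prompt}\n\nInput:\n{item}"))
            for item in inputs
        ]
        results[name] = sum(scores) / len(scores)
    return results
```

Running every variant over the same inputs with the same scoring function keeps the comparison about the prompt and nothing else. The model under test is swapped by passing a different `call_model`.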