Design an eval harness for incident post-mortems that scores with embedding distance and tracks cost-per-correct-answer across prompt versions on o3.
Design an eval harness for incident post-mortems that uses rubric scoring and tracks token cost across prompt versions on Command R+.
Use PromptBreeder to optimize a code generation prompt on GPT-4o against token cost without regressing safety.
Use PromptBreeder to optimize a multi-hop QA prompt on Qwen 2.5 72B against p95 latency without regressing safety.
Use PromptBreeder to optimize a medical triage prompt on Gemini 2.5 Pro against p95 latency without regressing safety.
Use PromptBreeder to optimize a financial report analysis prompt on GPT-4.1 against accuracy without regressing safety.
Use PromptBreeder to optimize a scientific literature review prompt on o1 against accuracy without regressing safety.
Use PromptBreeder to optimize a bug root-cause analysis prompt on Gemini 2.0 Flash against F1 score without regressing safety.
Use PromptBreeder to optimize an incident post-mortem prompt on Claude 3.5 Sonnet against F1 score without regressing safety.
Use PromptBreeder to optimize a threat modeling prompt on o1-mini against F1 score without regressing safety.
Use PromptBreeder to optimize a schema migration planning prompt on DeepSeek-R1 against factuality without regressing safety.
Use PromptBreeder to optimize a funnel analysis prompt on Claude 3.7 Sonnet against factuality without regressing safety.
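
Two ideas recur in the items above: a cost-per-correct-answer metric compared across prompt versions, and the constraint that an optimized prompt must not regress safety. The following is a minimal Python sketch of both, under stated assumptions. The names `EvalResult`, `cost_per_correct`, `safety_rate`, and `accept_candidate` are hypothetical and not part of PromptBreeder or any model API. It also assumes each evaluated answer has already been labeled correct or incorrect, and safe or unsafe, by whatever scorer the harness uses (embedding distance, rubric, F1 threshold, and so on).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalResult:
    """One scored answer from one prompt version (hypothetical record type)."""
    correct: bool    # passed the task scorer chosen by the harness
    safe: bool       # passed the safety check chosen by the harness
    cost_usd: float  # spend attributed to this single call


def cost_per_correct(results: Sequence[EvalResult]) -> float:
    """Total spend divided by the number of correct answers.

    Returns infinity when nothing is correct, so such a version never wins.
    """
    total_cost = sum(r.cost_usd for r in results)
    n_correct = sum(1 for r in results if r.correct)
    return total_cost / n_correct if n_correct else float("inf")


def safety_rate(results: Sequence[EvalResult]) -> float:
    """Fraction of answers that passed the safety check (0.0 if empty)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.safe) / len(results)


def accept_candidate(
    baseline: Sequence[EvalResult],
    candidate: Sequence[EvalResult],
    safety_tolerance: float = 0.0,
) -> bool:
    """Accept a candidate prompt only if safety does not regress beyond the
    tolerance AND its cost per correct answer is strictly lower."""
    if safety_rate(candidate) < safety_rate(baseline) - safety_tolerance:
        return False
    return cost_per_correct(candidate) < cost_per_correct(baseline)
```

The same gate could be applied to the other objectives named above (p95 latency, accuracy, F1, factuality) by swapping `cost_per_correct` for the relevant metric and flipping the comparison for metrics where higher is better.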