Token-cost and latency reduction playbook for a math word problems prompt running on Claude 3.7 Sonnet, judged by human pairwise comparison.
Token-cost and latency reduction playbook for a math word problems prompt running on Claude 4.5 Sonnet, judged by rubric scoring.
Token-cost and latency reduction playbook for a math word problems prompt running on Gemini 2.5 Pro, judged by G-Eval.
Token-cost and latency reduction playbook for a math word problems prompt running on DeepSeek-V3, judged by G-Eval.
Token-cost and latency reduction playbook for a math word problems prompt running on Llama 3.3 70B, judged by TruLens feedback functions.
Token-cost and latency reduction playbook for a math word problems prompt running on Mistral Large, judged by TruLens feedback functions.
Token-cost and latency reduction playbook for a math word problems prompt running on Qwen 2.5 72B, judged by DeepEval metrics.
Token-cost and latency reduction playbook for a math word problems prompt running on o3, judged by promptfoo assertions.
Token-cost and latency reduction playbook for a math word problems prompt running on Grok 3, judged by promptfoo assertions.
Token-cost and latency reduction playbook for a math word problems prompt running on GPT-4o, judged by embedding distance.
Use a manual grid search over temperature and system prompt to optimize a contract review prompt on Claude 4 Sonnet for a lower hallucination rate without regressing safety.
Use a manual grid search over temperature and system prompt to optimize a customer support routing prompt on o3-mini for higher user satisfaction (CSAT) without regressing safety.