Category Not Found

1252 prompts

Build BLEU/ROUGE Eval Harness for incident post-mortems on Claude 4.5 Sonnet

Design an eval harness for incident post-mortems using BLEU/ROUGE that tracks inter-judge agreement across prompt versions on Claude 4.5 Sonnet.

Build regex match checks Eval Harness for incident post-mortems on Claude Haiku 4

Design an eval harness for incident post-mortems using regex match checks that tracks cost-per-correct-answer across prompt versions on Claude Haiku 4.

Build DeepEval metrics Eval Harness for incident post-mortems on Gemini 2.0 Flash

Design an eval harness for incident post-mortems using DeepEval metrics that tracks cost-per-correct-answer across prompt versions on Gemini 2.0 Flash.

Build semantic similarity Eval Harness for incident post-mortems on DeepSeek-R1

Design an eval harness for incident post-mortems using semantic similarity that tracks token cost across prompt versions on DeepSeek-R1.

Build BERTScore Eval Harness for incident post-mortems on Llama 3.1 405B

Design an eval harness for incident post-mortems using BERTScore that tracks token cost across prompt versions on Llama 3.1 405B.

Build promptfoo assertions Eval Harness for incident post-mortems on Mistral Small 3

Design an eval harness for incident post-mortems using promptfoo assertions that tracks token cost across prompt versions on Mistral Small 3.

Build semantic similarity Eval Harness for incident post-mortems on o1-mini

Design an eval harness for incident post-mortems using semantic similarity that tracks p95 latency across prompt versions on o1-mini.

Build BERTScore Eval Harness for incident post-mortems on o3-mini

Design an eval harness for incident post-mortems using BERTScore that tracks p95 latency across prompt versions on o3-mini.

Build promptfoo assertions Eval Harness for incident post-mortems on Command R+

Design an eval harness for incident post-mortems using promptfoo assertions that tracks accuracy across prompt versions on Command R+.

Build human pairwise comparison Eval Harness for incident post-mortems on GPT-4.1

Design an eval harness for incident post-mortems using human pairwise comparison that tracks accuracy across prompt versions on GPT-4.1.

Build factuality with retrieval Eval Harness for incident post-mortems on Claude 3.5 Sonnet

Design an eval harness for incident post-mortems using factuality with retrieval that tracks F1 score across prompt versions on Claude 3.5 Sonnet.

Build embedding distance Eval Harness for incident post-mortems on Claude 4.5 Sonnet

Design an eval harness for incident post-mortems using embedding distance that tracks F1 score across prompt versions on Claude 4.5 Sonnet.
