Category Not Found

472 prompts

Trace Analysis Playbook for agent with tool-use LLM App in Phoenix (Arize) (Go)

Instrument, query, and triage traces from a tool-use agent LLM app in Phoenix (Arize) with the Go SDK, covering latency, cost, and quality dashboards.

💬ChatGPT

3181176

AI Engineering & LLM Apps

Premium

Trace Analysis Playbook for content moderation LLM App in Phoenix (Arize) (Ruby)

Instrument, query, and triage traces from a content moderation LLM app in Phoenix (Arize) with the Ruby SDK, covering latency, cost, and quality dashboards.

🟠Claude

4041159

AI Engineering & LLM Apps

Free

Trace Analysis Playbook for voice agent LLM App in Phoenix (Arize) (Ruby)

Instrument, query, and triage traces from a voice agent LLM app in Phoenix (Arize) with the Ruby SDK, covering latency, cost, and quality dashboards.

🤖Any Model

3561466

AI Engineering & LLM Apps

Free

Regression Test Suite for customer support chat LLM App

Golden-set regression harness for customer support chat with GPT-4.1 rubric scoring, CI integration, and budget-aware runs.

🟠Claude

305413

AI Engineering & LLM Apps

Free

Regression Test Suite for RAG over internal docs LLM App

Golden-set regression harness for RAG over internal docs with Claude Opus 4.5 pairwise scoring, CI integration, and budget-aware runs.

🟠Claude

349120

AI Engineering & LLM Apps

Premium

Regression Test Suite for code review agent LLM App

Golden-set regression harness for a code review agent with Ragas faithfulness scoring, CI integration, and budget-aware runs.

💬ChatGPT

112888

AI Engineering & LLM Apps

Premium

Regression Test Suite for SQL generation LLM App

Golden-set regression harness for SQL generation with Claude Sonnet 4.5 rubric scoring, CI integration, and budget-aware runs.

🤖Any Model

144134

AI Engineering & LLM Apps

Premium

Regression Test Suite for medical Q&A LLM App

Golden-set regression harness for medical Q&A with Arena-Hard-Auto scoring, CI integration, and budget-aware runs.

🟠Claude

328344

AI Engineering & LLM Apps

Premium

Regression Test Suite for legal analysis LLM App

Golden-set regression harness for legal analysis with G-Eval scoring using Gemini 2.5 Pro, CI integration, and budget-aware runs.

💬ChatGPT

328184

AI Engineering & LLM Apps

Premium

Regression Test Suite for tool-use agent LLM App

Golden-set regression harness for a tool-use agent with Arena-Hard-Auto scoring, CI integration, and budget-aware runs.

🤖Any Model

1931304

AI Engineering & LLM Apps

Premium

A/B Rollout and Drift Detection for jailbreak resistance in agent-based workflows

Design A/B rollout analysis and drift detection for jailbreak resistance on a production LLM app in agent-based workflows.

🟠Claude

99715

AI Engineering & LLM Apps

Premium

A/B Rollout and Drift Detection for jailbreak resistance in code assistant

Design A/B rollout analysis and drift detection for jailbreak resistance on a production LLM app in a code assistant.

💬ChatGPT

87361