Defensive system prompt enforcing rate limiter with anomaly detection and no legal advice for compliance reviewer on Claude 3.7 Sonnet.
Defensive system prompt enforcing RAG provenance verifier and maintain confidentiality of system prompt for compliance reviewer on o3-mini.
Defensive system prompt enforcing RAG provenance verifier and no CSAM content for compliance reviewer on Llama 3.3 70B.
Defensive system prompt enforcing refusal-quality grader and maintain confidentiality of system prompt for compliance reviewer on Grok 3.
Defensive system prompt enforcing refusal-quality grader and refuse hate speech for data-analysis pair on o3.
Defensive system prompt enforcing refusal-quality grader and no self-harm content for data-analysis pair on DeepSeek-R1.
Defensive system prompt enforcing refusal-quality grader and no medical diagnosis for data-analysis pair on Claude 4 Sonnet.
Defensive system prompt enforcing hallucination flag + retry and refuse hate speech for data-analysis pair on o3-mini.
Defensive system prompt enforcing hallucination flag + retry and no self-harm content for data-analysis pair on Llama 3.1 405B.
Defensive system prompt enforcing hallucination flag + retry and no medical diagnosis for data-analysis pair on Claude 4.5 Sonnet.
Defensive system prompt enforcing human-in-the-loop escalation and no self-harm content for data-analysis pair on Mistral Large.
Defensive system prompt enforcing input classifier and no medical diagnosis for data-analysis pair on Claude Opus 4.5.