Defensive system prompt enforcing input classifier and hate-speech refusal for data-analysis pair on GPT-4o.
Defensive system prompt enforcing input classifier and no self-harm content for writing editor on Mistral Large.
Defensive system prompt enforcing per-turn policy check and no medical diagnosis for writing editor on Claude Opus 4.5.
Defensive system prompt enforcing per-turn policy check and no self-harm content for writing editor on Mistral Small 3.
Defensive system prompt enforcing tool-authorization gate and no medical diagnosis for writing editor on Claude Haiku 4.
Defensive system prompt enforcing tool-authorization gate and hate-speech refusal for writing editor on GPT-4.1.
Defensive system prompt enforcing output PII redactor and no self-harm content for writing editor on Qwen 2.5 72B.
Defensive system prompt enforcing output PII redactor and no medical diagnosis for writing editor on Gemini 2.0 Flash.
Defensive system prompt enforcing jailbreak detector and no self-harm content for writing editor on o1.
Defensive system prompt enforcing jailbreak detector and no medical diagnosis for writing editor on DeepSeek-V3.
Defensive system prompt enforcing jailbreak detector and no biometric identification for interview practice coach on GPT-4.1.
Defensive system prompt enforcing rate limiter with anomaly detection and topic adherence for interview practice coach on o1.