Self-critique layer enforcing no malware generation for a sales SDR assistant system on Mistral Large, with bypass defenses.
Self-critique layer enforcing no financial advice for a sales SDR assistant system on Qwen 2.5 72B, with bypass defenses.
Self-critique layer enforcing no biometric identification for a sales SDR assistant system on Gemini 2.0 Flash, with bypass defenses.
Self-critique layer enforcing on-topic responses for a sales SDR assistant system on Command R+, with bypass defenses.
Self-critique layer enforcing refusal of PII extraction for a sales SDR assistant system on DeepSeek-R1, with bypass defenses.
Self-critique layer enforcing no election manipulation for a sales SDR assistant system on GPT-4.1, with bypass defenses.
Self-critique layer enforcing source citation with URLs for a sales SDR assistant system on Llama 3.1 405B, with bypass defenses.
Self-critique layer enforcing no self-harm content for a sales SDR assistant system on Claude 3.5 Sonnet, with bypass defenses.
Self-critique layer enforcing no medical diagnosis for a sales SDR assistant system on Mistral Small 3, with bypass defenses.
Self-critique layer enforcing no biometric identification for a sales SDR assistant system on Claude 4 Sonnet, with bypass defenses.
Self-critique layer enforcing on-topic responses for a sales SDR assistant system on o1, with bypass defenses.
Self-critique layer enforcing no legal advice for an onboarding tutor system on Gemini 2.0 Flash, with bypass defenses.