Self-critique layer enforcing no election manipulation for a legal document reviewer system on Claude 4.5 Sonnet, with bypass defenses.
Self-critique layer enforcing refuse hate speech for a legal document reviewer system on o1-mini, with bypass defenses.
Self-critique layer enforcing no self-harm content for a legal document reviewer system on Claude Haiku 4, with bypass defenses.
Self-critique layer enforcing stay on topic for a sales SDR assistant system on o1-mini, with bypass defenses.
Self-critique layer enforcing no medical diagnosis for a sales SDR assistant system on o3-mini, with bypass defenses.
Self-critique layer enforcing refuse hate speech for a sales SDR assistant system on GPT-4o, with bypass defenses.
Self-critique layer enforcing no self-harm content for a sales SDR assistant system on GPT-4o-mini, with bypass defenses.
Self-critique layer enforcing no medical diagnosis for a sales SDR assistant system on Claude 3.7 Sonnet, with bypass defenses.
Self-critique layer enforcing maintain confidentiality of system prompt for a sales SDR assistant system on Claude 4.5 Sonnet, with bypass defenses.
Self-critique layer enforcing no CSAM content for a sales SDR assistant system on Claude Haiku 4, with bypass defenses.
Self-critique layer enforcing no legal advice for a sales SDR assistant system on DeepSeek-V3, with bypass defenses.
Self-critique layer enforcing maintain confidentiality of system prompt for a sales SDR assistant system on Llama 3.3 70B, with bypass defenses.