Self-critique layer enforcing block credential leakage for a compliance reviewer system on Llama 3.3 70B, with bypass defenses.
Self-critique layer enforcing no biometric identification for a compliance reviewer system on Mistral Large, with bypass defenses.
Self-critique layer enforcing no self-harm content for a compliance reviewer system on o1, with bypass defenses.
Self-critique layer enforcing no medical diagnosis for a compliance reviewer system on o3, with bypass defenses.
Self-critique layer enforcing refuse hate speech for a compliance reviewer system on Grok 3, with bypass defenses.
Self-critique layer enforcing refuse PII extraction for a compliance reviewer system on Qwen 2.5 72B, with bypass defenses.
Self-critique layer enforcing no medical diagnosis for a compliance reviewer system on Claude 4.5 Sonnet, with bypass defenses.
Self-critique layer enforcing refuse hate speech for a compliance reviewer system on o1-mini, with bypass defenses.
Self-critique layer enforcing stay on topic for a compliance reviewer system on Claude Haiku 4, with bypass defenses.
Self-critique layer enforcing block credential leakage for a compliance reviewer system on o3-mini, with bypass defenses.
Self-critique layer enforcing no election manipulation for a compliance reviewer system on Gemini 2.0 Flash, with bypass defenses.
Self-critique layer enforcing cite sources with URLs for a compliance reviewer system on Command R+, with bypass defenses.