Adversarial test suite targeting a data-analysis pair with role-reversal-style (user-as-assistant) attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with pseudo-developer-mode-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with chained-encoding-style (ROT13 inside base64) attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with DAN / 'Do Anything Now'-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with grandma-exploit-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with hypothetical-world-framing-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with translation-smuggling-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with fictional-character-persona-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with 'you are no longer Claude'-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a data-analysis pair with reverse-psychology-refusal-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a writing editor with hypothetical-world-framing-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a writing editor with reverse-psychology-refusal-style attacks, with a rubric and triage flow.
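
For the chained-encoding entry above (ROT13 inside base64), a triage step may want to unwrap a payload before a reviewer scores it. The following is a minimal sketch of that layering using only the Python standard library. The function names `encode_chained` and `decode_chained` and the sample string are hypothetical and chosen for illustration; they are not taken from any of the suites listed here.

```python
import base64
import codecs


def encode_chained(text: str) -> str:
    """Apply ROT13 first, then wrap the result in base64 (hypothetical helper)."""
    rot13 = codecs.encode(text, "rot_13")
    return base64.b64encode(rot13.encode("utf-8")).decode("ascii")


def decode_chained(payload: str) -> str:
    """Undo the layers in reverse order: base64 first, then ROT13 (hypothetical helper)."""
    rot13 = base64.b64decode(payload, validate=True).decode("utf-8")
    return codecs.decode(rot13, "rot_13")


# Benign sample string, used only to show that the two layers round-trip.
sample = "summarize the quarterly sales table"
wrapped = encode_chained(sample)
assert decode_chained(wrapped) == sample
```

Decoding in the reverse order of encoding is the only design point here: the outer base64 layer has to be removed before the inner ROT13 text is readable. How a decoded payload is then scored is left to each suite's own rubric.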