Layered defense design for a coding copilot deployment against prompt-leaking attacks, using a dual-LLM architecture on Claude Opus 4.5.
Adversarial test suite targeting coding copilot with fictional-character persona-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with 'ignore previous instructions'-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with Markdown comment smuggling-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with role-reversal (user-as-assistant)-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with pseudo-developer-mode-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with chained encoding (ROT13 inside base64)-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with DAN ('Do Anything Now')-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with grandma exploit-style attacks, with rubric and triage flow.
Adversarial test suite targeting coding copilot with hypothetical world framing-style attacks, with rubric and triage flow.
Adversarial test suite targeting SQL copilot with reverse-psychology refusal-style attacks, with rubric and triage flow.
Adversarial test suite targeting SQL copilot with translation smuggling-style attacks, with rubric and triage flow.