AI Prompt Injection Defense System

Build a defense layer against prompt injection: classifier, content filtering, sandboxed execution.

Python · Anthropic SDK · Lakera or open-source classifiers · Docker

About this project

Prompt injection is the most underdefended attack class of 2026. This project builds three layers: an input classifier (detect injection attempts), output filtering (catch data exfiltration), and sandboxed tool execution (limit blast radius). Test the stack against published prompt-injection benchmarks, then document what your defense catches and what slips through.
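As a starting point for the input-classifier layer, here is a minimal heuristic sketch. The pattern list and threshold are illustrative assumptions, not a vetted ruleset; in the full project you would pair this pre-filter with an ML classifier (e.g. a hosted service such as Lakera, or an open-source model) and keep the heuristics as a fast first pass.

```python
import re

# Hypothetical patterns covering common injection phrasings; a real
# deployment would maintain a much larger, benchmark-driven list.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|previous) prompt",
    r"reveal (your )?(system prompt|instructions)",
    r"you are now (in )?(developer|dan) mode",
]


def injection_score(text: str) -> float:
    """Return the fraction of known-injection patterns matched (0.0 = clean)."""
    lowered = text.lower()
    hits = sum(1 for pattern in INJECTION_PATTERNS if re.search(pattern, lowered))
    return hits / len(INJECTION_PATTERNS)


def is_suspicious(text: str, threshold: float = 0.2) -> bool:
    """Flag input for review/blocking when enough patterns fire."""
    return injection_score(text) >= threshold
```

A regex pre-filter alone is easy to evade (paraphrase, encoding tricks, multilingual attacks), which is exactly the kind of gap your benchmark writeup should document.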

Why build this in 2026?

Prompt injection is the most-asked-about AI security topic in 2026, yet few teams have shipped working defenses.

What you'll ship

  • GitHub repo
  • Benchmark results
  • Writeup of trade-offs

Skills you'll practice

security · large language models · python