Prompt injection gets talked about in two useless ways. One camp acts like a clever prompt wrapper solves it forever. The other talks as if the whole thing is hopeless, so you may as well give up. Both positions are lazy.
The real answer is the normal engineering answer: use layers, narrow the surface area, and make the system less gullible than it would otherwise be. That is not glamorous, but it is real.
What Still Helps
- Separate instructions from user data as clearly as possible.
- Keep tool scopes small and permissioned.
- Default to read-only where you can.
- Require review or secondary confirmation before meaningful writes.
- Preserve citations, logs, and traces so the model cannot hide what it used.
- Refuse to let the model silently invent capabilities it does not have.
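Two of the items above, small tool scopes and secondary confirmation before writes, can be sketched in a few lines. This is an illustrative toy, not a real agent framework; the names (`Tool`, `run_tool`, `approve`) are made up for the example:

```python
# Sketch: tools default to read-only, and anything write-capable
# must pass an explicit approval callback before it runs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    writes: bool = False  # read-only unless declared otherwise

def run_tool(tool: Tool, approve: Callable[[str], bool], **kwargs) -> str:
    """Run a tool; write-capable tools need a second confirmation."""
    if tool.writes and not approve(f"{tool.name}({kwargs})"):
        return "blocked: write not approved"
    return tool.fn(**kwargs)

# A read runs by default; a write is gated even if the model asks nicely.
search = Tool("search", lambda q: f"results for {q}")
delete = Tool("delete_row", lambda row_id: f"deleted {row_id}", writes=True)
```

The point is the shape, not the code: writes are a different class of action, and the confirmation lives outside anything the model can talk its way around.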
I still like request-specific delimiters and other simple separation tricks. They are not magic. They are just better than pretending raw user text and system instructions belong in one undifferentiated soup.
Where People Really Get Hurt
Not usually because somebody found the most sophisticated attack in a paper. More often because a team built a system with no boundaries, no review points, and no operational humility. They gave the model access to too much, trusted answers whose citations did not actually exist, or let “just this one internal assistant” drift into a write-capable workflow with no real controls.
That is why I care more about bounded systems than clever prompt tricks. If the model can only read a narrow slice, call a small set of tools, and trigger actions that are auditable and reversible, your failure mode looks a lot better even if prompt separation is imperfect.
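What a "narrow slice" plus an audit trail looks like in miniature, again as a toy sketch with invented names (`ALLOWED_DOCS`, `read_doc`, `AUDIT_LOG`), not a real access-control layer:

```python
# Sketch: the model's read surface is an explicit allowlist, and every
# call, allowed or not, lands in an append-only log for later review.
import json
import time

AUDIT_LOG: list[str] = []

ALLOWED_DOCS = {"handbook", "faq"}  # the narrow slice the model may read

def read_doc(doc_id: str) -> str:
    ok = doc_id in ALLOWED_DOCS
    AUDIT_LOG.append(json.dumps(
        {"t": time.time(), "tool": "read_doc", "arg": doc_id, "ok": ok}
    ))
    if not ok:
        raise PermissionError(f"read_doc: {doc_id!r} is outside the allowed slice")
    return f"contents of {doc_id}"
```

Even when an injected instruction gets through, the blast radius is a couple of allowlisted documents and a log entry you can find the next morning.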
Security Posture, Not Security Theater
Secure AI in a real environment should look boring. Small permissions. Narrow interfaces. Human checkpoints. Traceability. Good defaults. The teams that want a magical omnipotent assistant usually do not actually want security. They want excitement. That is a different thing.
For us, a safe AI system is one that can still be reasoned about after the demo energy wears off. If you cannot explain where the instructions came from, what the model saw, what tools it could touch, and how an action gets reviewed, you are not done building yet.