Prompt Injection Defenses
Prompt injection is the LLM version of hostile input trying to redirect system behavior.
The right mindset is:
Treat model inputs and retrieved content as untrustedDefense Layers
1. Limit capability
Do not give the model broad write access if it only needs read access.
2. Separate instructions from data
System rules should be structurally distinct from user and retrieved content.
3. Validate tool use
Even if the model asks for a tool call, validate arguments and policy before execution.
4. Sanitize and score context
Retrieved webpages, PDFs, tickets, or emails can contain hostile instructions.
5. Monitor suspicious behavior
Look for:
- repeated prompt leakage probes
- unusual tool requests
- attempts to access unrelated data
Key Reality
There is no single perfect prompt that "solves" injection.
The strongest defense is layered system design:
- scoped permissions
- validated tool calls
- adversarial evals
- human approval for risky actions
Interview Answer
How do you defend against prompt injection?
Use defense in depth: treat inputs and retrieved context as untrusted, keep model permissions narrow, validate tool calls, isolate high-risk actions behind policy checks or human approval, and continuously test the system with adversarial prompts.