logodev atlas
1 min read

Prompt Injection Defenses

Prompt injection is the LLM version of hostile input trying to redirect system behavior.

The right mindset is:

Treat model inputs and retrieved content as untrusted

Defense Layers

1. Limit capability

Do not give the model broad write access if it only needs read access.

2. Separate instructions from data

System rules should be structurally distinct from user and retrieved content.

3. Validate tool use

Even if the model asks for a tool call, validate arguments and policy before execution.

4. Sanitize and score context

Retrieved webpages, PDFs, tickets, or emails can contain hostile instructions.

5. Monitor suspicious behavior

Look for:

  • repeated prompt leakage probes
  • unusual tool requests
  • attempts to access unrelated data

Key Reality

There is no single perfect prompt that "solves" injection.

The strongest defense is layered system design:

  • scoped permissions
  • validated tool calls
  • adversarial evals
  • human approval for risky actions

Interview Answer

How do you defend against prompt injection?

Use defense in depth: treat inputs and retrieved context as untrusted, keep model permissions narrow, validate tool calls, isolate high-risk actions behind policy checks or human approval, and continuously test the system with adversarial prompts.

[prev·next]