5 min read
Agentic AI — Interview & Tricky Questions
Q1: What's the difference between a chain and an agent?
Chain:
- Fixed sequence of steps decided at design time
- Deterministic: same path every time
- Fast: no overhead for decision-making
- Predictable cost (fixed number of LLM calls)
- Example: prompt | llm | parser
Agent:
- LLM decides which steps to take at runtime
- Non-deterministic: path varies based on task
- Slower: LLM must reason before each action
- Variable cost (unknown number of LLM calls)
- Example: ReAct loop with tool calling
Rule of thumb:
If you know exactly what steps are needed → use a chain
If the steps depend on the input/result → use an agentQ2: How do you ensure an agent doesn't take destructive actions?
Answer:
python# Layer 1: Tool design — explicit descriptions with warnings
@tool
def delete_record(record_id: str) -> str:
"""
PERMANENT: Delete a record. This cannot be undone.
Only call this when the user has EXPLICITLY confirmed deletion
with the exact record ID they want to delete.
"""
db.delete(record_id)
# Layer 2: Human-in-the-loop before dangerous tools
from langgraph.graph import StateGraph
from langgraph.checkpoint.memory import MemorySaver
DANGEROUS_TOOLS = {"delete_record", "send_email", "deploy_code"}
def route_to_human(state):
last_message = state["messages"][-1]
if last_message.tool_calls:
for tc in last_message.tool_calls:
if tc["name"] in DANGEROUS_TOOLS:
return "human_review" # Pause here
return "tools"
# interrupt_before causes the graph to pause
app = workflow.compile(
checkpointer=MemorySaver(),
interrupt_before=["human_review"]
)
# Layer 3: Confirmation token in tool args
@tool
def delete_record(record_id: str, confirmation: str) -> str:
"""
Delete a record. The confirmation field must be "DELETE:{record_id}".
Ask the user to confirm by providing this exact string.
"""
if confirmation != f"DELETE:{record_id}":
return "Error: Incorrect confirmation. Ask user to confirm with 'DELETE:{record_id}'"
db.delete(record_id)
return "Deleted"
# Layer 4: Separate read and write permissions
read_only_tools = [search_db, get_record, list_records]
write_tools = [create_record, update_record, delete_record]
# Different agents with different tool access
readonly_agent = build_agent(tools=read_only_tools) # Safe for any user
admin_agent = build_agent(tools=read_only_tools + write_tools) # Admin onlyQ3: What is the "context stuffing" problem in agents and how do you solve it?
Answer:
As an agent runs more steps, its context grows. Each tool call adds:
- The tool call request
- The tool result
- Potentially large amounts of data
After 10 steps:
- System prompt: 500 tokens
- Original task: 50 tokens
- Turn 1 messages: 300 tokens
- Tool call 1 result: 2000 tokens (e.g., search result)
- Turn 2 messages: 200 tokens
- Tool call 2 result: 1500 tokens
- ...
Total: 15,000+ tokens per request, growing each iteration
Problems:
1. Cost: each iteration becomes more expensive
2. "Lost in the middle": early context forgotten
3. Context window limits hit
Solutions:python# Solution 1: Summarize tool results
def truncate_tool_result(result: str, max_tokens: int = 500) -> str:
if count_tokens(result) <= max_tokens:
return result
# Summarize the result
return llm.invoke(f"Summarize this in 200 words:\n{result}")
# Solution 2: Rolling summary of old messages
def compress_history(messages: list, keep_last_n: int = 4) -> list:
if len(messages) <= keep_last_n:
return messages
old_messages = messages[:-keep_last_n]
summary = llm.invoke(f"Summarize this conversation:\n{format_messages(old_messages)}")
return [SystemMessage(content=f"Previous conversation: {summary}")] + messages[-keep_last_n:]
# Solution 3: Store large results externally
@tool
def search_web(query: str) -> str:
results = actual_search(query)
# Store full result in state, return only summary
result_id = store.put(results)
summary = results[:500] # First 500 chars
return f"[Result ID: {result_id}] {summary}... (use get_result({result_id}) for full content)"
# Solution 4: LangGraph with summarization node
def summarize_if_too_long(state: AgentState) -> AgentState:
if count_tokens(state["messages"]) > 8000:
state["messages"] = compress_history(state["messages"])
return stateQ4: Why is agentic AI harder to test than traditional APIs?
Answer:
Traditional API test:
python# Deterministic — same input → same output every time
def test_calculate_tax():
assert calculate_tax(100, 0.1) == 10.0 # Always passes or always failsAgent test:
python# Non-deterministic — agent may take different paths
def test_research_agent():
result = agent.invoke("Research the top 3 AI companies")
# What exactly do you assert here?
# The exact wording will differ every run
# The order of companies might differ
# Which tools were called might differTesting strategies for agents:
python# 1. Outcome-based testing (not implementation)
def test_research_agent():
result = agent.invoke("What are the top 3 AI companies?")
# Check outcome, not exact text
assert "NVIDIA" in result or "Nvidia" in result # Flexible
assert len(result) > 100 # At least substantial
assert not contains_hallucination_markers(result)
# 2. Tool call assertion
def test_uses_search_tool():
with tool_call_recorder() as recorder:
agent.invoke("What's the current stock price of Apple?")
# Should have called search, not just used training data
assert "get_stock_price" in [call.tool_name for call in recorder.calls]
# 3. Trace-based testing with LangSmith
from langsmith import Client, expect
@pytest.mark.langsmith
def test_agent_trace():
result = agent.invoke("Book a flight to Paris")
expect.edit_distance(result, "flight booked").to_be_less_than(0.5)
# 4. LLM judge for quality
def assert_factual_accuracy(response: str, ground_truth: str) -> None:
score = llm_judge(f"Is this response factually consistent with: {ground_truth}\nResponse: {response}")
assert score > 0.8, f"Factual accuracy too low: {score}"
# 5. Fixed seed + deterministic mock for unit tests
with patch("tools.search_web") as mock_search:
mock_search.return_value = "AAPL: $189.43"
result = agent.invoke("What's Apple's stock price?")
assert "189" in result
assert mock_search.calledQ5: What's the difference between multi-agent and single-agent systems? When do you need multiple agents?
Answer:
Single-agent with many tools:
1 LLM decides everything
All tools visible in context (limited by context window)
Simple, less overhead
Context gets bloated with all tool descriptions
Multi-agent:
Each agent specializes → smaller context, better focus
Parallel execution of independent subtasks
Different models per agent (cheap model for simple tasks)
Isolation: researcher doesn't need coding tools
When to use multi-agent:
1. Task is too complex for single context window
→ Orchestrator breaks it into parallel subtasks
2. Different specializations needed
→ Research agent (web search) + Coding agent (code execution) + Writer agent
3. Independent subtasks that can run in parallel
→ Analyze 10 documents simultaneously with 10 agents
4. Quality via adversarial agents
→ Generator agent + Critic agent + Refiner agent
Multi-agent patterns:python# Pattern 1: Supervisor (orchestrator + workers)
from langgraph.prebuilt import create_react_agent
research_agent = create_react_agent(model, [search_web, read_url])
code_agent = create_react_agent(model, [run_python, write_file])
def supervisor_node(state):
# Supervisor decides which agent to call
decision = supervisor_model.invoke(state["messages"])
return {"next": decision.content} # "research" or "code" or "FINISH"
# Pattern 2: Parallel execution
from langgraph.constants import Send
def fan_out(state):
# Send same task to multiple agents in parallel
return [Send("analyze_agent", {"doc": doc}) for doc in state["documents"]]
# Pattern 3: Debate (adversarial)
# Generator → produces answer
# Critic → finds flaws
# Generator → revises
# (repeat N rounds)[prev·next]