AI Basics — Foundational Interview Questions
Entry-level and mid-level questions that come up in every AI/ML engineering interview. Know these cold before the advanced topics.
What is AI / ML / Deep Learning?
AI ⊃ Machine Learning ⊃ Deep Learning ⊃ LLMs
AI — any technique making machines mimic intelligence
(rule-based systems, search algorithms, etc.)
ML — systems that learn patterns from data automatically
Deep Learning — ML using multi-layer neural networks
LLMs — Deep Learning on massive text data → language understandingWhat is a neural network?
Layers of "neurons" (weighted nodes) connected together. Input passes through layers, each applying a linear transformation + activation function, until the output layer produces a prediction.
Input Layer → Hidden Layers → Output Layer
[x1, x2] → [weights, activation] → [prediction]
Activation functions:
ReLU: max(0, x) — most common in hidden layers
Sigmoid: 1/(1+e^-x) — binary classification output
Softmax: e^xi / Σe^xj — multi-class classification outputTraining = adjusting weights to minimize loss via gradient descent + backpropagation.
What is training vs inference?
| Training | Inference | |
|---|---|---|
| What happens | Model learns from data, weights updated via gradient descent | Model makes predictions using fixed weights |
| Hardware | GPU-intensive (days/weeks) | Can run on CPU, lighter GPU |
| Cost | Very expensive | Much cheaper per request |
| When | Offline, one-time (or periodic) | Real-time, per request |
GPT-4 training cost ~$100M. Running one query costs fractions of a cent.
What are tokens?
Tokens are the units an LLM processes — roughly a word or word-piece:
"Hello world" → ["Hello", " world"] = 2 tokens
"unhappiness" → ["un", "happiness"] = 2 tokens
"ChatGPT is great"→ ["Chat", "G", "PT", " is", " great"] = 5 tokens
Rule of thumb: 1 token ≈ 4 characters ≈ ¾ of a word
100 tokens ≈ 75 words
1,000 tokens ≈ 750 wordsEverything in the prompt + response costs tokens. LLMs have a context window = max tokens they can see at once.
What is the context window?
The maximum number of tokens an LLM can process in a single call (input + output combined):
| Model | Context Window |
|---|---|
| GPT-4o | 128K tokens (~100K words) |
| Claude 3.5 Sonnet | 200K tokens |
| Gemini 1.5 Pro | 1M tokens |
| Llama 3.1 70B | 128K tokens |
Important: Larger context ≠ better. Models struggle with information in the middle of very long contexts ("lost in the middle" problem). For RAG, retrieve only what's relevant rather than dumping everything.
What is a Large Language Model (LLM)?
A neural network (transformer architecture) trained on huge amounts of text to predict the next token. The model learns grammar, facts, reasoning patterns, and code through self-supervised learning — no human labels needed.
Key properties:
- Emergent abilities — capabilities that appear at scale (reasoning, code generation) that smaller models don't have
- Few-shot learning — can perform new tasks just from examples in the prompt
- Not a knowledge base — knowledge is baked into weights at training time (cutoff date)
What is temperature and how does it affect output?
Controls randomness in token selection:
temperature = 0 → always pick the highest-probability token (deterministic)
temperature = 0.7 → balanced creativity and coherence (good default)
temperature = 1.0 → more random, creative
temperature = 2.0 → very random, often incoherent
Use cases:
Code generation: temperature = 0 – 0.2 (want deterministic)
Creative writing: temperature = 0.8 – 1.0
Factual Q&A: temperature = 0
RAG answers: temperature = 0.1 – 0.3top_p (nucleus sampling) — alternative to temperature. Only considers top tokens whose cumulative probability ≥ p. Set either temperature OR top_p, not both.
What is prompt engineering?
Designing the text you send to an LLM to get better outputs. Core techniques:
python# 1. System prompt — sets role and constraints
system = "You are a senior Python developer. Be concise. Only answer coding questions."
# 2. Few-shot examples — show the model what you want
prompt = """
Convert to JSON:
Input: "Name: Tarun, Age: 28"
Output: {"name": "Tarun", "age": 28}
Input: "Name: Alice, Age: 32"
Output:"""
# 3. Chain-of-thought — ask for step-by-step reasoning
prompt = "Solve this step by step: [problem]"
# 4. Output format constraints
prompt = "Return ONLY valid JSON with keys: {name, score, reason}. No explanation."
# 5. Role + task + format
prompt = """
Role: You are a code reviewer.
Task: Review this Python function for bugs.
Format: List each bug as: [LINE] [SEVERITY] [DESCRIPTION]
Code: {code}
"""What is RAG (Retrieval-Augmented Generation)?
A pattern where you:
- Retrieve relevant documents from a knowledge base
- Inject them into the prompt
- Generate a grounded answer
Solves: hallucination (model invents facts), knowledge cutoff (model has stale training data), proprietary data (model was never trained on your docs).
python# Simple RAG flow
def rag_answer(question: str) -> str:
# 1. Embed the question
query_vector = embed(question)
# 2. Search vector DB for relevant docs
docs = vector_db.search(query_vector, top_k=3)
# 3. Build prompt with context
context = "\n\n".join(d.text for d in docs)
prompt = f"Answer based on context only:\n\n{context}\n\nQ: {question}"
# 4. Call LLM
return llm.complete(prompt)What is a vector embedding?
A fixed-size numerical representation of text (or images, audio) where semantic similarity = geometric closeness:
python"dog" → [0.12, -0.34, 0.87, ...] # 1536 numbers
"puppy" → [0.11, -0.35, 0.88, ...] # very close to "dog"
"quantum physics" → [-0.45, 0.21, -0.12, ...] # far from "dog"
# Similarity measured by cosine distance (angle between vectors)Embeddings are the foundation of semantic search, RAG, recommendation systems, and duplicate detection.
What is fine-tuning?
Taking a pretrained model and continuing to train it on your specific dataset to adapt its behavior:
Pretrained LLM (general knowledge)
↓
Fine-tune on your data (customer support tickets, medical records, legal docs)
↓
Specialized model (better at your specific task, domain-aware)When to fine-tune vs RAG:
| Scenario | Use RAG | Use Fine-tuning |
|---|---|---|
| Need up-to-date info | ✅ | ❌ |
| Custom behavior/style | ❌ | ✅ |
| Large knowledge base | ✅ | ❌ (too much to bake in) |
| Consistent output format | Prompt engineering first | Fine-tune if prompting fails |
| Latency sensitive | ❌ (extra retrieval step) | ✅ |
What is an AI agent?
An LLM that can take actions to complete multi-step goals — not just respond in one turn:
Agent loop:
1. Receive goal
2. Think: "What tool do I need?"
3. Call tool (search web, run code, query DB, call API)
4. Observe result
5. Decide: goal reached? → return answer | else → go to step 2python# ReAct agent pattern
while not done:
thought = llm("Think step by step. What should I do next?")
action = llm("What tool to call? Format: tool_name(args)")
result = call_tool(action)
observation = f"Tool returned: {result}"
# feed observation back into contextKey components: LLM brain, tools (functions it can call), memory (context window or external), orchestrator (LangGraph, LangChain AgentExecutor).
What is LangChain?
A Python/JS framework for building LLM-powered apps. Core abstractions:
pythonfrom langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Basic chain (LCEL — LangChain Expression Language)
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
("user", "{question}")
])
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "What is RAG?"})
# RAG chain
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)What is the difference between OpenAI, Anthropic, and open-source LLMs?
| Provider | Models | Key Strengths | Notes |
|---|---|---|---|
| OpenAI | GPT-4o, o1, o3 | Widest ecosystem, function calling | Proprietary, usage-based pricing |
| Anthropic | Claude 3.5 Sonnet/Opus | Long context (200K), safety, coding | Proprietary |
| Gemini 1.5 Pro/Flash | 1M context, multimodal | Proprietary | |
| Meta | Llama 3.1 (8B/70B/405B) | Open weights, self-hostable | Open source (custom license) |
| Mistral | Mistral 7B, Mixtral | Efficient MoE, open weights | Open source |
Self-hosted (Ollama):
bashollama run llama3.1 # run locally
curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Hello"}'Common AI interview gotchas
Q: What's the difference between parameters and hyperparameters? Parameters = weights the model learns during training (billions of floats). Hyperparameters = settings YOU choose before training (learning rate, batch size, number of layers, temperature at inference).
Q: What does "GPT-4 has 1.7 trillion parameters" mean? The model has 1.7T numbers (float16) stored as its weights. More parameters generally = more capacity to learn patterns = larger model size in memory. GPT-4 @ float16 ≈ 3.4TB RAM.
Q: What is the difference between generative and discriminative models? Discriminative models (classifiers) learn to distinguish between classes — output is a label (spam/not spam). Generative models learn the underlying data distribution and can generate new samples — LLMs generate text, diffusion models generate images.
Q: What is overfitting? Model memorizes training data too well — performs great on training examples, poorly on new data. Signs: training loss low, validation loss high. Fixes: more data, regularization (dropout, weight decay), early stopping, simpler model.
Q: What is zero-shot vs few-shot learning? Zero-shot: model performs a task with no examples in the prompt — relies purely on training. Few-shot: provide 2–5 examples in the prompt showing input→output pairs — dramatically improves performance on specific formats/tasks. In-context learning (few-shot) is a core LLM superpower.