Related AI Frameworks
Framework Landscape
| Framework | Focus |
|---|---|
| LangChain / LangGraph | General-purpose LLM pipelines + stateful agents |
| LlamaIndex | RAG-first, data connectors, query engines |
| DSPy | Programmatic prompt optimization (compile prompts, not write them) |
| CrewAI | Multi-agent teams with roles; originally built on LangChain, now standalone |
| AutoGen (Microsoft) | Multi-agent conversations, code execution focus |
| Haystack | Production NLP pipelines, strong enterprise features |
| Semantic Kernel | Microsoft's SDK: .NET first, Python/Java support |
| Pydantic AI | Type-safe agent framework using Pydantic |

LlamaIndex
LlamaIndex (formerly GPT Index) is optimized for RAG and data ingestion. Where LangChain is a general framework, LlamaIndex is laser-focused on querying your data.
Core Concepts
| Component | Role |
|---|---|
| Data Connectors (Readers) | Load data from 100+ sources (PDF, Notion, Slack, SQL...) |
| Node Parser | Chunk documents into Nodes (with metadata preserved) |
| Index | Store/index nodes (VectorStore, Summary, Knowledge Graph...) |
| Query Engine | Answer questions over an index |
| Retriever | Fetch relevant nodes for a query |
| Response Synthesizer | Turn retrieved nodes into a final response |

Basic RAG with LlamaIndex
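Before the library version, it may help to see how the components listed above relate in a dependency-free toy. The sketch below is not LlamaIndex code: `Node`, `ToyVectorIndex`, `parse_nodes`, and `synthesize` are hypothetical names, word counts stand in for embeddings, and string concatenation stands in for the LLM call.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk of a source document, with metadata preserved."""
    text: str
    metadata: dict = field(default_factory=dict)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_nodes(doc_text: str, source: str) -> list[Node]:
    """Node parser: one node per non-empty line (a deliberately naive chunker)."""
    return [
        Node(text=line.strip(), metadata={"source": source})
        for line in doc_text.splitlines()
        if line.strip()
    ]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Index: stores nodes alongside bag-of-words count vectors."""

    def __init__(self, nodes: list[Node]):
        self.nodes = nodes
        self.vectors = [Counter(tokenize(n.text)) for n in nodes]

    def retrieve(self, query: str, top_k: int = 2) -> list[tuple[Node, float]]:
        """Retriever: rank nodes by cosine similarity to the query."""
        q = Counter(tokenize(query))
        scored = [(node, cosine(q, vec)) for node, vec in zip(self.nodes, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def synthesize(query: str, hits: list[tuple[Node, float]]) -> str:
    """Response synthesizer stand-in: a real one would prompt an LLM with the hits."""
    context = " ".join(node.text for node, _ in hits)
    return f"Q: {query} | Context: {context}"


# Data connector stand-in: an in-memory dict instead of files on disk.
docs = {"policy.txt": "Refunds are accepted within 30 days.\nShipping takes 5 business days."}
nodes = [n for name, text in docs.items() for n in parse_nodes(text, name)]
index = ToyVectorIndex(nodes)
hits = index.retrieve("How many days do refunds take?", top_k=1)
answer = synthesize("How many days do refunds take?", hits)
```

Each function maps to one row of the table: reader, node parser, index, retriever, synthesizer. A query engine is the composition of the last two.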
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure globally
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Load and index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is the refund policy?")
print(response.response)      # Answer
print(response.source_nodes)  # Retrieved chunks with scores
```

Advanced Query Engines
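One advanced pattern is sub-question decomposition: split a compound question, answer each part, and combine the results. A rough, library-free sketch of that flow follows. `KNOWLEDGE`, `decompose`, and the regex are illustrative assumptions; a real engine would use an LLM for both the split and the final synthesis.

```python
import re

# Hypothetical stand-in for an index-backed query engine: a lookup table.
KNOWLEDGE = {
    "refund policy": "Refunds are accepted within 30 days.",
    "shipping policy": "Orders ship within 5 business days.",
}


def decompose(question: str) -> list[str]:
    """Split 'Compare the X and the Y' into one sub-question per topic.

    A real engine asks an LLM to do this; the regex is only a placeholder.
    """
    match = re.match(r"compare the (.+?) and the (.+)", question.strip(), re.IGNORECASE)
    if not match:
        return [question]
    return [f"What is the {topic.strip()}?" for topic in match.groups()]


def answer_sub_question(sub_question: str) -> str:
    """Answer one sub-question by matching a known topic in its text."""
    for topic, fact in KNOWLEDGE.items():
        if topic in sub_question.lower():
            return fact
    return "No information found."


def sub_question_query(question: str) -> str:
    """Decompose, answer each part, then combine (an LLM would synthesize here)."""
    sub_questions = decompose(question)
    answers = [answer_sub_question(sq) for sq in sub_questions]
    return " ".join(f"[{sq}] {a}" for sq, a in zip(sub_questions, answers))


sub_questions = decompose("Compare the refund policy and the shipping policy")
combined = sub_question_query("Compare the refund policy and the shipping policy")
```

The point of the pattern is that each sub-question can be routed to a different tool or index, which a single retrieval pass over the compound question cannot do.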
```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# `index` is the VectorStoreIndex built in the basic RAG example
# Tool wrapping a query engine
tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="docs_search",
    description="Searches company documentation",
)

# Sub-question engine: breaks complex queries into sub-questions
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[tool],
    use_async=True,
)

response = sub_question_engine.query(
    "Compare the refund policy and the shipping policy"
)
# Internally generates:
# - "What is the refund policy?"
# - "What is the shipping policy?"
# Then synthesizes a combined answer
```

LlamaIndex vs LangChain for RAG
LlamaIndex strengths:
- Richer index types (Knowledge Graph, Summary, Tree index)
- Better metadata filtering (filter by date, source, author)
- Sub-question decomposition built-in
- Node-level citation tracking out of the box
- 100+ data connectors via LlamaHub
LangChain strengths:
- Better for agentic workflows beyond RAG
- LangGraph for stateful multi-step agents
- More LLM provider integrations
- Better memory abstractions
- Larger ecosystem overall
In practice, the two combine well: LlamaIndex for retrieval and indexing, LangChain/LangGraph for the agent loop.

DSPy
DSPy (Declarative Self-improving Python) takes a different approach: instead of writing prompts, you write programs and DSPy compiles (optimizes) the prompts for you.
Core Idea
Traditional approach:
- You: write "Here is the question: {question}\nThink step by step and answer:"
- Problem: prompt quality depends on your intuition, and the prompt is brittle to model changes.

DSPy approach:
- You: define input/output fields + examples (a dataset)
- DSPy: runs an optimizer to find the best prompts/few-shots automatically
- Output: a compiled program where prompts are optimized for your task + model

Basic DSPy
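As a rough intuition for what an optimizer does, the toy below searches over few-shot demonstrations and keeps the set that scores best on a metric. It does not use DSPy and is not DSPy's algorithm: `predict` is a word-overlap stand-in for an LLM, and `compile_demos` is a hypothetical brute-force search over a tiny made-up dataset.

```python
from itertools import combinations

Example = tuple[str, str]  # (text, sentiment label)


def predict(text: str, demos: list[Example]) -> str:
    """Toy stand-in for an LLM call: copy the label of the demo sharing the most words."""
    words = set(text.lower().split())
    best_label, best_overlap = "neutral", 0
    for demo_text, demo_label in demos:
        overlap = len(words & set(demo_text.lower().split()))
        if overlap > best_overlap:
            best_label, best_overlap = demo_label, overlap
    return best_label


def accuracy(demos: list[Example], devset: list[Example]) -> float:
    """Metric: fraction of dev examples predicted correctly with these demos."""
    correct = sum(predict(text, demos) == label for text, label in devset)
    return correct / len(devset)


def compile_demos(pool: list[Example], devset: list[Example], k: int) -> list[Example]:
    """Brute-force 'optimizer': try every k-subset of the pool, keep the best-scoring one."""
    return list(max(combinations(pool, k), key=lambda demos: accuracy(list(demos), devset)))


pool = [
    ("great product love it", "positive"),
    ("the box was brown", "neutral"),
    ("terrible quality broke fast", "negative"),
    ("it arrived on tuesday", "neutral"),
]
devset = [
    ("love this great phone", "positive"),
    ("broke after a week terrible", "negative"),
]

baseline = accuracy(pool[1:2], devset)  # one arbitrary hand-picked demo
best_demos = compile_demos(pool, devset, k=2)
optimized = accuracy(best_demos, devset)
```

The hand-picked demo is uninformative for this dev set, while the searched pair covers both classes. The same shape (candidate prompts, a metric, a search) is what makes the approach depend on having labeled data.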
```python
from typing import Literal

import dspy

# Configure LLM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)


# Define a signature (input → output spec)
class SentimentAnalysis(dspy.Signature):
    """Classify the sentiment of text."""

    text: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="0.0 to 1.0")


# Use a module (Chain of Thought wraps signature in step-by-step reasoning)
analyzer = dspy.ChainOfThought(SentimentAnalysis)
result = analyzer(text="The product broke after two days. Terrible.")
print(result.sentiment)   # e.g. "negative"
print(result.confidence)  # e.g. 0.95
print(result.reasoning)   # The reasoning trace (named `rationale` in older DSPy releases)
```

Compilation (the key differentiator)
```python
# Continues from the previous snippet: `dspy` and `analyzer` are defined there
from dspy.teleprompt import BootstrapFewShot

# Your dataset (input + expected output)
trainset = [
    dspy.Example(text="Amazing product!", sentiment="positive").with_inputs("text"),
    dspy.Example(text="Never buying again", sentiment="negative").with_inputs("text"),
    # ... more examples
]


# Metric: is the prediction correct?
def sentiment_metric(example, prediction, trace=None) -> bool:
    return prediction.sentiment == example.sentiment


# Compile: DSPy runs experiments to find best few-shots / prompt format
teleprompter = BootstrapFewShot(metric=sentiment_metric, max_bootstrapped_demos=4)
compiled_analyzer = teleprompter.compile(analyzer, trainset=trainset)

# compiled_analyzer now has auto-selected few-shot examples baked in.
# It typically performs better than the un-compiled version on your task;
# confirm that by scoring both on a held-out set.
result = compiled_analyzer(text="Exceeded my expectations!")
```

When to use DSPy
Good for:
- Classification tasks with labeled data
- Structured extraction (consistent schema)
- Tasks where you have eval metrics
- Optimizing prompts across model versions
Not for:
- Free-form generation (creative writing, chat)
- Tasks with no measurable output
- When you want full control over the prompt

CrewAI
CrewAI is a high-level multi-agent framework. Early versions were built on LangChain; current releases are standalone. You define Agents with roles and goals, assign Tasks, form a Crew, and kick it off.
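The agent/task/crew flow can be pictured with a small library-free sketch. This is not CrewAI's implementation: `ToyAgent`, `ToyTask`, and `kickoff` are hypothetical names, and each agent's `act` callable stands in for an LLM call. It only illustrates a sequential process in which a task receives the outputs of the tasks named in its context.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyAgent:
    role: str
    # Hypothetical stand-in for an LLM call: (task description, context) -> output text.
    act: Callable[[str, str], str]


@dataclass
class ToyTask:
    description: str
    agent: ToyAgent
    context: list["ToyTask"] = field(default_factory=list)
    output: str = ""


def kickoff(tasks: list[ToyTask], inputs: dict[str, str]) -> str:
    """Run tasks in order; each task sees the outputs of the tasks listed in its context."""
    for task in tasks:
        description = task.description.format(**inputs)
        context = "\n".join(dep.output for dep in task.context)
        task.output = task.agent.act(description, context)
    return tasks[-1].output


researcher = ToyAgent("Researcher", lambda desc, ctx: f"findings for: {desc}")
writer = ToyAgent("Writer", lambda desc, ctx: f"article based on <{ctx}>")

research = ToyTask("Research {topic}", researcher)
writing = ToyTask("Write an article", writer, context=[research])

result = kickoff([research, writing], inputs={"topic": "AI agents"})
```

Because the process is sequential, a task can only depend on tasks that appear earlier in the list, which is why task order matters.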
```python
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool

# Define agents with roles
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find accurate, up-to-date information on the given topic",
    backstory="Expert at synthesizing complex information from multiple sources",
    tools=[SerperDevTool(), WebsiteSearchTool()],
    llm="gpt-4o-mini",
    verbose=True,
    max_iter=5,             # Max tool call loops
    allow_delegation=True,  # Can delegate to other agents
)

writer = Agent(
    role="Technical Writer",
    goal="Write clear, well-structured technical content",
    backstory="Experienced at turning complex research into readable prose",
    llm="gpt-4o-mini",
)

# Define tasks (order matters for sequential process)
research_task = Task(
    description="Research the current state of {topic}. Focus on trends and key players.",
    expected_output="Bullet-point summary with 5-7 key findings and sources",
    agent=researcher,
)

writing_task = Task(
    description="Write a 500-word article based on the research.",
    expected_output="Well-structured article with title, intro, body, conclusion",
    agent=writer,
    context=[research_task],  # Writer gets researcher's output as context
)

# Assemble and run
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # or Process.hierarchical (uses a manager LLM)
    verbose=True,
)

result = crew.kickoff(inputs={"topic": "AI agents in 2025"})
print(result.raw)
```

CrewAI vs LangGraph
CrewAI:
- Pros: fast to set up, readable role-based config, good for role-play style pipelines
- Cons: less control over flow, limited state management, fixed process types, hard to add custom routing logic, less debuggable

LangGraph:
- Pros: full control over graph structure, explicit state, time-travel, streaming, dynamic routing, subgraphs, production-grade checkpointing
- Cons: more verbose, steeper learning curve

Rule of thumb: CrewAI for quick prototypes and demos, LangGraph for production.

AutoGen (Microsoft)
AutoGen is Microsoft's multi-agent framework focused on conversational agents — particularly code-generating agents with execution.
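The generate, execute, and refine cycle behind code-generating agents can be sketched without the library. Everything here is an illustrative assumption: `scripted_assistant` replays two fixed replies instead of calling an LLM, and `run_code` uses in-process `exec`, which is acceptable only for a toy with hard-coded strings. Real setups isolate execution in a subprocess or container.

```python
import contextlib
import io


def run_code(code: str) -> tuple[bool, str]:
    """Executor stand-in: run code in-process and return (succeeded, output or error)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only; never exec untrusted model output
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue()


def scripted_assistant(feedback):
    """Assistant stand-in: a fixed script instead of an LLM.

    The first draft has a bug; after seeing an error it returns a corrected version.
    """
    if feedback is None:
        return "print(total / count)"
    return "total, count = 10, 4\nprint(total / count)"


def converse(max_rounds: int = 5) -> list[tuple[str, bool, str]]:
    """Alternate generation and execution until the code runs or rounds are exhausted."""
    transcript = []
    feedback = None
    for _ in range(max_rounds):
        code = scripted_assistant(feedback)
        ok, output = run_code(code)
        transcript.append((code, ok, output))
        if ok:
            break
        feedback = output  # the error goes back to the assistant
    return transcript


transcript = converse()
```

`max_rounds` plays the role of a hard cap on automatic replies, and the `if ok: break` check plays the role of a termination condition.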
```python
# Uses the AutoGen 0.2-style API (`import autogen`); AutoGen 0.4+ ships a redesigned API.
import autogen
from autogen.coding import LocalCommandLineCodeExecutor

config_list = [{"model": "gpt-4o", "api_key": "..."}]

# Assistant agent: generates code/solutions
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
    system_message=(
        "You are a Python expert. Write clean, tested code. "
        "Reply DONE when the task is complete."
    ),
)

# User proxy: executes code, provides feedback
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # NEVER / ALWAYS / TERMINATE
    max_consecutive_auto_reply=10,
    # `content` can be None (e.g. tool-call messages), so fall back to ""
    is_termination_msg=lambda msg: "DONE" in (msg.get("content") or ""),
    code_execution_config={
        "executor": LocalCommandLineCodeExecutor(work_dir="coding/")
    },
)

# Start conversation
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script to scrape Hacker News top stories and save to CSV",
)

# AutoGen loop:
# assistant generates code → user_proxy executes it → sends output back → assistant refines → ...
```

AutoGen Key Concepts
| Concept | Description |
|---|---|
| AssistantAgent | LLM-powered: generates text, code, plans |
| UserProxyAgent | Can execute code, represent the human, or both |
| GroupChat | Multiple agents in one conversation, manager routes turns |
| ConversableAgent | Base class: any agent that can send/receive messages |

Termination conditions:
- is_termination_msg: function that returns True to stop
- max_consecutive_auto_reply: hard limit on auto replies
- human_input_mode=ALWAYS: pause for human input every turn

Haystack
Haystack (deepset) is enterprise-focused with strong pipelines and evaluation tooling.
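The pipeline example in this section retrieves with BM25, a keyword-based ranking function. As a rough sketch of what that scoring does, here is the classic Okapi BM25 formula in plain Python. This is an assumption-laden illustration rather than Haystack's code: `bm25_scores` is a hypothetical helper, the `k1` and `b` defaults are common textbook values, and the library's retriever may use a different BM25 variant and tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Longer-than-average documents are penalized via length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "Python was created by Guido van Rossum.",
    "LangChain was founded in 2022.",
]
scores = bm25_scores("Who created Python?", docs)
best = docs[scores.index(max(scores))]
```

BM25 needs no embedding model, which is why it is a common first retriever in a pipeline; the trade-off is that it matches exact terms only, so "take" will not match "takes".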
```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Setup document store
store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Python was created by Guido van Rossum."),
    Document(content="LangChain was founded in 2022."),
])

# Jinja2 template: PromptBuilder exposes its variables (documents, query) as inputs
template = """
Answer the question using only the documents below.
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ query }}
Answer:
"""

# Build pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

# Connect components
pipeline.connect("retriever", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")

# Run
question = "Who created Python?"
result = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"query": question},
})
print(result["llm"]["replies"][0])
```

Pydantic AI
Type-safe agent framework. Agents are defined with strict input/output schemas. Works well when you need strong typing and validation in agentic loops.
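What validation in an agentic loop buys you can be shown with a small standard-library sketch: parse the model's reply against a schema and, on failure, send the error back and retry. This is not Pydantic AI's implementation. `parse_result` and `run_with_validation` are hypothetical helpers, a dataclass stands in for a Pydantic model, and the "model" is a scripted iterator of replies.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    summary: str
    relevance_score: float


def parse_result(raw: str) -> SearchResult:
    """Validate raw model output against the schema; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"title": str, "summary": str, "relevance_score": (int, float)}
    if set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for key, types in expected.items():
        # bool is a subclass of int, so reject it explicitly
        if isinstance(data[key], bool) or not isinstance(data[key], types):
            raise ValueError(f"field {key!r} has the wrong type")
    return SearchResult(data["title"], data["summary"], float(data["relevance_score"]))


def run_with_validation(model, prompt: str, max_retries: int = 2) -> SearchResult:
    """Call the model, validate, and feed validation errors back until it conforms."""
    message = prompt
    for _ in range(max_retries + 1):
        raw = model(message)
        try:
            return parse_result(raw)
        except ValueError as error:
            message = f"{prompt}\nYour last reply was invalid ({error}). Return valid JSON."
    raise RuntimeError("model never produced a valid SearchResult")


# Scripted stand-in for an LLM: first reply is malformed, second conforms.
replies = iter([
    '{"title": "LangGraph"}',
    '{"title": "LangGraph", "summary": "Graph-based agents", "relevance_score": 0.9}',
])
result = run_with_validation(lambda message: next(replies), "Find information about LangGraph")
```

Downstream code can then rely on `result.relevance_score` being a float without defensive checks, which is the practical meaning of "type-safe" here.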
```python
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel


class SearchResult(BaseModel):
    title: str
    summary: str
    relevance_score: float


# Agent with typed output (older releases call this parameter `result_type`)
agent = Agent(
    OpenAIModel("gpt-4o-mini"),
    output_type=SearchResult,
    system_prompt="You are a research assistant. Return structured results.",
)


# tool_plain registers a tool that does not take a RunContext argument
@agent.tool_plain
async def search_web(query: str) -> str:
    """Search the web for information."""
    return await actual_web_search(query)  # placeholder: supply your own search function


async def main() -> None:
    # The output is validated against SearchResult before it is returned
    result = await agent.run("Find information about LangGraph")
    print(result.output.title)            # str (`result.data` in older releases)
    print(result.output.relevance_score)  # float


asyncio.run(main())
```

Comparison Summary
| Framework | Best For | Avoid When |
|---|---|---|
| LangChain | General LLM pipelines, RAG, tool use | You need complex stateful agents |
| LangGraph | Stateful agents, multi-step, production | Simple single-call chains |
| LlamaIndex | RAG-heavy apps, complex data sources | General agentic workflows |
| DSPy | Optimizing prompts with labeled data | No eval metrics, creative tasks |
| CrewAI | Quick role-based agent prototypes | Production, custom routing |
| AutoGen | Code-gen agents, execute-and-iterate | Non-code tasks, strict control |
| Haystack | Enterprise NLP pipelines, evaluation | Rapid prototyping |
| Pydantic AI | Typed, validated agent I/O | Flexible/dynamic schemas |