Building and Deploying a Multi-Agent AI System with LangGraph and FastAPI

The landscape of artificial intelligence is shifting from single-model solutions to orchestrated systems in which multiple specialized agents collaborate. A multi-agent system (MAS) lets you decompose a sophisticated task, such as answering a technical support query, generating a report, or analyzing a complex dataset, into smaller subtasks handled by dedicated AI agents. Each agent can have its own persona, context, tools, and even a different underlying model. This decomposition improves modularity, can improve accuracy, and makes AI workflows easier to inspect and debug.

In this guide, we will design, build, and deploy a multi-agent system using LangGraph to orchestrate agent workflows and FastAPI to expose them through an API layer. By the end of this article, you will understand agent state management, conditional routing, human-in-the-loop patterns, and how to containerize and serve your system.

Understanding Multi-Agent Architectures

Before diving into code, it is essential to grasp the fundamental concepts that make multi-agent systems powerful. Unlike a monolithic chain or a single LLM call, an MAS separates concerns:

Specialization: Each agent is optimized for a specific domain or function (e.g., code generation, data retrieval, summarization).
State Management: A shared state object tracks the conversation history, intermediate results, and metadata across all agents.
Routing and Control Flow: A supervisor agent or a graph-based logic decides which agent should act next based on the current state.
Tool Integration: Agents can call external APIs, databases, or even other agents as tools.
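
The four ideas above can be sketched in plain Python before any framework is involved. This is a toy illustration, not LangGraph code: the agents `retrieve` and `summarize`, the `route` function, and the state keys are all hypothetical stand-ins, and the "agents" return canned strings rather than calling a model.

```python
from typing import Callable, Dict

State = Dict[str, object]


def retrieve(state: State) -> State:
    # Specialized agent: gathers raw material for the task
    return {"notes": f"notes about {state['task']}"}


def summarize(state: State) -> State:
    # Specialized agent: condenses whatever the retriever produced
    return {"summary": f"summary of {state['notes']}"}


AGENTS: Dict[str, Callable[[State], State]] = {
    "retrieve": retrieve,
    "summarize": summarize,
}


def route(state: State) -> str:
    """Pick the next agent from the shared state; 'done' stops the loop."""
    if "notes" not in state:
        return "retrieve"
    if "summary" not in state:
        return "summarize"
    return "done"


def run(task: str, max_steps: int = 10) -> State:
    # Shared state: every agent reads it and contributes a partial update
    state: State = {"task": task, "history": []}
    for _ in range(max_steps):
        name = route(state)
        if name == "done":
            break
        update = AGENTS[name](state)
        state = {**state, **update, "history": [*state["history"], name]}
    return state
```

Calling `run("vector databases")` walks through both agents in order and records each step in `history`. The `max_steps` bound is there so a faulty router cannot loop forever.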

Why LangGraph?

LangGraph, part of the LangChain ecosystem, is a library designed specifically for building stateful, multi-actor applications with LLMs. It models the workflow as a directed graph where nodes represent agents or functions, and edges define the flow of control. Key features include:

Cyclic Graphs: Unlike linear chains, graphs can loop back, allowing agents to refine their outputs or ask clarifying questions.
Persistence: Built-in checkpointing enables fault tolerance and human-in-the-loop interactions.
Streaming: Real-time output streaming from individual agents to the end user.
Parallel Execution: Run multiple agents concurrently when their tasks are independent.

System Design Overview

Our example system will be a Research and Content Generator that takes a user query about a technical topic and produces a structured article outline with cited sources. The architecture includes:

Router Agent: Interprets the user query and classifies the type of research needed (technical, business, or academic).
Search Agent: Calls a web search API or a local vector database to retrieve relevant information.
Outline Agent: Synthesizes the search results into a coherent article outline.
Reviewer Agent: Checks the outline for completeness and accuracy, potentially requesting revisions.
Human-in-the-Loop (HITL): A pause point where a human can approve or modify the outline before finalization.

Setting Up the Environment

We will use Python 3.11+, LangGraph, LangChain, FastAPI, and Uvicorn. Create a virtual environment and install the core dependencies:

pip install langgraph langchain langchain-openai fastapi uvicorn pydantic

For demonstration, we will use OpenAI’s GPT-4o model. Set your API key as an environment variable:

export OPENAI_API_KEY='your-api-key-here'

Defining the State Schema

The state is the central nervous system of our graph. It is a typed dictionary that LangGraph passes to each node; each node returns a partial update, which is merged into the state. With LangGraph, we define it using TypedDict:

from typing import TypedDict, List, Optional

class AgentState(TypedDict):
    user_query: str
    classification: Optional[str]
    search_results: Optional[str]
    outline: Optional[str]
    review_feedback: Optional[str]
    final_outline: Optional[str]
    messages: List[str]
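
One detail worth knowing about the schema: by default, a node's update replaces the previous value of a key, so a node that returns `messages` would overwrite the list rather than append to it. LangGraph's documented convention for accumulating keys is to annotate them with a reducer such as `operator.add`. The sketch below shows that annotation and imitates the merge with a hypothetical helper, `apply_update`, written only for illustration. It is not LangGraph's implementation, and `AgentStateWithLog` is a trimmed-down example schema.

```python
import operator
from typing import Annotated, List, Optional, TypedDict, get_type_hints


class AgentStateWithLog(TypedDict):
    user_query: str
    outline: Optional[str]
    # The reducer in the annotation says: combine old and new values with +
    messages: Annotated[List[str], operator.add]


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Hypothetical merge: use a key's reducer if it has one, else overwrite."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducers = getattr(hints.get(key), "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"user_query": "q", "outline": None, "messages": ["start"]}
new_state = apply_update(
    state, {"outline": "v1", "messages": ["outlined"]}, AgentStateWithLog
)
```

Here `outline` is replaced while `messages` grows to two entries. The original `state` is left untouched because the helper copies it before merging.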

Building the Agent Nodes

Each node is a Python function that takes the current state and returns a partial update. We will create the classify_query node first:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def classify_query(state: AgentState) -> dict:
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Classify the user's query into one of these categories: Technical, Business, Academic. Respond with only the category name."),
        ("human", "{query}")
    ])
    chain = prompt | llm
    response = chain.invoke({"query": state["user_query"]})
    return {"classification": response.content.strip()}
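
Downstream code compares the classification against exact strings such as "Technical", but a model may reply with different casing, trailing punctuation, or extra words. A minimal sketch of a normalizer follows. The names `normalize_category` and `VALID_CATEGORIES`, and the choice of "Academic" as the fallback, are assumptions for illustration and not part of the code above.

```python
VALID_CATEGORIES = ("Technical", "Business", "Academic")


def normalize_category(raw: str, default: str = "Academic") -> str:
    """Map a free-text model reply onto one of the known category labels."""
    text = raw.strip().lower()
    for category in VALID_CATEGORIES:
        # Accept the label anywhere in the reply, ignoring case
        if category.lower() in text:
            return category
    # Unrecognized replies fall back to the default label
    return default
```

With this helper, classify_query could return `{"classification": normalize_category(response.content)}` so that later nodes only ever see one of the three canonical labels.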

Next comes the search_web node. For simplicity, we simulate a search; in production, you would integrate a search service such as the Tavily API or a custom retriever:

def search_web(state: AgentState) -> dict:
    # Simulated search based on classification
    if state["classification"] == "Technical":
        results = "Found technical articles on LangGraph deployment, FastAPI performance tuning."
    elif state["classification"] == "Business":
        results = "Market analysis reports on multi-agent AI adoption in enterprises."
    else:
        results = "Recent academic papers on agent-based modeling and LLM orchestration."
    return {"search_results": results}

The generate_outline node uses the LLM to create a structured outline:

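def generate_outline(state: AgentState) -> dict:
    # Pass along reviewer feedback (if any) so a revision pass can act on it
    feedback = state.get("review_feedback") or "None"
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert content strategist. Based on the user query and search results, create a detailed article outline with 5-7 sections. Use markdown headings. If reviewer feedback is provided, revise the outline to address it."),
        ("human", "User query: {query}\nSearch results: {results}\nReviewer feedback: {feedback}")
    ])
    chain = prompt | llm
    response = chain.invoke({
        "query": state["user_query"],
        "results": state["search_results"],
        "feedback": feedback,
    })
    return {"outline": response.content}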

The review_outline node acts as a quality gate:

def review_outline(state: AgentState) -> dict:
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Review the following article outline for clarity, completeness, and correctness. If it is ready, respond with only the word 'APPROVED'. Otherwise, provide 2-3 specific suggestions for improvement."),
        ("human", "{outline}")
    ])
    chain = prompt | llm
    response = chain.invoke({"outline": state["outline"]})
    feedback = response.content.strip()
    # Match only a leading verdict, so feedback such as "NOT APPROVED" is not read as approval
    if feedback.upper().startswith("APPROVED"):
        return {"review_feedback": "", "final_outline": state["outline"]}
    # Anything else is treated as revision feedback
    return {"review_feedback": feedback}

Finally, a conditional routing function decides whether to loop back to the outline generator or proceed to finalization:

def needs_revision(state: AgentState) -> str:
    if state.get("final_outline"):
        return "finalize"
    elif state.get("review_feedback"):
        return "revise"
    else:
        return "finalize"
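
If the reviewer never approves, this router keeps sending the outline back for revision. LangGraph's recursion limit would eventually stop the run with an error, but a cap on revision rounds ends it more gracefully. The sketch below assumes a hypothetical `revision_count` key in the state, which the outline node would increment on each pass. It is not part of AgentState as defined above, and `MAX_REVISIONS` is likewise an assumed setting.

```python
MAX_REVISIONS = 3  # Assumed cap; tune to taste


def needs_revision_capped(state: dict) -> str:
    """Route to 'revise' only while feedback exists and the cap is not reached."""
    if state.get("final_outline"):
        return "finalize"
    rounds = state.get("revision_count", 0)
    if state.get("review_feedback") and rounds < MAX_REVISIONS:
        return "revise"
    # Either there is no feedback or the cap was hit: stop looping
    return "finalize"
```

When the cap is reached the graph finalizes with whatever outline exists, so the caller may want to check whether `final_outline` was actually set before treating the result as approved.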

Assembling the Graph

Now we wire everything together using StateGraph:

from langgraph.graph import StateGraph, END

# Initialize the graph
builder = StateGraph(AgentState)

# Add nodes
builder.add_node("classify", classify_query)
builder.add_node("search", search_web)
builder.add_node("outline_gen", generate_outline)
builder.add_node("review", review_outline)

# Set entry point
builder.set_entry_point("classify")

# Define edges
builder.add_edge("classify", "search")
builder.add_edge("search", "outline_gen")
builder.add_edge("outline_gen", "review")

# Conditional edge for feedback loop
builder.add_conditional_edges(
    "review",
    needs_revision,
    {
        "revise": "outline_gen",
        "finalize": END
    }
)

# Compile the graph
app = builder.compile()

Adding Human-in-the-Loop

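For critical workflows, you may want a human to approve the outline before finalization. LangGraph supports this through interrupts, which rely on a checkpointer to save the graph state while it waits. As a simple stand-in, the node below blocks on console input; a deployed service would interrupt the graph instead and resume it from an API call: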

from langgraph.checkpoint.memory import MemorySaver

# Human approval step that runs after the automated review
def human_review(state: AgentState) -> dict:
    # Simulate waiting for human input
    # In a real app, an API endpoint would resume the graph instead
    feedback = input("Enter 'approve' or describe the changes you want: ")
    if feedback.strip().lower() == "approve":
        return {"final_outline": state["outline"], "review_feedback": ""}
    # Clear any earlier approval so the router sends the outline back for revision
    return {"review_feedback": feedback, "final_outline": None}

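Then register human_review as a node after review, and attach the conditional edge to human_review instead of review. Compiling the graph with a checkpointer, for example builder.compile(checkpointer=MemorySaver()), lets MemorySaver persist the state between interrupts; each invocation must then pass a thread_id in its config so the checkpointer knows which run to resume.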
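
The pause-and-resume idea can be illustrated without any framework. The sketch below is not LangGraph's interrupt API. It uses a plain dictionary in place of a checkpointer, and the function names `run_until_review` and `resume_with_feedback` are hypothetical. It only shows the shape of the pattern: run the automated steps, save the state under a thread ID, and continue when a human responds.

```python
from typing import Dict

# Stand-in for a checkpointer: saved states keyed by thread ID
checkpoints: Dict[str, dict] = {}


def run_until_review(thread_id: str, state: dict) -> dict:
    """Run the automated steps, then save the state and pause for a human."""
    state = {**state, "outline": f"# Outline for: {state['user_query']}"}
    checkpoints[thread_id] = state
    return {"status": "awaiting_human", "outline": state["outline"]}


def resume_with_feedback(thread_id: str, feedback: str) -> dict:
    """Load the paused state and either finalize it or queue a revision."""
    state = checkpoints.pop(thread_id)  # Raises KeyError for unknown threads
    if feedback.strip().lower() == "approve":
        return {**state, "final_outline": state["outline"], "review_feedback": ""}
    # Not approved: record the feedback and keep the run paused
    state = {**state, "review_feedback": feedback}
    checkpoints[thread_id] = state
    return state
```

Because the state lives outside the running function, the pause can last minutes or days, and the resume call can arrive on a different request. That is the property a real checkpointer provides.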

Deploying with FastAPI

To make our system accessible as a web service, we wrap it with FastAPI. Create a file named app.py:

import uuid
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Assumption: the graph code from the previous sections lives in a module named
# research_graph.py (a hypothetical name). The compiled graph is imported as
# `graph` so it does not collide with the FastAPI `app` defined below.
from research_graph import AgentState, app as graph

app = FastAPI(title="Multi-Agent Research API")

# In-memory store for graph runs (use Redis or DB in production)
sessions = {}

class QueryRequest(BaseModel):
    query: str

class QueryResponse(BaseModel):
    session_id: str
    outline: Optional[str] = None  # None until an outline has been finalized
    status: str

# Plain `def` (not `async def`): graph.invoke blocks, so FastAPI runs it in a worker thread
@app.post("/start", response_model=QueryResponse)
def start_research(request: QueryRequest):
    session_id = str(uuid.uuid4())
    initial_state = AgentState(
        user_query=request.query,
        classification=None,
        search_results=None,
        outline=None,
        review_feedback=None,
        final_outline=None,
        messages=[]
    )
    # Run the graph to completion
    final_state = graph.invoke(initial_state)
    sessions[session_id] = final_state
    return QueryResponse(
        session_id=session_id,
        outline=final_state.get("final_outline"),
        status="completed"
    )

# `feedback` is read from the query string, e.g. POST /human-input/<id>?feedback=approve
@app.post("/human-input/{session_id}")
async def provide_human_input(session_id: str, feedback: str):
    if session_id not in sessions:
        raise HTTPException(status_code=404, detail="Session not found")
    state = sessions[session_id]
    # Update state with human feedback and resume
    # This requires checkpointing which is omitted for brevity
    return {"message": "Feedback received"}

Run the API with Uvicorn:

uvicorn app:app --host 0.0.0.0 --port 8000

Production Considerations

When moving to production, consider the following enhancements:

Persistence: Replace in-memory session storage with PostgreSQL or Redis. LangGraph offers a PostgresSaver checkpointer through the separate langgraph-checkpoint-postgres package.
Scaling: Use a task queue such as Celery (backed by a message broker like RabbitMQ or Redis) for long-running graph executions. FastAPI endpoints should return immediately with a job ID.
Observability: Add logging, metrics, and tracing with OpenTelemetry. LangSmith provides excellent debugging for LangGraph workflows.
Security: Validate user input, implement authentication (API keys or JWT), and rate-limit endpoints to prevent abuse.
Dockerization: Create a Dockerfile for easy deployment:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
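
The "return immediately with a job ID" advice from the scaling item can be sketched with the standard library alone. This is a single-process stand-in for a real task queue, under stated assumptions: `submit_job`, `job_status`, and the in-memory `jobs` registry are hypothetical names, and `run` represents whatever function executes the graph.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

executor = ThreadPoolExecutor(max_workers=4)
jobs: Dict[str, Future] = {}  # In-memory registry; lost on restart


def submit_job(run: Callable[[str], dict], query: str) -> str:
    """Start the work in the background and hand back an ID right away."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(run, query)
    return job_id


def job_status(job_id: str) -> dict:
    """Report where a job stands without blocking on it."""
    future = jobs.get(job_id)
    if future is None:
        return {"status": "not_found"}
    if not future.done():
        return {"status": "running"}
    error = future.exception()
    if error is not None:
        return {"status": "failed", "error": str(error)}
    return {"status": "completed", "result": future.result()}
```

A start endpoint would call `submit_job` and return the ID, while a separate status endpoint would call `job_status`. A thread pool does not survive restarts or spread work across machines, which is why a real deployment would move to a proper queue.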

Testing the System

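Let’s simulate a full run with a sample query. With the server running, call the API from a test script (this uses the requests package, installed separately with pip install requests):

import requests

# A graph run makes several LLM calls, so allow a generous timeout
response = requests.post(
    "http://localhost:8000/start",
    json={"query": "Explain how Kubernetes service mesh works"},
    timeout=120,
)
response.raise_for_status()  # Fail loudly on HTTP errors
print(response.json())
# Expected output includes a session_id and a structured outline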

Conclusion

Multi-agent AI systems represent a significant leap forward in building intelligent, modular, and explainable applications. LangGraph provides the infrastructure to design complex, stateful workflows with ease, while FastAPI delivers a high-performance API layer suitable for cloud-native deployment. By combining agent specialization, dynamic routing, and human oversight, you can create solutions that are not only powerful but also trustworthy.

As you extend this pattern, consider integrating specialized tools like code interpreters, image generators, or domain-specific databases. The future of AI engineering lies not in bigger models, but in smarter systems. Start building your agent ecosystem today.