Testing LangGraph State Machine Agents: Nodes, Edges, State Transitions, and Checkpointer Mocking

LangGraph compiles agent behavior into a directed graph. Nodes are functions. Edges are transitions. State flows through the graph, getting modified at each node. It's a clean mental model — and it creates a natural seam for testing.

The problem is that most LangGraph developers test the graph as a black box: give it an input, assert on the final output. That works until a multi-hop workflow starts producing wrong results and you have no idea which node introduced the bug.

Testing LangGraph properly means testing nodes in isolation, testing edge conditions, validating state transitions, and mocking the checkpointer so your tests don't depend on external storage. Here's how.

The LangGraph Testing Pyramid

Node unit tests — test individual node functions with typed state input/output. Fast, deterministic, no LLM calls needed.

Edge condition tests — test the conditional logic that determines which node runs next. The most common source of workflow bugs.

State transition tests — run the graph through a sequence of nodes and assert on accumulated state at each checkpoint.

End-to-end graph tests — run the full compiled graph with mocked LLM calls. Slow but necessary before shipping.

Start at the bottom of the pyramid and work up. Most bugs live at the node and edge levels.

Testing Individual Nodes

Each LangGraph node is a Python function that takes state and returns a state update. Test it like any function:

import pytest
from typing import TypedDict
from langchain_core.messages import HumanMessage, AIMessage

class AgentState(TypedDict):
    messages: list
    next_action: str
    research_results: list
    error: str | None

def research_node(state: AgentState) -> dict:
    """Searches for information based on the last user message."""
    last_message = state["messages"][-1]
    query = last_message.content
    
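    # In production this calls a real search API. `search_api` is assumed to be
    # a module-level client object, which is what the tests replace with a mock.
    results = search_api.search(query)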
    
    return {
        "research_results": results,
        "next_action": "synthesize" if results else "clarify"
    }

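# Test the node in isolation. `mock_search_api` is assumed to be a fixture
# (for example in conftest.py) that replaces `search_api` with a MagicMock.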
class TestResearchNode:
    def test_returns_results_for_valid_query(self, mock_search_api):
        mock_search_api.search.return_value = [
            {"title": "Result 1", "content": "relevant content"}
        ]
        
        state = {
            "messages": [HumanMessage(content="What is LangGraph?")],
            "next_action": "",
            "research_results": [],
            "error": None
        }
        
        update = research_node(state)
        
        assert len(update["research_results"]) == 1
        assert update["next_action"] == "synthesize"
    
    def test_routes_to_clarify_when_no_results(self, mock_search_api):
        mock_search_api.search.return_value = []
        
        state = {
            "messages": [HumanMessage(content="xkzqjbdf")],
            "next_action": "",
            "research_results": [],
            "error": None
        }
        
        update = research_node(state)
        
        assert update["research_results"] == []
        assert update["next_action"] == "clarify"
    
    def test_returns_state_update_not_full_state(self, mock_search_api):
        # Nodes should return partial updates, not full state copies
        mock_search_api.search.return_value = [{"title": "r"}]
        
        state = {
            "messages": [HumanMessage(content="query")],
            "next_action": "",
            "research_results": [],
            "error": None
        }
        
        update = research_node(state)
        
        # Should not include unchanged fields
        assert "messages" not in update  # Messages weren't modified
        assert "research_results" in update
        assert "next_action" in update

Key insight: A node should only return the fields it modifies. Test that the return value is a partial state update, not a full state copy — returning a full copy masks bugs where the node incorrectly modifies fields it shouldn't touch.
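One way to enforce this across many nodes is a small assertion helper. The sketch below is illustrative only: `assert_partial_update` and `tag_node` are hypothetical names, not part of LangGraph. It checks that a node returns only the keys it is allowed to write and that it leaves the input state untouched.

```python
import copy


def assert_partial_update(node, state: dict, allowed_keys: set) -> dict:
    """Run `node` on `state` and check that it behaves like a partial update.

    Hypothetical test helper: verifies the node returns only keys from
    `allowed_keys` and does not mutate the state it was given.
    """
    snapshot = copy.deepcopy(state)
    update = node(state)

    unexpected = set(update) - allowed_keys
    assert not unexpected, f"node wrote unexpected keys: {sorted(unexpected)}"
    assert state == snapshot, "node mutated its input state in place"
    return update


# Toy node used only to demonstrate the helper.
def tag_node(state: dict) -> dict:
    return {"next_action": "synthesize" if state["research_results"] else "clarify"}


state = {"messages": ["hi"], "next_action": "", "research_results": [], "error": None}
update = assert_partial_update(tag_node, state, allowed_keys={"next_action"})
```

A node that writes an extra key, or edits `state` in place before returning, fails the helper with a message naming the problem.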

Testing Edge Conditions

Conditional edges are where logic errors cluster. A conditional edge is a function that takes state and returns the name of the next node.

from langgraph.graph import END

def route_after_research(state: AgentState) -> str:
    """Decides next step based on research results and error state."""
    if state.get("error"):
        return "handle_error"
    
    if not state["research_results"]:
        return "clarify"
    
    if len(state["research_results"]) > 10:
        return "filter_results"
    
    return "synthesize"

class TestRouteAfterResearch:
    def test_routes_to_handle_error_on_error(self):
        state = {
            "error": "API timeout",
            "research_results": [{"r": 1}],  # error should win even when results exist
            "next_action": ""
        }
        assert route_after_research(state) == "handle_error"
    
    def test_error_takes_priority_over_empty_results(self):
        # Edge case: both error AND empty results
        state = {
            "error": "API timeout",
            "research_results": [],
            "next_action": ""
        }
        # Error should take priority
        assert route_after_research(state) == "handle_error"
    
    def test_routes_to_clarify_on_empty_results(self):
        state = {
            "error": None,
            "research_results": [],
            "next_action": ""
        }
        assert route_after_research(state) == "clarify"
    
    def test_routes_to_filter_on_many_results(self):
        state = {
            "error": None,
            "research_results": [{"r": i} for i in range(25)],  # well above the threshold of 10
            "next_action": ""
        }
        assert route_after_research(state) == "filter_results"
    
    def test_routes_to_synthesize_on_normal_results(self):
        state = {
            "error": None,
            "research_results": [{"r": 1}, {"r": 2}],
            "next_action": ""
        }
        assert route_after_research(state) == "synthesize"
    
    def test_boundary_exactly_10_results(self):
        state = {
            "error": None,
            "research_results": [{"r": i} for i in range(10)],
            "next_action": ""
        }
        # 10 results = normal, not "many"
        assert route_after_research(state) == "synthesize"
    
    def test_boundary_exactly_11_results(self):
        state = {
            "error": None,
            "research_results": [{"r": i} for i in range(11)],
            "next_action": ""
        }
        # 11 results = filter
        assert route_after_research(state) == "filter_results"

The boundary conditions (10 vs 11 results) are where bugs hide. Test them explicitly.
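The same cases can also be written as a table, which makes a missing boundary easier to spot. This is a minimal sketch: `make_state`, `ROUTING_CASES`, and `check_routing_cases` are hypothetical names, and the routing function is repeated only so the example runs on its own.

```python
from typing import Optional


def route_after_research(state: dict) -> str:
    # Same routing rules as the function above, repeated so this runs standalone.
    if state.get("error"):
        return "handle_error"
    if not state["research_results"]:
        return "clarify"
    if len(state["research_results"]) > 10:
        return "filter_results"
    return "synthesize"


def make_state(n_results: int, error: Optional[str] = None) -> dict:
    """Hypothetical helper: build a minimal state with `n_results` fake results."""
    return {
        "error": error,
        "research_results": [{"r": i} for i in range(n_results)],
        "next_action": "",
    }


# (number of results, error, expected route)
ROUTING_CASES = [
    (0, None, "clarify"),
    (1, None, "synthesize"),
    (10, None, "synthesize"),             # upper boundary of "normal"
    (11, None, "filter_results"),         # first value past the threshold
    (0, "API timeout", "handle_error"),
    (11, "API timeout", "handle_error"),  # error wins over every other rule
]


def check_routing_cases() -> None:
    for n_results, error, expected in ROUTING_CASES:
        actual = route_after_research(make_state(n_results, error))
        assert actual == expected, (n_results, error, actual, expected)


check_routing_cases()
```

Under pytest, the same table could be passed to `pytest.mark.parametrize("n_results, error, expected", ROUTING_CASES)` so that each row is reported as its own test.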

Testing State Transitions Through the Graph

After testing nodes and edges individually, test multi-step flows through the compiled graph with mocked LLM calls.

from unittest.mock import patch, MagicMock
from langchain_core.messages import AIMessage
from your_agent import create_research_graph

class TestGraphStateTransitions:
    @pytest.fixture
    def graph(self):
        return create_research_graph()
    
    @patch("your_agent.nodes.llm")
    @patch("your_agent.nodes.search_api")
    def test_happy_path_state_at_each_step(self, mock_search, mock_llm, graph):
        # Setup mocks
        mock_search.search.return_value = [
            {"title": "LangGraph docs", "content": "state machine framework"}
        ]
        mock_llm.invoke.return_value = AIMessage(
            content="LangGraph is a framework for building stateful agent workflows."
        )
        
        # Run with checkpointer to capture intermediate state
        from langgraph.checkpoint.memory import MemorySaver
        checkpointer = MemorySaver()
        graph_with_checkpoint = create_research_graph(checkpointer=checkpointer)
        
        thread_id = "test-thread-1"
        config = {"configurable": {"thread_id": thread_id}}
        
        # Stream to capture state after each node
        states = []
        for event in graph_with_checkpoint.stream(
            {"messages": [HumanMessage(content="What is LangGraph?")]},
            config=config
        ):
            states.append(event)
        
        # After research node: should have results
        research_event = next(e for e in states if "research_node" in e)
        assert len(research_event["research_node"]["research_results"]) == 1
        
        # After synthesize node: should have final response
        synth_event = next(e for e in states if "synthesize_node" in e)
        assert "LangGraph" in synth_event["synthesize_node"]["messages"][-1].content
    
    @patch("your_agent.nodes.llm")
    @patch("your_agent.nodes.search_api")
    def test_error_path_transitions(self, mock_search, mock_llm, graph):
        mock_search.search.side_effect = Exception("Search API unavailable")
        
        from langgraph.checkpoint.memory import MemorySaver
        graph_with_checkpoint = create_research_graph(checkpointer=MemorySaver())
        
        final_state = graph_with_checkpoint.invoke(
            {"messages": [HumanMessage(content="query")]},
            {"configurable": {"thread_id": "error-test"}}
        )
        
        # Graph should complete without raising
        assert final_state is not None
        # Error state should be populated
        assert final_state.get("error") is not None or "error" in str(final_state["messages"][-1].content).lower()

Mocking the Checkpointer

The checkpointer saves graph state as a run progresses, which is what lets a run pause at an interrupt and resume later. In tests, you want deterministic, fast persistence, not a Redis or PostgreSQL dependency.

from langgraph.checkpoint.memory import MemorySaver

def test_resuming_from_checkpoint():
    """Test that the agent correctly resumes from a saved checkpoint."""
    checkpointer = MemorySaver()  # In-memory, no external deps
    graph = create_research_graph(checkpointer=checkpointer)
    
    thread_config = {"configurable": {"thread_id": "resume-test"}}
    
    # First run: graph hits an interrupt (human-in-the-loop node)
    with patch("your_agent.nodes.search_api") as mock_search:
        mock_search.search.return_value = [{"title": "result"}]
        
        # Run until interrupt
        events = list(graph.stream(
            {"messages": [HumanMessage(content="research something")]},
            config=thread_config
        ))
    
    # Verify state was checkpointed
    checkpoint = checkpointer.get(thread_config)
    assert checkpoint is not None
    assert len(checkpoint["channel_values"]["research_results"]) == 1
    
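    # Resume after human approval. Passing None as the input continues from the
    # saved checkpoint; this assumes the graph pauses via a static interrupt
    # (for example interrupt_before). A new input dict would be treated as
    # fresh input to the thread rather than as a resume.
    with patch("your_agent.nodes.llm") as mock_llm:
        mock_llm.invoke.return_value = AIMessage(content="final answer")
        resumed_events = list(graph.stream(None, config=thread_config))
    
    # Should continue from where it left off, not restart
    assert resumed_events  # At least one node ran after the interrupt
    # stream() yields per-node updates, so read the accumulated state instead
    final_messages = graph.get_state(thread_config).values.get("messages", [])
    assert len(final_messages) > 1  # Original message plus the resumed response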

def test_checkpoint_isolation_between_threads():
    """State from one thread should not bleed into another."""
    checkpointer = MemorySaver()
    graph = create_research_graph(checkpointer=checkpointer)
    
    with patch("your_agent.nodes.search_api") as mock_search, \
         patch("your_agent.nodes.llm") as mock_llm:
        mock_search.search.return_value = [{"title": "thread-specific result"}]
        mock_llm.invoke.return_value = AIMessage(content="response")
        
        # Run two separate threads
        thread_1 = {"configurable": {"thread_id": "thread-1"}}
        thread_2 = {"configurable": {"thread_id": "thread-2"}}
        
        graph.invoke({"messages": [HumanMessage(content="query 1")]}, config=thread_1)
        graph.invoke({"messages": [HumanMessage(content="query 2")]}, config=thread_2)
        
        state_1 = checkpointer.get(thread_1)
        state_2 = checkpointer.get(thread_2)
        
        # Thread states must be independent
        assert state_1 != state_2

MemorySaver keeps checkpoints in process memory, which makes it a good test checkpointer. Create a separate MemorySaver() instance for each test so state cannot leak from one test into another, especially if you run tests in parallel.
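Hard-coded thread IDs such as "thread-1" are a related source of leakage: two tests that share a checkpointer and an ID will read each other's checkpoints. One simple guard is to generate the ID per test. The helper below is a hypothetical sketch (`new_thread_config` is not a LangGraph function); it only builds the config dict in the shape used throughout this article.

```python
import uuid


def new_thread_config(prefix: str = "test") -> dict:
    """Hypothetical helper: return a config dict with a unique thread_id."""
    return {"configurable": {"thread_id": f"{prefix}-{uuid.uuid4().hex}"}}


config_a = new_thread_config()
config_b = new_thread_config("resume")
```

Each call yields a different `thread_id`, so a test can pass the result straight to `graph.invoke(..., config=config_a)` without coordinating names with other tests.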

Testing LangGraph with pytest Fixtures

Structure your test suite with fixtures that create reusable graph instances:

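import pytest
from unittest.mock import MagicMock, patch
from langchain_core.messages import AIMessage, HumanMessage
from langgraph.checkpoint.memory import MemorySaver
from your_agent import create_research_graph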

@pytest.fixture
def mock_llm():
    llm = MagicMock()
    llm.invoke.return_value = AIMessage(content="mocked response")
    return llm

@pytest.fixture
def mock_search():
    search = MagicMock()
    search.search.return_value = [
        {"title": "Test Result", "content": "test content", "url": "https://example.com"}
    ]
    return search

@pytest.fixture
def graph(mock_llm, mock_search):
    with patch("your_agent.nodes.llm", mock_llm), \
         patch("your_agent.nodes.search_api", mock_search):
        yield create_research_graph(checkpointer=MemorySaver())

class TestResearchWorkflow:
    def test_basic_research_flow(self, graph):
        result = graph.invoke(
            {"messages": [HumanMessage(content="test query")]},
            config={"configurable": {"thread_id": "test-1"}}
        )
        assert result["messages"][-1].type == "ai"
    
    def test_multiple_independent_runs(self, graph):
        # Each thread is independent
        for i in range(3):
            result = graph.invoke(
                {"messages": [HumanMessage(content=f"query {i}")]},
                config={"configurable": {"thread_id": f"thread-{i}"}}
            )
            assert result is not None

Running LangGraph Tests in CI

LangGraph tests are pure Python — no special infrastructure needed.

# .github/workflows/test.yml
- name: Run LangGraph agent tests
  run: pytest tests/agents/ -v --tb=short
  env:
    PYTHONDONTWRITEBYTECODE: 1
    # No real API keys needed — everything is mocked

Keep a tests/agents/ directory with one test file per graph. Name tests by workflow step: test_research_node.py, test_routing.py, test_graph_integration.py.

Test execution order: Run node tests first (fastest), then routing tests, then full graph tests. Fail fast on node regressions before spending time on integration tests.
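If you would rather not rely on the order of paths on the command line, pytest's `pytest_collection_modifyitems` hook can sort collected tests by layer. The following is a sketch for a `conftest.py`, assuming the file names suggested above; `LAYER_RANKS` and `layer_rank` are hypothetical names.

```python
# conftest.py (sketch)

# Lower rank runs earlier. The name fragments are assumptions based on the
# file names suggested in this article; adjust them to your own layout.
LAYER_RANKS = [
    ("_node", 0),         # e.g. test_research_node.py
    ("_routing", 1),      # e.g. test_routing.py
    ("_integration", 2),  # e.g. test_graph_integration.py
]


def layer_rank(nodeid: str) -> int:
    """Map a pytest node ID to a layer rank; unrecognized files run last."""
    filename = nodeid.split("::")[0].rsplit("/", 1)[-1]
    for fragment, rank in LAYER_RANKS:
        if fragment in filename:
            return rank
    return len(LAYER_RANKS)


def pytest_collection_modifyitems(config, items):
    # pytest calls this hook after collection; sorting `items` in place
    # changes the run order. list.sort is stable, so tests keep their
    # original relative order within a layer.
    items.sort(key=lambda item: layer_rank(item.nodeid))
```

Combined with `pytest -x`, which stops at the first failure, a node-level regression ends the run before any full graph test starts.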

LangGraph's state machine structure is a gift for testing. Use it.
