Python Doctest: Testing Code Examples in Your Docstrings

Documentation lies. Not because developers intend to mislead, but because code evolves and docstrings do not. A function gets refactored, its return format changes, and somewhere in the codebase an outdated >>> example claims it returns a list when it now returns a generator. The user copies that example, runs it, and gets a confusing error.

Python's doctest module solves this by making examples executable. Every >>> prompt in a docstring becomes a test that fails loudly if the output no longer matches. Your documentation stays honest because it is also your test suite.

This guide covers everything from the basic mechanics through pytest integration, directives, and when doctest is the right tool versus when you should reach for pytest.

The Core Concept

doctest finds text formatted as interactive Python sessions (a >>> statement followed by its expected output) in your docstrings or in standalone text files, and runs it. If the output matches, the test passes. If it does not, the test fails and doctest reports both the expected and the actual output.
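
To make the parsing step concrete, here is a small sketch that uses the standard library's doctest.DocTestParser to list what would be run. The text in DOC is made up for illustration and is not taken from any real module.

```python
import doctest

# Illustrative text only; any string containing >>> examples would do.
DOC = """
Return the sum of a and b.

>>> 2 + 3
5
>>> x = 10
>>> x * 2
20
"""

# get_examples() returns Example objects that hold the source code
# and the expected output ("want") that follows it.
examples = doctest.DocTestParser().get_examples(DOC)
pairs = [(example.source, example.want) for example in examples]

for source, want in pairs:
    print(repr(source), "->", repr(want))
```

Note that the assignment is parsed as an example too; its expected output is simply the empty string.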

A minimal example:

# math_utils.py

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    >>> add(0.1, 0.2)  # doctest: +ELLIPSIS
    0.3...
    """
    return a + b

Run it:

python -m doctest math_utils.py -v

Output:

Trying:
    add(2, 3)
Expecting:
    5
ok
Trying:
    add(-1, 1)
Expecting:
    0
ok
Trying:
    add(0.1, 0.2)  # doctest: +ELLIPSIS
Expecting:
    0.3...
ok
1 items had no tests:
    math_utils
1 items passed all tests:
   3 tests in math_utils.add
3 tests in 2 items.
3 passed and 0 failed.
Test passed.

Without -v, doctest is silent on success. A failure prints the example together with the expected and actual output:

Failed example:
    add(2, 3)
Expected:
    6
Got:
    5
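
The same report can be captured from Python, which may help when wiring doctest into a custom script. This is a sketch under assumptions: the first expected value is deliberately wrong, and the test is built by hand with DocTestParser so it does not depend on how the file is imported.

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    6
    >>> add(-1, 1)
    0
    """
    return a + b

# Build a DocTest directly from the docstring; globs supplies the
# names that the examples are allowed to use.
test = doctest.DocTestParser().get_doctest(
    add.__doc__, {"add": add}, "add", None, 0
)

report = []  # failure text is collected here instead of being printed
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=report.append)

print(results.failed, "failed of", results.attempted)
print("".join(report))
```

The returned object exposes failed and attempted counts, so a script could exit non-zero when failed is greater than zero.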

testmod and testfile

doctest.testmod() tests all docstrings in a module:

# math_utils.py

def add(a, b):
    """
    >>> add(2, 3)
    5
    """
    return a + b

def multiply(a, b):
    """
    >>> multiply(3, 4)
    12
    >>> multiply(0, 100)
    0
    """
    return a * b

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Run with python math_utils.py. This is the classic pattern: running the module directly also runs its tests. The downside is that if __name__ == "__main__" blocks can be confusing to readers who expect application entry points, not test runners.

doctest.testmod() with options:

import doctest
results = doctest.testmod(verbose=True, optionflags=doctest.ELLIPSIS)
if results.failed:
    raise SystemExit(1)

doctest.testfile() runs tests from a standalone text file:

# test_examples.txt
The add function returns the sum of two numbers:

    >>> from math_utils import add
    >>> add(10, 20)
    30

It handles negative numbers correctly:

    >>> add(-5, 3)
    -2

Run it from Python:

import doctest
doctest.testfile("test_examples.txt")

testfile is useful for tutorial-style documentation where narrative text surrounds interactive examples. The entire file is both human-readable documentation and an executable test suite.
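
For a self-contained way to try this, the sketch below writes a tiny tutorial to a temporary file and runs it. The tutorial text is invented for illustration and uses only built-ins. module_relative=False tells doctest to treat the argument as a plain filesystem path.

```python
import doctest
import os
import tempfile

# Illustrative tutorial text; it uses only built-ins so it runs anywhere.
TUTORIAL = """\
Built-in helpers
================

sorted returns a new list:

    >>> sorted([3, 1, 2])
    [1, 2, 3]

len counts the items:

    >>> len([3, 1, 2])
    3
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tutorial.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(TUTORIAL)
    # Without module_relative=False the name would be resolved
    # relative to the calling module's directory.
    results = doctest.testfile(path, module_relative=False)

print(results)
```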

Directive Flags

Directives modify behavior for individual examples without affecting the whole test run. They appear as comments after the code:

ELLIPSIS — matches ... as a wildcard in expected output:

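import datetime

def get_user(user_id):
    """
    Returns a dict with user data.

    >>> user = get_user(1)
    >>> user['name']
    'Alice'
    >>> user  # doctest: +ELLIPSIS
    {'id': 1, 'name': 'Alice', 'created_at': datetime.datetime(...)}
    """
    # Illustrative stand-in for a real lookup
    created = datetime.datetime(2024, 1, 15, 9, 30)
    return {"id": user_id, "name": "Alice", "created_at": created}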

Without ELLIPSIS, the full datetime.datetime(2024, 1, 15, ...) repr would have to match character for character. With it, ... in the expected output matches any substring of the actual output.
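
The matching rule can be checked in isolation with doctest.OutputChecker, which is the class doctest uses to compare expected and actual output. The strings here are illustrative values, not output from a real get_user.

```python
import doctest

checker = doctest.OutputChecker()

# "want" is what the docstring says; "got" is what the code produced.
want = "{'id': 1, 'created_at': datetime.datetime(...)}\n"
got = "{'id': 1, 'created_at': datetime.datetime(2024, 1, 15, 9, 30)}\n"

strict = checker.check_output(want, got, 0)                 # no flags
relaxed = checker.check_output(want, got, doctest.ELLIPSIS)

print(strict, relaxed)
```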

NORMALIZE_WHITESPACE — treats any sequence of whitespace as equivalent:

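def format_table(data):
    """
    >>> print(format_table([('a', 1), ('b', 2)]))  # doctest: +NORMALIZE_WHITESPACE
    a    1
    b    2
    """
    # Illustrative body: tab-separated columns, one row per line
    return "\n".join(f"{key}\t{value}" for key, value in data)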

Useful when output formatting varies by platform or Python version.
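
A quick sketch of what the flag does and does not forgive, again using doctest.OutputChecker directly. The tab-separated "got" string is an assumed example of formatter output.

```python
import doctest

checker = doctest.OutputChecker()
flag = doctest.NORMALIZE_WHITESPACE

want = "a    1\nb    2\n"   # as written in the docstring
got = "a\t1\nb\t2\n"        # what a tab-separated formatter might print

exact = checker.check_output(want, got, 0)
normalized = checker.check_output(want, got, flag)

# Runs of whitespace are collapsed, but whitespace still has to be
# present on both sides: "a1" does not match "a 1".
missing_space = checker.check_output("a 1\n", "a1\n", flag)

print(exact, normalized, missing_space)
```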

SKIP — skip an example entirely:

def connect_to_db():
    """
    >>> conn = connect_to_db()  # doctest: +SKIP
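    >>> conn.execute("SELECT 1")  # doctest: +SKIP
    [(1,)]
    """

Use SKIP for examples that require external services, credentials, or state that cannot be guaranteed in CI. A directive applies only to the example it is attached to, so each example that depends on a skipped one needs its own SKIP.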
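
Because a directive covers a single example, skipping only the setup line leaves later lines running against a name that was never created. The sketch below illustrates this with missing_service, a hypothetical and deliberately undefined function.

```python
import doctest

def failed_count(docstring):
    """Run the examples in `docstring` and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None).failed  # discard report

# Only the first example is skipped, so the second one still runs
# and fails with a NameError because `conn` was never assigned.
ONLY_FIRST_SKIPPED = """
>>> conn = missing_service()  # doctest: +SKIP
>>> conn.ping()
'pong'
"""

BOTH_SKIPPED = """
>>> conn = missing_service()  # doctest: +SKIP
>>> conn.ping()  # doctest: +SKIP
'pong'
"""

only_first = failed_count(ONLY_FIRST_SKIPPED)
both = failed_count(BOTH_SKIPPED)
print(only_first, both)
```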

IGNORE_EXCEPTION_DETAIL — match exceptions ignoring the module path and message details:

def divide(a, b):
    """
    >>> divide(10, 0)  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
        ...
    ZeroDivisionError: ...
    """
    return a / b
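
As a rough illustration of the effect, the sketch below runs the same example with and without the flag. Here the flag is passed to the runner as an option flag rather than written as a directive, and the message wording in the docstrings is intentionally not the real one.

```python
import doctest

def failed_count(docstring, flags=0):
    """Run the examples in `docstring` with the given option flags."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    return runner.run(test, out=lambda text: None).failed

WRONG_MESSAGE = """
>>> 10 / 0
Traceback (most recent call last):
    ...
ZeroDivisionError: some other wording
"""

WRONG_TYPE = """
>>> 10 / 0
Traceback (most recent call last):
    ...
ValueError: some other wording
"""

flag = doctest.IGNORE_EXCEPTION_DETAIL
message_strict = failed_count(WRONG_MESSAGE)        # message is compared
message_relaxed = failed_count(WRONG_MESSAGE, flag) # only the type is compared
type_relaxed = failed_count(WRONG_TYPE, flag)       # wrong type still fails
print(message_strict, message_relaxed, type_relaxed)
```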

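DONT_ACCEPT_TRUE_FOR_1 and DONT_ACCEPT_BLANKLINE: option flags for stricter matching. The first stops an expected 1 from matching an actual True (and 0 from matching False). The second stops <BLANKLINE> in the expected output from matching an empty line in the actual output.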
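
Both defaults, and what the stricter flags change, can be seen with doctest.OutputChecker. The strings are minimal illustrative cases.

```python
import doctest

checker = doctest.OutputChecker()

# By default an expected "1" is accepted for an actual "True".
lenient_bool = checker.check_output("1\n", "True\n", 0)
strict_bool = checker.check_output(
    "1\n", "True\n", doctest.DONT_ACCEPT_TRUE_FOR_1
)

# By default <BLANKLINE> in the expected text stands for an empty line.
want = "first\n<BLANKLINE>\nlast\n"
got = "first\n\nlast\n"
lenient_blank = checker.check_output(want, got, 0)
strict_blank = checker.check_output(want, got, doctest.DONT_ACCEPT_BLANKLINE)

print(lenient_bool, strict_bool, lenient_blank, strict_blank)
```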

Testing Exceptions

Exceptions are tested by including the traceback pattern. You must include Traceback (most recent call last): and the exception line, but the intermediate frames can be replaced with ...:

def parse_int(s):
    """Parse a string as an integer.

    >>> parse_int("42")
    42
    >>> parse_int("3.14")
    Traceback (most recent call last):
        ...
    ValueError: invalid literal for int() with base 10: '3.14'
    >>> parse_int("")
    Traceback (most recent call last):
        ...
    ValueError: invalid literal for int() with base 10: ''
    """
    return int(s)

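doctest ignores everything between the Traceback line and the exception line, so the ... there stands in for any intermediate stack frames and works even without the ELLIPSIS flag. Only the exception type and message are compared. This is the correct way to test exceptions in doctest; do not try to match the full traceback.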
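
To see which part of a traceback is actually kept, the sketch below inspects the parsed example's exc_msg attribute and then runs two variants. The file name in the second variant is made up.

```python
import doctest

def failed_count(docstring):
    """Run the examples in `docstring` and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None).failed

ELIDED = '''
>>> int("3.14")
Traceback (most recent call last):
    ...
ValueError: invalid literal for int() with base 10: '3.14'
'''

# The file name and line number are invented; the stack is not compared.
WITH_FRAME = '''
>>> int("3.14")
Traceback (most recent call last):
  File "made_up.py", line 1, in <module>
ValueError: invalid literal for int() with base 10: '3.14'
'''

example = doctest.DocTestParser().get_examples(ELIDED)[0]
print(repr(example.exc_msg))  # only the final exception line is stored

elided_failures = failed_count(ELIDED)
frame_failures = failed_count(WITH_FRAME)
print(elided_failures, frame_failures)
```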

Multi-line Examples and Setup

Examples can span multiple lines using the ... continuation prompt:

def greet(name, greeting="Hello"):
    """Return a greeting string.

    >>> greet("Alice")
    'Hello, Alice!'
    >>> greet(
    ...     "Bob",
    ...     greeting="Hi"
    ... )
    'Hi, Bob!'
    """
    return f"{greeting}, {name}!"

Setup code such as imports and assignments is written as ordinary examples with no expected-output line:

def process_items(items):
    """
    >>> import json
    >>> data = json.loads('["a", "b", "c"]')
    >>> process_items(data)
    ['A', 'B', 'C']
    """
    return [item.upper() for item in items]

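Lines with no expected output (like the import and assignment above) are still executed. Because they print nothing, there is nothing for doctest to compare; a setup line that did produce output would fail unless that output were listed.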
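
A small sketch of how setup lines are counted and checked. The docstrings are run by hand through DocTestParser and DocTestRunner, and NOISY is an invented counterexample.

```python
import doctest

def process_items(items):
    return [item.upper() for item in items]

def run(docstring):
    """Return (test, results) for the examples in `docstring`."""
    test = doctest.DocTestParser().get_doctest(
        docstring, {"process_items": process_items}, "demo", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return test, runner.run(test, out=lambda text: None)

SETUP = """
>>> import json
>>> data = json.loads('["a", "b", "c"]')
>>> process_items(data)
['A', 'B', 'C']
"""

# This "setup" line echoes a value, so an empty expectation fails.
NOISY = """
>>> 1 + 1
"""

setup_test, setup_results = run(SETUP)
wants = [example.want for example in setup_test.examples]
noisy_test, noisy_results = run(NOISY)

print(wants, setup_results.attempted, noisy_results.failed)
```

All three lines in SETUP count as examples; the first two simply expect empty output.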

pytest Integration

pytest can discover and run doctest examples with the --doctest-modules flag:

# Run all doctests in the project
pytest --doctest-modules

# Run doctests in a specific module
pytest --doctest-modules src/math_utils.py

# Run doctests alongside unit tests
pytest --doctest-modules tests/ src/

Configure pytest to always run doctests via pyproject.toml:

[tool.pytest.ini_options]
addopts = "--doctest-modules"
doctest_optionflags = ["NORMALIZE_WHITESPACE", "ELLIPSIS"]

Or pytest.ini:

[pytest]
addopts = --doctest-modules
doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS

Fixtures in doctests: pytest provides the doctest_namespace fixture, a dict whose entries are injected into the namespace in which doctests run:

# conftest.py
import pytest

@pytest.fixture(autouse=True)
def add_np(doctest_namespace):
    import numpy as np
    doctest_namespace["np"] = np

Now doctests can use np without importing:

def normalize(arr):
    """
    >>> normalize(np.array([1, 2, 3]))
    array([0. , 0.5, 1. ])
    """
    return (arr - arr.min()) / (arr.max() - arr.min())

Testing examples in .rst files with pytest:

pytest --doctest-glob="*.rst" docs/

Doctest in Standalone .txt Files

testfile supports narrative documentation that includes runnable examples. This is particularly useful for tutorial guides and how-to documents:

# docs/tutorial.txt

Getting Started
===============

First, import the calculator:

    >>> from mypackage.calc import Calculator

Create an instance:

    >>> calc = Calculator()

Basic arithmetic:

    >>> calc.add(10, 5)
    15
    >>> calc.subtract(10, 5)
    5
    >>> calc.multiply(3, 7)
    21

Division raises an error on zero:

    >>> calc.divide(10, 0)
    Traceback (most recent call last):
        ...
    ValueError: Cannot divide by zero

Run it:

import doctest
doctest.testfile("docs/tutorial.txt")

Or via pytest:

pytest --doctest-glob="*.txt" docs/

This pattern enforces that your tutorial actually works. If someone refactors Calculator.divide to raise ZeroDivisionError instead of ValueError, the tutorial test fails immediately.

When to Use Doctest vs pytest

Use doctest when:

  • You want to verify that documentation examples are accurate and up to date
  • Your examples are simple and self-contained (no complex setup or teardown)
  • You are writing a library and want to ensure the public API examples work
  • You are writing tutorials or how-to guides that should be executable
  • The test coverage comes for free from the documentation you would write anyway

Use pytest when:

  • You need fixtures, mocking, or complex test setup
  • You want parameterized tests across many input combinations
  • You are testing internal logic that should not appear in public documentation
  • You need conftest.py for shared state across test files
  • You want detailed test reports, coverage, and CI integration features

The tools are complementary. A well-tested Python project often has both: doctests that validate the documented interface and pytest tests that validate the internal implementation. The doctests serve as executable examples for users. The pytest suite serves as the comprehensive regression safety net for developers.

Limitations and Pitfalls

Output must match exactly (by default) — whitespace, trailing commas, object repr formatting, all must match. Python 3.8 changed the repr of some types; Python 3.11 changed traceback formatting. Use NORMALIZE_WHITESPACE and ELLIPSIS liberally.

Dictionary ordering — before Python 3.7, dicts had unpredictable ordering. Even in Python 3.7+, repr(d) order depends on insertion order:

def get_config():
    """
    >>> sorted(get_config().items())  # use sorted() for stable output
    [('debug', False), ('host', 'localhost'), ('port', 8080)]
    """
    return {"host": "localhost", "port": 8080, "debug": False}

Floating point: 0.1 + 0.2 is not exactly 0.3. Use ELLIPSIS or round():

>>> round(0.1 + 0.2, 1)
0.3
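
The sketch below runs a naive expectation next to a few sturdier spellings. math.isclose is an additional option not mentioned above; it is offered as one possible alternative, not as the only approach.

```python
import doctest

def failed_count(docstring):
    """Run the examples in `docstring` and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None).failed

# The repr of 0.1 + 0.2 is 0.30000000000000004, so this fails.
NAIVE = """
>>> 0.1 + 0.2
0.3
"""

ROBUST = """
>>> round(0.1 + 0.2, 1)
0.3
>>> 0.1 + 0.2  # doctest: +ELLIPSIS
0.3...
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True
"""

naive_failures = failed_count(NAIVE)
robust_failures = failed_count(ROBUST)
print(naive_failures, robust_failures)
```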

No built-in fixtures: examples within a single docstring share state (variables persist from one example to the next), while each docstring starts from a fresh copy of the module's globals. Plain docstring examples have no setUp equivalent, so if you need fresh state between examples, you must reset it manually.

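Print vs return: doctest compares everything an example writes to stdout, plus the repr of the expression's value when that value is not None. A function that prints and returns None therefore shows its printed text, without quotes, as the expected output: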

def greet(name):
    """
    >>> greet("Alice")
    Hello, Alice!
    """
    print(f"Hello, {name}!")

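If greet returned the string instead of printing it, the expected output would be its repr, 'Hello, Alice!' with quotes. If it neither printed nor returned anything, the expected output would have to be empty, because the REPL does not echo None.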
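
The difference is easiest to see with both styles next to each other. greet_print and greet_return are hypothetical names used only for this sketch.

```python
import doctest

def greet_print(name):
    print(f"Hello, {name}!")   # writes to stdout and returns None

def greet_return(name):
    return f"Hello, {name}!"   # returns a str; the REPL echoes its repr

DOC = """
>>> greet_print("Alice")
Hello, Alice!
>>> greet_return("Alice")
'Hello, Alice!'
>>> print(greet_return("Alice"))
Hello, Alice!
"""

globs = {"greet_print": greet_print, "greet_return": greet_return}
test = doctest.DocTestParser().get_doctest(DOC, globs, "greet", None, 0)
results = doctest.DocTestRunner(verbose=False).run(
    test, out=lambda text: None
)

print(results.failed, results.attempted)
```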

doctest is not a replacement for a full test suite. It is a contract between your documentation and your code. When you write >>> add(2, 3) and expect 5, you are promising that the example works. The doctest runner enforces that promise on every commit. For library authors especially, that is a powerful guarantee: users who copy your examples will not be immediately confused by a function that changed behavior two releases ago.
