Hypothesis: Property-Based Testing for Python

Hypothesis is the most widely used property-based testing library for Python. It generates test inputs automatically, shrinks failures to minimal examples, and remembers past failures across test runs. If you write Python and care about test quality, Hypothesis is worth learning.

Installation

pip install hypothesis

Hypothesis works with pytest and unittest without any special configuration. Django support lives in the hypothesis.extra.django module, covered in the Django Integration section.

The Basics: @given and Strategies

The @given decorator wraps a test function and provides it with generated inputs:

from hypothesis import given
from hypothesis import strategies as st

@given(st.integers())
def test_absolute_value_non_negative(n):
    assert abs(n) >= 0

@given(st.text())
def test_string_length_non_negative(s):
    assert len(s) >= 0

@given(st.lists(st.integers()))
def test_sum_of_empty_list_is_zero(lst):
    if len(lst) == 0:
        assert sum(lst) == 0

Run with pytest as usual:

pytest test_hypothesis.py -v

Hypothesis runs up to 100 examples per test by default (configurable through the max_examples setting). If any example fails, it shrinks the input and reports the minimal failing case.
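To watch shrinking in isolation, here is a minimal sketch using hypothesis.find, which searches for an input satisfying a predicate and then shrinks it. The predicate (a list with at least three elements) is an arbitrary illustration, not something from a real test suite.

```python
from hypothesis import find, strategies as st

# Ask for a list of integers with at least three elements.
# find() first discovers a satisfying example, then shrinks it.
minimal = find(st.lists(st.integers()), lambda xs: len(xs) >= 3)

# The result is usually the shortest list made of the simplest values.
print(minimal)
```

The same shrinking process runs when a @given test fails, which is why reported counterexamples tend to be short and readable.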

Core Strategies

Strategies describe the space of inputs to generate:

# Primitives
st.integers()                       # any integer
st.integers(min_value=0, max_value=100)  # bounded
st.floats()                         # any float
st.floats(min_value=0.0, max_value=1.0, allow_nan=False)
st.text()                           # any string
st.text(alphabet=st.characters(categories=['Lu', 'Ll']))  # letters only (older releases name this argument whitelist_categories)
st.binary()                         # bytes
st.booleans()

# Collections
st.lists(st.integers())
st.lists(st.integers(), min_size=1, max_size=10)
st.sets(st.integers())
st.dictionaries(st.text(), st.integers())
st.tuples(st.integers(), st.text())

# Special
st.none()
st.one_of(st.integers(), st.text())  # union type
st.just(42)                          # always returns 42
st.sampled_from([1, 2, 3, "a"])     # picks from a list

# Dates and times
st.dates()
st.datetimes()
st.timedeltas()

# Network types
st.ip_addresses()
st.emails()

# Pattern-constrained strings
st.from_regex(r'[A-Z]{3}-\d{4}')   # strings containing a match; pass fullmatch=True to match the whole string
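As a small illustration of how these strategies feed a real property, the sketch below checks a JSON round trip. The choice of json as the code under test is an assumption made for the example.

```python
import json

from hypothesis import given, strategies as st

# Dictionaries with text keys and integer values, built from core strategies.
json_objects = st.dictionaries(st.text(), st.integers())


@given(json_objects)
def test_json_round_trip(obj):
    # Encoding and then decoding should give back an equal dictionary.
    assert json.loads(json.dumps(obj)) == obj
```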

Building Complex Strategies

Composing Strategies

from hypothesis import given, strategies as st
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int
    email: str

users = st.builds(
    User,
    name=st.text(min_size=1, max_size=50),
    age=st.integers(min_value=13, max_value=120),
    email=st.emails(),
)

@given(users)
def test_user_display_name(user):
    display = f"{user.name} ({user.age})"
    assert user.name in display
    assert str(user.age) in display

Filtering Strategies

positive_integers = st.integers().filter(lambda x: x > 0)

# Or use min_value for better performance (filter is slow)
positive_integers = st.integers(min_value=1)

Mapping Strategies

# Generate sorted lists
sorted_lists = st.lists(st.integers()).map(sorted)

# Generate uppercase strings
uppercase = st.text().map(str.upper)
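Filtering and mapping can often express the same constraint. The sketch below builds even integers both ways. The strategy names are illustrative only.

```python
from hypothesis import given, strategies as st

# filter() throws away the odd draws and retries.
evens_by_filter = st.integers().filter(lambda n: n % 2 == 0)

# map() turns every draw into a valid value, so nothing is rejected.
evens_by_map = st.integers().map(lambda n: n * 2)


@given(evens_by_filter, evens_by_map)
def test_both_strategies_yield_even_numbers(a, b):
    assert a % 2 == 0
    assert b % 2 == 0
```

When a transformation like this exists, mapping is usually the better choice because no generated value is wasted.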

Dependent Strategies (flatmap)

# Generate a list and a valid index into it
@given(
    st.lists(st.integers(), min_size=1).flatmap(
        lambda lst: st.tuples(st.just(lst), st.integers(min_value=0, max_value=len(lst)-1))
    )
)
def test_list_index_valid(lst_and_index):
    lst, idx = lst_and_index
    assert lst[idx] in lst  # index is always valid
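Dependent data can also be written with the @st.composite decorator, which many people find easier to read than flatmap. This is a sketch of the same list-and-index idea. The function name list_and_index is made up for the example.

```python
from hypothesis import given, strategies as st


@st.composite
def list_and_index(draw):
    # Draw the list first, then an index that depends on its length.
    lst = draw(st.lists(st.integers(), min_size=1))
    idx = draw(st.integers(min_value=0, max_value=len(lst) - 1))
    return lst, idx


@given(list_and_index())
def test_index_is_in_range(pair):
    lst, idx = pair
    assert 0 <= idx < len(lst)
```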

Settings: Controlling Test Behavior

from hypothesis import given, settings, HealthCheck
from hypothesis import strategies as st

@settings(
    max_examples=500,          # run 500 examples instead of 100
    deadline=None,             # no time limit per example
    suppress_health_check=[HealthCheck.too_slow],
)
@given(st.text())
def test_heavy_operation(s):
    result = expensive_function(s)
    assert is_valid(result)

Common settings:

  • max_examples=100: the default; increase it for more thorough testing
  • deadline=timedelta(milliseconds=200): the default; the test fails if a single example takes longer than this
  • derandomize=True: derive examples from a fixed seed so that runs are reproducible

The Hypothesis Database

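Hypothesis stores failing examples in a local database (.hypothesis/ directory). On subsequent runs, it replays known failures first, so previously discovered bugs are retested before any new examples are generated.

This means:

  • Bugs found once keep being replayed for as long as their database entries exist
  • CI reruns keep discovered examples only if .hypothesis/ is cached or otherwise persisted between runs
  • The database can be committed for team use, but saved examples may stop replaying after you change a strategy or upgrade Hypothesis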

# Commit the Hypothesis database (first remove .hypothesis/ from .gitignore if it is listed there)
git add .hypothesis/
git commit -m "Add hypothesis example database"
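For a regression that must keep running no matter what happens to the database, the @example decorator pins an input directly in the test. A minimal sketch follows. The normalize function and the pinned string are hypothetical stand-ins for your own code and your own failing input.

```python
from hypothesis import example, given, strategies as st


def normalize(s: str) -> str:
    # Hypothetical function under test: collapse runs of whitespace.
    return " ".join(s.split())


@given(st.text())
@example("  two  spaces ")  # pinned input, always run in addition to generated ones
def test_normalize_is_idempotent(s):
    once = normalize(s)
    assert normalize(once) == once
```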

Assume: Narrowing Input Space

Use assume() to skip examples that don't meet preconditions:

from hypothesis import given, assume
from hypothesis import strategies as st

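# Bound the inputs: with arbitrarily large integers, a / b loses precision
# and can even overflow when the result is converted to a float
bounded = st.integers(min_value=-10**6, max_value=10**6)

@given(bounded, bounded)
def test_division(a, b):
    assume(b != 0)  # skip b=0 cases
    result = a / b
    assert abs(result * b - a) < 0.001  # floating point tolerance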

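assume() is cleaner than .filter() when the condition is complex or involves several arguments. However, if too many examples are rejected, Hypothesis fails the test with the filter_too_much health check, or reports the test as unsatisfiable if it finds no valid example at all. Prefer bounded strategies over heavy assume() use.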
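One way to avoid assume() for a non-zero divisor is to build the constraint into the strategy. The sketch below does that with st.one_of. The name nonzero_integers is chosen for the example.

```python
from hypothesis import given, strategies as st

# Non-zero integers built directly, so no example is ever rejected.
nonzero_integers = st.one_of(
    st.integers(max_value=-1),
    st.integers(min_value=1),
)


@given(st.integers(), nonzero_integers)
def test_divmod_reconstructs_dividend(a, b):
    # Integer arithmetic is exact, so no tolerance is needed here.
    quotient, remainder = divmod(a, b)
    assert quotient * b + remainder == a
```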

Stateful Testing with RuleBasedStateMachine

For testing stateful systems (databases, queues, state machines):

from hypothesis.stateful import RuleBasedStateMachine, rule, initialize, invariant
from hypothesis import strategies as st

class StackMachine(RuleBasedStateMachine):
    
    @initialize()
    def new_stack(self):
        self.stack = []
    
    @rule(value=st.integers())
    def push(self, value):
        self.stack.append(value)
    
    @rule()
    def pop(self):
        if self.stack:
            result = self.stack.pop()
            assert isinstance(result, int)
    
    @invariant()
    def length_non_negative(self):
        assert len(self.stack) >= 0

TestStack = StackMachine.TestCase

Hypothesis generates sequences of operations and checks invariants after each step.
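A common refinement is to compare the system under test against a simple model and to guard rules with @precondition rather than an if statement. The sketch below assumes a deque as the system under test and a plain list as the model. Both choices are illustrative.

```python
from collections import deque

from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    precondition,
    rule,
    run_state_machine_as_test,
)


class QueueMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.queue = deque()  # system under test
        self.model = []       # simple reference model

    @rule(value=st.integers())
    def enqueue(self, value):
        self.queue.append(value)
        self.model.append(value)

    @precondition(lambda self: len(self.model) > 0)
    @rule()
    def dequeue(self):
        # Only chosen when the model says the queue is non-empty.
        assert self.queue.popleft() == self.model.pop(0)

    @invariant()
    def sizes_agree(self):
        assert len(self.queue) == len(self.model)


# Collected by pytest or unittest like any other test case.
TestQueue = QueueMachine.TestCase

if __name__ == "__main__":
    # Run the machine directly, outside a test runner.
    run_state_machine_as_test(QueueMachine)
```

With a precondition, Hypothesis never selects dequeue on an empty queue, so every executed step does real work.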

Practical Example: API Validation

from hypothesis import given, strategies as st
from myapp import validate_order

valid_products = st.sampled_from(['SKU-001', 'SKU-002', 'SKU-003'])
valid_quantities = st.integers(min_value=1, max_value=100)
valid_prices = st.decimals(min_value='0.01', max_value='9999.99', places=2)

order_items = st.lists(
    st.fixed_dictionaries({
        'product_id': valid_products,
        'quantity': valid_quantities,
        'unit_price': valid_prices,
    }),
    min_size=1,
    max_size=20,
)

@given(order_items)
def test_valid_orders_always_accepted(items):
    result = validate_order(items)
    assert result.is_valid, f"Validation failed: {result.errors}"
    assert result.total > 0

@given(
    st.lists(
        st.fixed_dictionaries({
            'product_id': valid_products,
            'quantity': st.integers(max_value=0),  # invalid: zero or negative
            'unit_price': valid_prices,
        }),
        min_size=1,
    )
)
def test_invalid_quantities_rejected(items):
    result = validate_order(items)
    assert not result.is_valid
    assert 'quantity' in str(result.errors).lower()

Django Integration

# Install Hypothesis with its Django extra
pip install "hypothesis[django]"

from hypothesis import given, strategies as st
from hypothesis.extra.django import TestCase, from_model
from myapp.models import Product

class ProductTests(TestCase):
    @given(from_model(Product, name=st.text(min_size=1, max_size=200)))
    def test_product_str(self, product):
        assert product.name in str(product)

CI Configuration

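Run Hypothesis with fewer examples in CI (faster) and more examples locally (more thorough). Register the profiles somewhere that is imported before the tests run, such as conftest.py: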

import os

from hypothesis import settings

# Minimal CI profile
settings.register_profile("ci", max_examples=50)

# Full local profile
settings.register_profile("dev", max_examples=500)

# Load profile from environment
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))

# GitHub Actions
- name: Run tests with Hypothesis
  run: pytest tests/
  env:
    HYPOTHESIS_PROFILE: ci

Common Mistakes

Using assume() excessively: If most generated examples are rejected, the test fails with the HealthCheck.filter_too_much health check. Use bounded strategies instead.

Testing implementation instead of behavior: Test what the function should do, not how it does it. Properties are about observable behavior.

Ignoring the database: If .hypothesis/ is neither committed nor cached, teammates and CI runs start without previously discovered failures.

Missing allow_nan=False for floats: By default, st.floats() includes NaN and Infinity. If your code doesn't handle them, add allow_nan=False, allow_infinity=False.
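The sketch below shows the difference in practice. The negation property is an arbitrary example: it holds for every finite float, while NaN needs separate handling because NaN never compares equal to itself.

```python
import math

from hypothesis import given, strategies as st

finite_floats = st.floats(allow_nan=False, allow_infinity=False)


@given(finite_floats)
def test_negation_round_trip(x):
    # Safe to use == because NaN is excluded from the strategy.
    assert -(-x) == x


@given(st.floats())
def test_negation_round_trip_with_special_values(x):
    # With NaN allowed, check for it explicitly instead of relying on ==.
    assert math.isnan(x) or -(-x) == x
```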

Pair with Functional Testing

Hypothesis validates the logic inside your Python functions. For production monitoring — verifying your Python API, web app, or service behaves correctly for real users — HelpMeTest provides AI-powered functional testing with 24/7 monitoring.

Start free with HelpMeTest — 10 tests, no code required, monitoring every 5 minutes.
