Hypothesis: Property-Based Testing for Python
Hypothesis is Python's most powerful property-based testing library. It generates test inputs automatically, shrinks failures to minimal examples, and remembers past failures across test runs. If you write Python and care about test quality, Hypothesis is worth learning.
Installation
pip install hypothesis

Hypothesis integrates with pytest, unittest, and Django's test runner without any special configuration.
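As a rough illustration of the unittest integration, the @given decorator (introduced in the next section) can be applied to a method of a standard unittest.TestCase. The class and test names below are made up for this sketch.

```python
import unittest

from hypothesis import given, strategies as st


class ReverseTests(unittest.TestCase):
    @given(st.lists(st.integers()))
    def test_reverse_twice_is_identity(self, xs):
        # Reversing a list twice should give back the original list
        self.assertEqual(list(reversed(list(reversed(xs)))), xs)
```

Saved in a test module, this should be picked up by python -m unittest or by pytest like any other test case.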
The Basics: @given and Strategies
The @given decorator wraps a test function and provides it with generated inputs:
from hypothesis import given
from hypothesis import strategies as st


@given(st.integers())
def test_absolute_value_non_negative(n):
    assert abs(n) >= 0


@given(st.text())
def test_string_length_non_negative(s):
    assert len(s) >= 0


@given(st.lists(st.integers()))
def test_sum_of_empty_list_is_zero(lst):
    if len(lst) == 0:
        assert sum(lst) == 0

Run with pytest as usual:

pytest test_hypothesis.py -v

Hypothesis generates 100 examples per test by default (configurable through max_examples). If any example fails, it shrinks the input and reports the minimal failing case.
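To see shrinking in isolation, one option is hypothesis.find, which searches a strategy for a value satisfying a condition and then shrinks it the same way a failing test input is shrunk. A minimal sketch; the condition is arbitrary and chosen only for illustration.

```python
from hypothesis import find, strategies as st

# Ask for a list of integers whose sum is at least 10.
# find() first locates any such list, then shrinks it toward a simpler one.
smallest = find(st.lists(st.integers()), lambda xs: sum(xs) >= 10)

print(smallest)
```

The printed list is usually much shorter than the first list Hypothesis found, which is the same effect you see in a shrunk failure report.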
Core Strategies
Strategies describe the space of inputs to generate:
# Primitives
st.integers() # any integer
st.integers(min_value=0, max_value=100) # bounded
st.floats() # any float
st.floats(min_value=0.0, max_value=1.0, allow_nan=False)
st.text() # any string
st.text(alphabet=st.characters(whitelist_categories=['Lu', 'Ll'])) # letters only
st.binary() # bytes
st.booleans()
# Collections
st.lists(st.integers())
st.lists(st.integers(), min_size=1, max_size=10)
st.sets(st.integers())
st.dictionaries(st.text(), st.integers())
st.tuples(st.integers(), st.text())
# Special
st.none()
st.one_of(st.integers(), st.text()) # union type
st.just(42) # always returns 42
st.sampled_from([1, 2, 3, "a"]) # picks from a list
# Dates and times
st.dates()
st.datetimes()
st.timedeltas()
# Network types
st.ip_addresses()
st.emails()
st.from_regex(r'[A-Z]{3}-\d{4}') # strings containing a match; pass fullmatch=True to match the whole string

Building Complex Strategies
Composing Strategies
from hypothesis import given, strategies as st
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int
    email: str


users = st.builds(
    User,
    name=st.text(min_size=1, max_size=50),
    age=st.integers(min_value=13, max_value=120),
    email=st.emails(),
)


@given(users)
def test_user_display_name(user):
    display = f"{user.name} ({user.age})"
    assert user.name in display
    assert str(user.age) in display

Filtering Strategies
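A filter only discards values after they have been generated, so its cost depends on how often the predicate passes. The sketch below contrasts a filtered strategy with one that builds valid values directly; the strategy names are illustrative and not part of Hypothesis.

```python
from hypothesis import given, strategies as st

# Filtering: odd integers are generated and then thrown away
even_by_filter = st.integers().filter(lambda x: x % 2 == 0)

# Constructing: every generated integer is doubled, so nothing is rejected
even_by_construction = st.integers().map(lambda x: x * 2)


@given(even_by_filter, even_by_construction)
def test_both_strategies_yield_even_numbers(a, b):
    assert a % 2 == 0
    assert b % 2 == 0
```

The same trade-off applies to the simpler case of positive integers: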
positive_integers = st.integers().filter(lambda x: x > 0)
# Or use min_value, which generates only valid values (filter discards every rejected value)
positive_integers = st.integers(min_value=1)

Mapping Strategies
# Generate sorted lists
sorted_lists = st.lists(st.integers()).map(sorted)
# Generate uppercase strings
uppercase = st.text().map(str.upper)

Dependent Strategies (flatmap)
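flatmap draws a value from one strategy and passes it to a function that returns the next strategy, so the second draw can depend on the first. A minimal sketch with illustrative names: draw a length first, then a list of exactly that length.

```python
from hypothesis import given, strategies as st

# First draw a length n, then build a strategy for lists of exactly n booleans
sized_lists = st.integers(min_value=0, max_value=5).flatmap(
    lambda n: st.tuples(
        st.just(n),
        st.lists(st.booleans(), min_size=n, max_size=n),
    )
)


@given(sized_lists)
def test_list_has_the_drawn_length(pair):
    n, flags = pair
    assert len(flags) == n
```

The example that follows uses the same pattern in the other direction, drawing the list first and then an index that fits it.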
# Generate a list and a valid index into it
@given(
    st.lists(st.integers(), min_size=1).flatmap(
        lambda lst: st.tuples(st.just(lst), st.integers(min_value=0, max_value=len(lst) - 1))
    )
)
def test_list_index_valid(lst_and_index):
    lst, idx = lst_and_index
    assert lst[idx] in lst  # index is always valid

Settings: Controlling Test Behavior
from hypothesis import given, settings, HealthCheck
from hypothesis import strategies as st


@settings(
    max_examples=500,  # run 500 examples instead of 100
    deadline=None,  # no time limit per example
    suppress_health_check=[HealthCheck.too_slow],
)
@given(st.text())
def test_heavy_operation(s):
    # expensive_function and is_valid stand in for your own code
    result = expensive_function(s)
    assert is_valid(result)

Common settings:
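- max_examples=100: the default; increase it for more thorough testing
- deadline=timedelta(milliseconds=200): the default; the test fails if any single example takes longer than this
- settings(parent, ...): pass an existing settings object as the first argument to inherit its values; otherwise unspecified values come from the active profile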
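A short sketch of how these settings combine, assuming you want one shared base configuration and a stricter variant derived from it. The names base and thorough are illustrative.

```python
from datetime import timedelta

from hypothesis import given, settings, strategies as st

# Base settings: a looser per-example deadline than the 200 ms default
base = settings(max_examples=50, deadline=timedelta(milliseconds=500))

# Derived settings: inherit the deadline from `base`, override max_examples
thorough = settings(base, max_examples=200)


@thorough  # a settings object can be applied as a decorator
@given(st.integers())
def test_int_round_trips_through_str(n):
    assert int(str(n)) == n
```

Deriving from a parent keeps related tests consistent: changing the deadline in base changes it for every settings object built on top of it.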
The Hypothesis Database
Hypothesis stores failing examples in a local database (the .hypothesis/ directory). On subsequent runs it replays those failures first, so a bug that was found once is retried until it is fixed or its entry is discarded.
This means:
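- Failures found on your machine are replayed on later runs, before you have written a regression test for them
- CI reruns keep discovered examples only if .hypothesis/ is cached or otherwise persisted between builds
- The database can be committed for team use, though entries may be discarded when your code or dependencies change; the Hypothesis documentation suggests keeping it local (or cached in CI) and pinning important failures with @example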
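For failures that must be retested on every machine, the @example decorator pins an input in the test source itself, independent of the database. A short sketch; the pinned inputs are arbitrary illustrations, not failures from a real project.

```python
from hypothesis import example, given, strategies as st


@given(st.text())
@example("")  # pinned input: always run, whatever the database contains
@example("ß")  # pinned input: non-ASCII text
def test_utf8_round_trip(s):
    # Encoding to UTF-8 and decoding again should return the same string
    assert s.encode("utf-8").decode("utf-8") == s
```

Pinned examples run in addition to the generated ones, so they act as ordinary regression tests that travel with the code.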
# To share the database through version control, first remove any
# .hypothesis/ entry from .gitignore, then commit the directory
git add .hypothesis/
git commit -m "Add hypothesis example database"

Assume: Narrowing Input Space
Use assume() to skip examples that don't meet preconditions:
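from hypothesis import given, assume
from hypothesis import strategies as st

# Bounded so that float division stays within the tolerance below
small_ints = st.integers(min_value=-10**6, max_value=10**6)


@given(small_ints, small_ints)
def test_division(a, b):
    assume(b != 0)  # skip b=0 cases
    result = a / b
    assert abs(result * b - a) < 0.001  # floating point tolerance

assume() is cleaner than .filter() when the condition is complex or spans several arguments. However, if too many examples are rejected, the filter_too_much health check fails the test with FailedHealthCheck, and if no valid example is found at all Hypothesis raises Unsatisfiable. Prefer bounded strategies over heavy assume() use.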
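As a sketch of that advice, the two tests below check the same property for an ordered pair of integers. The first rejects unordered pairs with assume(); the second builds ordered pairs directly, so no example is wasted. The bounds and names are illustrative.

```python
from hypothesis import assume, given, strategies as st

bounded = st.integers(min_value=-1000, max_value=1000)


# With assume(): pairs where lo > hi are generated and then discarded
@given(bounded, bounded)
def test_range_length_with_assume(lo, hi):
    assume(lo <= hi)
    assert len(range(lo, hi)) == hi - lo


# By construction: sort each pair, so every generated example is usable
ordered_pairs = st.tuples(bounded, bounded).map(sorted)


@given(ordered_pairs)
def test_range_length_by_construction(pair):
    lo, hi = pair
    assert len(range(lo, hi)) == hi - lo
```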
Stateful Testing with RuleBasedStateMachine
For testing stateful systems (databases, queues, state machines):
from hypothesis.stateful import RuleBasedStateMachine, rule, initialize, invariant
from hypothesis import strategies as st


class StackMachine(RuleBasedStateMachine):
    @initialize()
    def new_stack(self):
        self.stack = []

    @rule(value=st.integers())
    def push(self, value):
        self.stack.append(value)

    @rule()
    def pop(self):
        if self.stack:
            result = self.stack.pop()
            assert isinstance(result, int)

    @invariant()
    def length_non_negative(self):
        assert len(self.stack) >= 0


TestStack = StackMachine.TestCase

Hypothesis generates sequences of operations and checks invariants after each step.
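A common variation is to keep a simple model next to the system under test and compare the two after every step. The sketch below assumes a plain list standing in for a real stack implementation; the class and attribute names are made up. It also uses precondition, so pop is only chosen when the stack is non-empty.

```python
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    precondition,
    rule,
    run_state_machine_as_test,
)


class StackModelMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.stack = []  # stand-in for the system under test
        self.expected = []  # simple model of what the stack should hold

    @rule(value=st.integers())
    def push(self, value):
        self.stack.append(value)
        self.expected.append(value)

    @precondition(lambda self: len(self.expected) > 0)
    @rule()
    def pop(self):
        # Only selected when the model says the stack is non-empty
        assert self.stack.pop() == self.expected.pop()

    @invariant()
    def matches_model(self):
        assert self.stack == self.expected


def test_stack_matches_model():
    # Run the machine directly, with a small budget to keep the run short
    run_state_machine_as_test(
        StackModelMachine,
        settings=settings(max_examples=25, stateful_step_count=10),
    )
```

With a real implementation in place of the list, a mismatch between the model and the implementation is reported as a shrunk sequence of rule calls.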
Practical Example: API Validation
from hypothesis import given, strategies as st
from myapp import validate_order

valid_products = st.sampled_from(['SKU-001', 'SKU-002', 'SKU-003'])
valid_quantities = st.integers(min_value=1, max_value=100)
valid_prices = st.decimals(min_value='0.01', max_value='9999.99', places=2)

order_items = st.lists(
    st.fixed_dictionaries({
        'product_id': valid_products,
        'quantity': valid_quantities,
        'unit_price': valid_prices,
    }),
    min_size=1,
    max_size=20,
)


@given(order_items)
def test_valid_orders_always_accepted(items):
    result = validate_order(items)
    assert result.is_valid, f"Validation failed: {result.errors}"
    assert result.total > 0


@given(
    st.lists(
        st.fixed_dictionaries({
            'product_id': valid_products,
            'quantity': st.integers(max_value=0),  # invalid: zero or negative
            'unit_price': valid_prices,
        }),
        min_size=1,
    )
)
def test_invalid_quantities_rejected(items):
    result = validate_order(items)
    assert not result.is_valid
    assert 'quantity' in str(result.errors).lower()

Django Integration
# Install Hypothesis with its Django extra
pip install "hypothesis[django]"

from hypothesis import given, strategies as st
from hypothesis.extra.django import TestCase, from_model
from myapp.models import Product


class ProductTests(TestCase):
    @given(from_model(Product, name=st.text(min_size=1, max_size=200)))
    def test_product_str(self, product):
        assert product.name in str(product)

CI Configuration
Run Hypothesis with fewer examples in CI (faster) and more examples locally (more thorough):
# Put this where it runs before the tests, for example in conftest.py
import os

from hypothesis import settings

# Minimal CI profile
settings.register_profile("ci", max_examples=50)
# Full local profile
settings.register_profile("dev", max_examples=500)
# Load profile from environment
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))

# GitHub Actions
- name: Run tests with Hypothesis
  run: pytest tests/
  env:
    HYPOTHESIS_PROFILE: ci

Common Mistakes
Using assume() excessively: If most generated examples are rejected, the HealthCheck.filter_too_much health check fails the test. Use bounded strategies instead.
Testing implementation instead of behavior: Test what the function should do, not how it does it. Properties are about observable behavior.
Ignoring the database: If .hypothesis/ is neither shared nor cached, your team and CI lose previously discovered failures. Pin the important ones with @example so they do not depend on the database.
Missing allow_nan=False for floats: By default, st.floats() includes NaN and Infinity. If your code doesn't handle them, add allow_nan=False, allow_infinity=False.
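The float pitfall is easy to reproduce. In the sketch below (test names are illustrative), the first test restricts the strategy to finite floats, while the second accepts every float and handles NaN explicitly.

```python
import math

from hypothesis import given, strategies as st

# NaN is the one float that is not equal to itself
assert math.nan != math.nan

finite_floats = st.floats(allow_nan=False, allow_infinity=False)


@given(finite_floats)
def test_double_negation_is_identity(x):
    # Holds for every finite float; with NaN the equality would be False
    assert -(-x) == x


@given(st.floats())
def test_double_negation_with_special_values(x):
    # When NaN and infinity are allowed, NaN needs its own branch
    if math.isnan(x):
        assert x != x
    else:
        assert -(-x) == x  # also true for +inf and -inf
```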
Pair with Functional Testing
Hypothesis validates the logic inside your Python functions. For production monitoring — verifying your Python API, web app, or service behaves correctly for real users — HelpMeTest provides AI-powered functional testing with 24/7 monitoring.