Third-Party API Testing Strategies: Contract Tests vs Integration Tests vs Mocks
Every third-party API integration needs tests — but mocking, contract testing, and real integration testing each have different tradeoffs. Mocks are fast but drift from reality. Real integration tests are accurate but slow and expensive. Contract tests are the middle ground that most teams miss. This guide covers when to use each approach and how to combine them.
Your application integrates with Stripe, GitHub, Twilio, SendGrid, and Slack. Each needs tests. But which kind?
Most teams default to one of two extremes: mock everything (fast, but mocks drift from real APIs) or test against real APIs (accurate, but slow, expensive, and rate-limited). The answer is neither extreme — it's a deliberate combination of all three approaches, each at the right level.
The Testing Triangle for Third-Party APIs
              Real Integration Tests
            (top: few, slow, accurate)
                      /\
                     /  \
                    /    \
                Contract Tests
          (middle: some, medium speed)
                  /          \
                 /            \
            Unit Tests with Mocks
        (bottom: many, fast, may drift)

Most testing should happen at the bottom, some in the middle, and a few at the top.
When to Use Mocks
Mocks are appropriate when:
- Testing your code's behavior, not the third-party's behavior
- Running tests on every commit (needs to be fast)
- Testing error handling that's hard to trigger with real APIs
- The API has irreversible effects (send email, charge card, delete record)
- You're testing business logic that happens to call an API
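The third bullet, error handling that is hard to trigger against a live API, is where a mock's side_effect is most useful: it can raise on demand. The following is a minimal sketch under assumptions: call_with_retry is a hypothetical helper written for this illustration, and TimeoutError stands in for whatever exception your client library actually raises.

```python
import time
from unittest.mock import MagicMock


def call_with_retry(func, attempts=3, base_delay=0.5, sleep=None):
    """Call func, retrying on TimeoutError with exponentially growing delays."""
    sleep = time.sleep if sleep is None else sleep
    for attempt in range(attempts):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)


# The mocked API call times out twice, then succeeds on the third call.
flaky_api = MagicMock(side_effect=[TimeoutError(), TimeoutError(), {"status": "ok"}])

# Record the requested delays instead of actually sleeping, so the test stays fast.
delays = []
result = call_with_retry(flaky_api, sleep=delays.append)
```

Passing sleep as a parameter is a design choice for testability: the test can assert on the backoff schedule without waiting for it.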
# Good use of mocks: testing business logic
from unittest.mock import MagicMock, patch

# Order and fulfill_order come from your application code.

def test_order_fulfillment_marks_order_complete_after_charge():
    """After a successful charge, the order should be marked complete."""
    # We're testing OUR logic, not Stripe's behavior
    with patch("app.payments.stripe_client") as mock_stripe:
        mock_charge = MagicMock()
        mock_charge.status = "succeeded"
        mock_charge.id = "ch_test_123"
        mock_stripe.charges.create.return_value = mock_charge

        order = Order(id="ord_123", total_cents=4999)
        result = fulfill_order(order)

    assert result.status == "complete"
    assert result.charge_id == "ch_test_123"
    # Stripe behavior was mocked; we're testing our order logic

The Mock Drift Problem
Mocks are a snapshot of your understanding of the API at the time you wrote the test. APIs evolve. Stripe now treats the Charges API (charges.create) as legacy and steers new integrations toward PaymentIntents. GitHub has changed scope names. Twilio has added required parameters.
Mock drift means tests can pass while real integrations are broken. The solution is contract tests.
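Part of the drift problem comes from how permissive a bare MagicMock is: it invents any attribute you ask for. A minimal sketch of the difference, assuming a hypothetical PaymentsClient class that stands in for a vendor SDK (it is not a real library) and a charge function that still calls a method the SDK has dropped:

```python
from unittest.mock import MagicMock, create_autospec


class PaymentsClient:
    """Hypothetical stand-in for a vendor SDK class; not a real library."""

    def create_payment_intent(self, amount: int, currency: str) -> dict:
        raise NotImplementedError("real network call omitted in this sketch")


def charge(client, amount: int) -> str:
    """Code under test; it still calls a method the SDK no longer has."""
    return client.create_charge(amount=amount)["id"]


# A bare MagicMock creates any attribute on demand, so the stale call passes.
loose = MagicMock()
loose.create_charge.return_value = {"id": "ch_test_123"}
loose_result = charge(loose, 4999)

# A spec'd mock only exposes what the real class defines, so the drift surfaces.
strict = create_autospec(PaymentsClient, instance=True)
try:
    charge(strict, 4999)
    drift_detected = False
except AttributeError:
    drift_detected = True
```

A spec only tracks the SDK version installed locally. It says nothing about changes in what the remote API returns, which is the gap contract tests cover.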
When to Use Contract Tests
Contract testing validates your assumptions about an API against that API's actual behavior. It's the layer between mocks and full integration tests.
Approach 1: Schema Validation
Download the API's OpenAPI spec and validate your mock responses against it:
import jsonschema
import pytest
import requests

GITHUB_API_SPEC_URL = "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json"

@pytest.fixture(scope="session")
def github_spec():
    """Download the OpenAPI spec once per test session."""
    response = requests.get(GITHUB_API_SPEC_URL, timeout=30)
    response.raise_for_status()  # fail loudly if the spec could not be fetched
    return response.json()

def test_user_mock_matches_github_schema(github_spec):
    """Our mock GitHub user must match the real GitHub API response schema."""
    user_schema = github_spec["components"]["schemas"]["simple-user"]
    # Our mock user response (trimmed for brevity; if the schema marks more
    # fields as required, add them here until validation passes)
    our_mock_user = {
        "login": "alice",
        "id": 12345,
        "avatar_url": "https://avatars.githubusercontent.com/alice",
        "url": "https://api.github.com/users/alice",
        "html_url": "https://github.com/alice",
        "type": "User",
        "site_admin": False
    }
    try:
        jsonschema.validate(our_mock_user, user_schema)
    except jsonschema.ValidationError as e:
        pytest.fail(f"Mock user doesn't match GitHub schema: {e.message}")

Approach 2: Record and Replay
Record real API interactions once, commit them, replay in tests:
# Using VCR.py
import stripe
import vcr

@vcr.use_cassette("cassettes/stripe_create_payment_intent.yaml",
                  record_mode="none")  # "none" = replay only, never re-record
def test_stripe_payment_intent_response_format():
    """Validate that the recorded Stripe response matches our assumptions."""
    stripe.api_key = "sk_test_cassette_playback"
    intent = stripe.PaymentIntent.create(
        amount=2000,
        currency="usd"
    )
    # Fields our code depends on. Replay cannot see live changes, so these
    # assertions catch drift only after the cassette is re-recorded.
    assert hasattr(intent, "id")
    assert hasattr(intent, "status")
    assert hasattr(intent, "client_secret")
    assert intent.object == "payment_intent"
    assert intent.amount == 2000

Recording cassettes: run once with record_mode="new_episodes", commit the cassette file, then switch to record_mode="none". In replay mode the test always sees the recorded response, so it cannot notice a change on Stripe's side by itself. If Stripe changes its response format, the test fails the next time you re-record, so re-record on a regular schedule.
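Since a replay-only cassette reflects the API as it was on the day it was recorded, it helps to have something that nags you when a recording gets old. One possible sketch, assuming you keep a small JSON manifest mapping each cassette file to its recording date; the manifest, the 30-day threshold, and the stale_cassettes helper are all hypothetical and not a VCR.py feature:

```python
import json
from datetime import date, timedelta


def stale_cassettes(manifest_json: str, today: date, max_age_days: int = 30) -> list:
    """Return the cassette names recorded more than max_age_days before today."""
    recorded = json.loads(manifest_json)  # {"cassette.yaml": "YYYY-MM-DD", ...}
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name
        for name, recorded_on in recorded.items()
        if date.fromisoformat(recorded_on) < cutoff
    )


# Example manifest; in a real suite this would be read from a committed file.
example_manifest = json.dumps({
    "stripe_create_intent.yaml": "2024-01-01",
    "github_get_user.yaml": "2024-03-01",
})
overdue = stale_cassettes(example_manifest, today=date(2024, 3, 10))
```

A manifest is used here instead of file modification times because a fresh checkout in CI resets those timestamps.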
Approach 3: Consumer-Driven Contract Tests (Pact)
Pact is the leading consumer-driven contract testing framework. Your consumer (your app) defines what it expects from the provider (third-party API), and Pact verifies the provider satisfies the contract:
# pact/github_contract.py (pact-python v1-style API)
import atexit

from pact import Consumer, Like, Provider

pact = Consumer("MyApp").has_pact_with(Provider("GitHub"))
pact.start_service()  # start Pact's local mock server
atexit.register(pact.stop_service)

def test_get_user_contract():
    """Define and verify the GitHub User API contract."""
    expected_user = {
        "login": "alice",
        "id": 12345,
        "email": "alice@example.com"
    }
    (pact
     .given("a GitHub user with login alice exists")
     .upon_receiving("a request for alice's profile")
     .with_request("GET", "/users/alice",
                   headers={"Authorization": "token test_token"})
     .will_respond_with(200,
                        headers={"Content-Type": "application/json"},
                        # Like() is a Pact matcher: it checks types, not exact values
                        body=Like(expected_user)))
    with pact:
        # get_user must be configured to call the Pact mock server, not api.github.com
        from app.github import get_user
        user = get_user("alice")
    assert user.login == "alice"
    assert user.id == 12345

Pact generates a contract file that can be shared with the API provider for verification. GitHub, Stripe, and other major providers don't participate in Pact verification, but the consumer-side contract test still documents your assumptions, and it catches drift when you re-check those expectations against freshly recorded or live responses.
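Pact's matchers compare structure and types instead of exact values, which keeps a contract from breaking on harmless data changes. The idea can be sketched in plain Python. matches_shape below is a hypothetical helper written for this illustration; it is not part of Pact and is far simpler than Pact's real matching rules.

```python
def matches_shape(expected, actual) -> bool:
    """Return True if actual has every expected key with a value of the same type."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches_shape(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        if not isinstance(actual, list):
            return False
        # An empty example list places no constraint on the elements.
        return not expected or all(matches_shape(expected[0], item) for item in actual)
    # Exact type comparison, so True is not accepted where an int is expected.
    return type(actual) is type(expected)


expected_user = {"login": "alice", "id": 12345}

# Different values and an extra field are fine: the consumer only cares
# about the fields it reads.
same_shape = matches_shape(expected_user, {"login": "bob", "id": 7, "site_admin": False})

# A type change (id became a string) is the kind of drift a contract should catch.
drifted = matches_shape(expected_user, {"login": "bob", "id": "7"})
```

Extra fields in the response are ignored on purpose. A consumer-driven contract only pins down what the consumer actually uses.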
When to Use Real Integration Tests
Real integration tests against actual APIs are appropriate for:
- Critical paths where correctness matters more than speed (actual payment processing)
- Pre-release validation before deploying to production
- Smoke tests that verify your credentials and connectivity are working
- Certification of new API versions before upgrading
# tests/integration/test_stripe_real.py
import os

import pytest
import stripe

# Marks every test in this module; the marker alone does not skip anything,
# it only lets you select or deselect these tests with -m
pytestmark = pytest.mark.integration

def test_payment_intent_lifecycle():
    """Full payment lifecycle: create, confirm, refund."""
    stripe.api_key = os.environ["STRIPE_TEST_SECRET_KEY"]
    # Create
    intent = stripe.PaymentIntent.create(
        amount=1000,
        currency="usd",
        payment_method="pm_card_visa",  # Stripe test payment method
        confirm=True,
        return_url="https://example.com"
    )
    assert intent.status in ("succeeded", "requires_action")
    if intent.status == "succeeded":
        # Refund
        refund = stripe.Refund.create(payment_intent=intent.id)
        assert refund.status == "succeeded"
        assert refund.amount == 1000

Run real integration tests with:
# Only on main branch, pre-release (--timeout requires the pytest-timeout plugin)
pytest tests/integration/ -m integration --timeout=60

Decision Framework
Use this decision tree when deciding how to test a third-party API interaction:
Q: Does this test verify OUR code logic (error handling, data transformation, business rules)?
→ YES: Use mocks. Fast, controlled, no external dependencies.
Q: Does this test verify that our understanding of the API hasn't drifted?
→ YES: Use contract tests (schema validation, record/replay, or Pact).
Q: Does this test verify that the API actually works with our credentials?
→ YES: Use real integration tests. Run sparingly (nightly, pre-release).
Q: Does this test involve irreversible actions (send email, charge card)?
→ If testing business logic: mock the action, test the result.
→ If testing the action itself: use API test mode (Stripe test mode, SendGrid sandbox).

Combining All Three Approaches
A mature third-party API test suite uses all three layers:
tests/
  unit/                           # Mocks: run on every commit
    test_payment_logic.py         # Tests order fulfillment, refund logic, etc.
    test_email_templates.py       # Tests template rendering, variable substitution
  contracts/                      # Contract tests: run on every commit, slightly slower
    cassettes/                    # VCR cassettes (recorded responses)
      stripe_create_intent.yaml
      github_get_user.yaml
    test_stripe_contract.py       # Schema validation, cassette replay
    test_github_contract.py
  integration/                    # Real API tests: run nightly + pre-release
    test_stripe_payment_flow.py   # Against real Stripe test environment
    test_github_api.py            # Against real GitHub API

CI configuration:
name: API Tests
on:
  push:
    branches: [main, 'release/**']
  pull_request:
  schedule:
    - cron: '0 3 * * *'  # Nightly
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/unit/ -v
  contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/contracts/ -v
  integration:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' || github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/integration/ -v -m integration --timeout=120
        env:
          STRIPE_TEST_SECRET_KEY: ${{ secrets.STRIPE_TEST_SECRET_KEY }}
          # Secret names cannot start with GITHUB_, so the token is stored as GH_TEST_TOKEN
          GITHUB_TOKEN: ${{ secrets.GH_TEST_TOKEN }}

Handling Rate Limits in Tests
Real integration tests hit real rate limits. Strategies:
1. Use separate test credentials with their own rate limits. Don't share test API keys between local development and CI.
2. Add delays between requests. Slow tests are better than flaky ones:
import time

import pytest

@pytest.fixture(autouse=True)
def rate_limit_delay():
    yield
    time.sleep(0.5)  # 500ms pause after each test

3. Run integration tests sequentially, not in parallel:
# --workers is a pytest-parallel option; with pytest-xdist use -n 0 instead
pytest tests/integration/ -p no:randomly --workers=1

4. Cache responses that don't change between test runs:
@pytest.fixture(scope="session")
def github_repos():
    """Cache repo list for the test session; it doesn't change between tests."""
    return github_client.get_repos("myorg")

Testing Third-Party Error Responses
Real APIs return errors that are hard to trigger consistently. Maintain a list of error conditions and test each one:
from unittest.mock import patch

import pytest
# Newer stripe-python releases also expose this class as stripe.StripeError
from stripe.error import StripeError

STRIPE_ERROR_SCENARIOS = [
    # (error_code, error_message, expected_behavior)
    ("card_declined", "Your card was declined", "Order should remain pending"),
    ("insufficient_funds", "Your card has insufficient funds", "User should see specific message"),
    ("rate_limit", "Too many requests", "App should retry with exponential backoff"),
    ("api_connection_error", "Connection error", "App should queue for retry"),
    ("authentication_error", "No such API key", "App should alert operations team"),
]

@pytest.mark.parametrize("error_code,error_message,expected", STRIPE_ERROR_SCENARIOS)
def test_stripe_error_handling(error_code, error_message, expected):
    """Each Stripe error code must be handled specifically."""
    with patch("stripe.PaymentIntent.create") as mock_create:
        mock_create.side_effect = StripeError(message=error_message, code=error_code)
        # process_payment is the application function under test
        result = process_payment(amount=1000, payment_method_id="pm_test")
    # Verify specific handling (not generic "payment failed")
    assert result["error_code"] == error_code
    # `expected` documents the intended behavior; it is not asserted here

Conclusion
Third-party API testing requires three layers working together: mocks for fast, controlled unit tests of your business logic; contract tests to catch API drift without live network calls; and real integration tests for high-stakes validation before release. Most teams over-use mocks and skip contracts entirely, and the result is tests that pass while production integrations are broken. Add contract tests to your pipeline and you keep most of the speed of mocking while catching much of the drift that would otherwise surface only in integration tests or in production.