Third-Party API Testing Strategies: Contract Tests vs Integration Tests vs Mocks

HelpMeTest

17 May 2026 — 6 min read

Every third-party API integration needs tests — but mocking, contract testing, and real integration testing each have different tradeoffs. Mocks are fast but drift from reality. Real integration tests are accurate but slow and expensive. Contract tests are the middle ground that most teams miss. This guide covers when to use each approach and how to combine them.

Your application integrates with Stripe, GitHub, Twilio, SendGrid, and Slack. Each needs tests. But which kind?

Most teams default to one of two extremes: mock everything (fast, but mocks drift from real APIs) or test against real APIs (accurate, but slow, expensive, and rate-limited). The answer is neither extreme — it's a deliberate combination of all three approaches, each at the right level.

The Testing Triangle for Third-Party APIs

                 /\
                /  \        Real Integration Tests
               /    \       (top: few, slow, accurate)
              /------\
             /        \     Contract Tests
            /          \    (middle: some, medium speed)
           /------------\
          /              \  Unit Tests with Mocks
         /                \ (bottom: many, fast, may drift)
        /------------------\

Most tests belong at the bottom, some in the middle, and a few at the top.

When to Use Mocks

Mocks are appropriate when:

Testing your code's behavior, not the third-party's behavior
Running tests on every commit (needs to be fast)
Testing error handling that's hard to trigger with real APIs
The API has irreversible effects (send email, charge card, delete record)
You're testing business logic that happens to call an API

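# Good use of mocks: testing business logic
from unittest.mock import MagicMock, patch

# Order and fulfill_order come from your application code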
def test_order_fulfillment_marks_order_complete_after_charge():
    """After a successful charge, the order should be marked complete."""
    # We're testing OUR logic, not Stripe's behavior
    with patch("app.payments.stripe_client") as mock_stripe:
        mock_charge = MagicMock()
        mock_charge.status = "succeeded"
        mock_charge.id = "ch_test_123"
        mock_stripe.charges.create.return_value = mock_charge
        
        order = Order(id="ord_123", total_cents=4999)
        result = fulfill_order(order)
    
    assert result.status == "complete"
    assert result.charge_id == "ch_test_123"
    
    # Stripe behavior was mocked — we're testing our order logic

The Mock Drift Problem

Mocks are a snapshot of your understanding of the API at the time you wrote the test, and APIs evolve. Stripe moved its recommended payment flow from the Charges API (charges.create) to PaymentIntents. GitHub changed scope names. Twilio added required parameters.

Mock drift means tests can pass while real integrations are broken. The solution is contract tests.
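
To make that failure mode concrete, here is a minimal sketch using only the standard library. PaymentsClient, create_payment_intent, and charge_order are hypothetical names invented for this illustration, not part of any real SDK. A loose MagicMock accepts a method the client no longer has, while create_autospec rejects it:

```python
from unittest.mock import MagicMock, create_autospec


class PaymentsClient:
    """Hypothetical client whose API changed: create_charge no longer exists."""

    def create_payment_intent(self, amount_cents: int) -> dict:
        raise NotImplementedError("network call in the real client")


def charge_order(client, amount_cents: int) -> str:
    """Code under test, still written against the old method name."""
    return client.create_charge(amount_cents)["id"]


def passes_with_loose_mock() -> bool:
    # A bare MagicMock invents any attribute on demand, so the stale call works.
    loose = MagicMock()
    loose.create_charge.return_value = {"id": "ch_test_123"}
    return charge_order(loose, 4999) == "ch_test_123"


def fails_with_autospec() -> bool:
    # An autospecced mock only exposes attributes the real class has.
    strict = create_autospec(PaymentsClient, instance=True)
    try:
        charge_order(strict, 4999)
    except AttributeError:
        return True
    return False
```

Autospeccing only guards against drift from the client library you have installed. It says nothing about what the remote API returns, which is the gap contract tests fill.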

When to Use Contract Tests

Contract testing validates your assumptions about an API against that API's actual behavior. It's the layer between mocks and full integration tests.

Approach 1: Schema Validation

Download the API's OpenAPI spec and validate your mock responses against it:

import jsonschema
import requests
import pytest

GITHUB_API_SPEC_URL = "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json"

@pytest.fixture(scope="session")
def github_spec():
    response = requests.get(GITHUB_API_SPEC_URL, timeout=30)
    response.raise_for_status()  # Fail loudly if the spec could not be downloaded
    return response.json()

def test_user_mock_matches_github_schema(github_spec):
    """Our mock GitHub user must match the real GitHub API response schema."""
    user_schema = github_spec["components"]["schemas"]["simple-user"]
    
    # Our mock user response; validation fails if it omits any field
    # the schema marks as required
    our_mock_user = {
        "login": "alice",
        "id": 12345,
        "avatar_url": "https://avatars.githubusercontent.com/alice",
        "url": "https://api.github.com/users/alice",
        "html_url": "https://github.com/alice",
        "type": "User",
        "site_admin": False
    }
    
    try:
        jsonschema.validate(our_mock_user, user_schema)
    except jsonschema.ValidationError as e:
        pytest.fail(f"Mock user doesn't match GitHub schema: {e.message}")

Approach 2: Record and Replay

Record real API interactions once, commit them, replay in tests:

# Using VCR.py
import vcr

@vcr.use_cassette("cassettes/stripe_create_payment_intent.yaml", 
                  record_mode="none")  # "none" = replay only, never re-record
def test_stripe_payment_intent_response_format():
    """Validate that Stripe's response format matches our assumptions."""
    import stripe
    stripe.api_key = "sk_test_cassette_playback"
    
    intent = stripe.PaymentIntent.create(
        amount=2000,
        currency="usd"
    )
    
    # These fields must be in the response; if Stripe changes them,
    # the test fails once the cassette is re-recorded
    assert hasattr(intent, "id")
    assert hasattr(intent, "status")
    assert hasattr(intent, "client_secret")
    assert intent.object == "payment_intent"
    assert intent.amount == 2000

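Recording cassettes: run the test once with record_mode="new_episodes", commit the cassette file, then switch to record_mode="none". From then on the test replays the recorded response and never touches the network, so on its own it cannot notice a change on Stripe's side. Drift surfaces when you re-record: if Stripe has changed its response format, the assertions fail against the fresh cassette. Re-record periodically, or the cassette becomes one more stale mock.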
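
One way to avoid editing the decorator every time you re-record is to choose the record mode from the environment. The sketch below assumes a variable named VCR_RECORD, which is a hypothetical name picked for this example; "none" and "new_episodes" are the VCR.py record modes already used above.

```python
import os


def cassette_record_mode(env=None) -> str:
    """Return the VCR.py record mode for this test run.

    VCR_RECORD is a hypothetical environment variable for this sketch:
    set it to "1" to record new interactions, leave it unset to replay only.
    """
    env = os.environ if env is None else env
    return "new_episodes" if env.get("VCR_RECORD") == "1" else "none"


# Possible usage with the decorator shown earlier:
# @vcr.use_cassette("cassettes/stripe_create_payment_intent.yaml",
#                   record_mode=cassette_record_mode())
```

Replay stays the default, so CI never records by accident; re-recording becomes an explicit local step such as VCR_RECORD=1 pytest tests/contracts/.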

Approach 3: Consumer-Driven Contract Tests (Pact)

Pact is the leading consumer-driven contract testing framework. Your consumer (your app) defines what it expects from the provider (third-party API), and Pact verifies the provider satisfies the contract:

# pact/github_contract.py
import atexit

from pact import Consumer, Provider

pact = Consumer("MyApp").has_pact_with(Provider("GitHub"))
pact.start_service()  # Start the local Pact mock server
atexit.register(pact.stop_service)

def test_get_user_contract():
    """Define and verify the GitHub User API contract."""
    expected_user = {
        "login": "alice",
        "id": 12345,
        "email": "alice@example.com"
    }
    
    (pact
     .given("a GitHub user with login alice exists")
     .upon_receiving("a request for alice's profile")
     .with_request("GET", "/users/alice",
                   headers={"Authorization": "token test_token"})
     .will_respond_with(200,
                        headers={"Content-Type": "application/json"},
                        # Exact values; Pact matchers can relax this to type matching
                        body=expected_user))
    
    with pact:
        # get_user must send its request to the Pact mock server, not api.github.com
        from app.github import get_user
        user = get_user("alice")
    
    assert user.login == "alice"
    assert user.id == 12345

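Pact generates a contract file that can be shared with the API provider for verification. GitHub, Stripe, and other major providers don't participate in Pact verification, so nothing checks the contract against the real API automatically. The consumer-side contract test still documents your assumptions in executable form, and you can catch drift by comparing it against re-recorded cassettes or the provider's published schema.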

When to Use Real Integration Tests

Real integration tests against actual APIs are appropriate for:

Critical paths where correctness matters more than speed (actual payment processing)
Pre-release validation before deploying to production
Smoke tests that verify your credentials and connectivity are working
Certification of new API versions before upgrading

# tests/integration/test_stripe_real.py
import os

import pytest
import stripe

# Marks every test in this module; deselect with: pytest -m "not integration"
pytestmark = pytest.mark.integration

def test_payment_intent_lifecycle():
    """Full payment lifecycle: create, confirm, refund."""
    stripe.api_key = os.environ["STRIPE_TEST_SECRET_KEY"]
    
    # Create
    intent = stripe.PaymentIntent.create(
        amount=1000,
        currency="usd",
        payment_method="pm_card_visa",  # Stripe test payment method
        confirm=True,
        return_url="https://example.com"
    )
    
    assert intent.status in ("succeeded", "requires_action")
    
    if intent.status == "succeeded":
        # Refund
        refund = stripe.Refund.create(payment_intent=intent.id)
        assert refund.status == "succeeded"
        assert refund.amount == 1000

Run real integration tests with:

# Only on main branch, pre-release (--timeout requires the pytest-timeout plugin)
pytest tests/integration/ -m integration --timeout=60

Decision Framework

Use this decision tree when deciding how to test a third-party API interaction:

Q: Does this test verify OUR code logic (error handling, data transformation, business rules)?
→ YES: Use mocks. Fast, controlled, no external dependencies.

Q: Does this test verify that our understanding of the API hasn't drifted?
→ YES: Use contract tests (schema validation, record/replay, or Pact).

Q: Does this test verify that the API actually works with our credentials?
→ YES: Use real integration tests. Run sparingly (nightly, pre-release).

Q: Does this test involve irreversible actions (send email, charge card)?
→ If testing business logic: mock the action, test the result.
→ If testing the action itself: use API test mode (Stripe test mode, SendGrid sandbox).
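
The same tree can be written down as a small function, which is handy in review checklists or test-generation scripts. This is a sketch of the questions above; Purpose, choose_test_layer, and the returned labels are hypothetical names chosen for illustration.

```python
from enum import Enum


class Purpose(Enum):
    OWN_LOGIC = "own_logic"              # error handling, transformations, business rules
    API_ASSUMPTIONS = "api_assumptions"  # has our picture of the API drifted?
    LIVE_ACCESS = "live_access"          # do credentials and the real API work?


def choose_test_layer(purpose: Purpose, irreversible: bool = False) -> str:
    """Return the test layer suggested by the decision tree above."""
    if purpose is Purpose.OWN_LOGIC:
        # Irreversible actions are mocked here; only the result is asserted.
        return "mock"
    if purpose is Purpose.API_ASSUMPTIONS:
        return "contract"
    # Exercising the real API: irreversible actions go through the
    # provider's test mode or sandbox.
    return "integration (provider test mode)" if irreversible else "integration"
```

For example, choose_test_layer(Purpose.OWN_LOGIC, irreversible=True) returns "mock": business logic around a card charge is still tested with a mocked charge.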

Combining All Three Approaches

A mature third-party API test suite uses all three layers:

tests/
  unit/                         # Mocks — run on every commit
    test_payment_logic.py       # Tests order fulfillment, refund logic, etc.
    test_email_templates.py     # Tests template rendering, variable substitution
    
  contracts/                    # Contract tests — run on every commit, slightly slower
    cassettes/                  # VCR cassettes (recorded responses)
      stripe_create_intent.yaml
      github_get_user.yaml
    test_stripe_contract.py     # Schema validation, cassette replay
    test_github_contract.py
    
  integration/                  # Real API tests — run nightly + pre-release
    test_stripe_payment_flow.py # Against real Stripe test environment
    test_github_api.py          # Against real GitHub API

CI configuration:

name: API Tests

on:
  push:
    branches: [main, 'release/**']
  pull_request:
  schedule:
    - cron: '0 3 * * *'  # Nightly

jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/unit/ -v

  contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/contracts/ -v

  integration:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' || github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/integration/ -v -m integration --timeout=120
        env:
          STRIPE_TEST_SECRET_KEY: ${{ secrets.STRIPE_TEST_SECRET_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TEST_TOKEN }}

Handling Rate Limits in Tests

Real integration tests hit real rate limits. Strategies:

1. Use separate test credentials with their own rate limits. Don't share test API keys between local development and CI.

2. Add delays between requests. Slow tests are better than flaky ones:

import time

import pytest

@pytest.fixture(autouse=True)
def rate_limit_delay():
    yield
    time.sleep(0.5)  # 500ms between each test

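3. Run integration tests sequentially, not in parallel. Plain pytest already runs tests one at a time; if plugins such as pytest-xdist or pytest-randomly are installed, disable them for this run:

pytest tests/integration/ -p no:randomly -p no:xdist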

4. Cache responses that don't change between test runs:

@pytest.fixture(scope="session")
def github_repos():
    """Cache repo list for the test session — doesn't change between tests."""
    return github_client.get_repos("myorg")
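
A fixed delay is the simplest option. As a complement, a test helper can retry with exponential backoff when the provider does rate-limit a call. The sketch below is one possible shape, not any library's API: RateLimited is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and the sleep function is injectable so the helper itself can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's rate-limit error (HTTP 429)."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, doubling the wait after each rate-limit failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the test
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Production code often adds random jitter to the delay; it is left out here to keep the waits predictable.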

Testing Third-Party Error Responses

Real APIs return errors that are hard to trigger consistently. Maintain a list of error conditions and test each one:

from unittest.mock import patch

import pytest

# process_payment comes from your application code
STRIPE_ERROR_SCENARIOS = [
    # (error_code, error_message, expected_behavior)
    ("card_declined", "Your card was declined", "Order should remain pending"),
    ("insufficient_funds", "Your card has insufficient funds", "User should see specific message"),
    ("rate_limit", "Too many requests", "App should retry with exponential backoff"),
    ("api_connection_error", "Connection error", "App should queue for retry"),
    ("authentication_error", "No such API key", "App should alert operations team"),
]

@pytest.mark.parametrize("error_code,error_message,expected", STRIPE_ERROR_SCENARIOS)
def test_stripe_error_handling(error_code, error_message, expected):
    """Each Stripe error code must be handled specifically."""
    from stripe.error import StripeError
    
    with patch("stripe.PaymentIntent.create") as mock_create:
        mock_create.side_effect = StripeError(message=error_message, code=error_code)
        
        result = process_payment(amount=1000, payment_method_id="pm_test")
    
    # Verify specific handling (not generic "payment failed")
    assert result["error_code"] == error_code
    # The expected behavior is documented in the parametrize list
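
For reference, here is one way a process_payment function could satisfy a test like this. It is a sketch under assumptions, not the implementation the test above was written for: PaymentError stands in for Stripe's exception class so the example runs without the SDK, create_intent is injected instead of calling stripe.PaymentIntent.create, and the action names are hypothetical labels for the behaviors listed in the scenarios.

```python
from typing import Callable


class PaymentError(Exception):
    """Hypothetical stand-in for the SDK's error type; carries an error code."""

    def __init__(self, message: str, code: str):
        super().__init__(message)
        self.code = code


# Hypothetical action labels, one per scenario in the list above.
ERROR_ACTIONS = {
    "card_declined": "keep_order_pending",
    "insufficient_funds": "show_specific_message",
    "rate_limit": "retry_with_backoff",
    "api_connection_error": "queue_for_retry",
    "authentication_error": "alert_operations",
}


def process_payment(create_intent: Callable[..., dict], amount: int,
                    payment_method_id: str) -> dict:
    """Create a payment and translate known errors into specific outcomes."""
    try:
        intent = create_intent(amount=amount, payment_method=payment_method_id)
    except PaymentError as exc:
        return {
            "status": "failed",
            "error_code": exc.code,
            # Unknown codes fall back to a generic failure path.
            "action": ERROR_ACTIONS.get(exc.code, "generic_failure"),
        }
    return {"status": "succeeded", "intent_id": intent["id"]}
```

Returning the action alongside the error code lets a test assert the specific handling, rather than only documenting the expected behavior in the parametrize list.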

Conclusion

Third-party API testing requires three layers working together: mocks for fast, controlled unit tests of your business logic; contract tests to catch API drift without live network calls; and real integration tests for high-stakes validation before release. Most teams over-use mocks and skip contracts entirely — the result is tests that pass while production integrations are broken. Add contract tests to your pipeline and you get the speed of mocking with the accuracy of integration testing where it matters most.