Testing Data Isolation in Multi-Tenant Microservices

Data isolation in multi-tenant microservices is a security-critical requirement that most teams under-test. A failure in tenant isolation, where tenant A can read or modify tenant B's data, is not just a bug. It is a data breach. It can trigger breach-notification obligations under regulations such as GDPR and HIPAA, and it can put a SOC 2 attestation at risk. It can destroy customer trust overnight. And the failure mode is often invisible during normal testing because isolation bugs only manifest when you deliberately test cross-tenant access.

This guide covers the isolation patterns, the tests that verify them, and how to automate those tests in CI using multiple test tenants.

Tenant Isolation Patterns

There are three primary isolation architectures, each with different test implications:

Schema-Per-Tenant (PostgreSQL)

Each tenant gets its own schema within a shared database. The application sets search_path to the tenant's schema on each connection.

-- Tenant A schema
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.orders (id uuid, customer_id uuid, amount decimal);

-- Tenant B schema
CREATE SCHEMA tenant_b;
CREATE TABLE tenant_b.orders (id uuid, customer_id uuid, amount decimal);

Isolation failure mode: If the application fails to set search_path correctly (e.g., a pooled connection retains the previous tenant's path), tenant B's queries run against tenant A's schema.
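
One way to reduce the risk of a stale search_path is to set it per transaction instead of per connection. The sketch below is a minimal illustration, not a definitive implementation: the helper names `schema_for_tenant` and `search_path_statement` and the `tenant_<slug>` naming rule are assumptions made for this example. It relies on PostgreSQL's `SET LOCAL`, which only lasts until the end of the current transaction and therefore has to be issued inside one.

```python
import re

# Hypothetical naming rule: schemas look like "tenant_a" or "tenant_acme_42".
_SCHEMA_RE = re.compile(r"^tenant_[a-z0-9_]{1,48}$")


def schema_for_tenant(slug: str) -> str:
    """Map a tenant slug such as 'tenant-a' to a schema name such as 'tenant_a'."""
    schema = slug.strip().lower().replace("-", "_")
    if not _SCHEMA_RE.match(schema):
        # Fail closed: never fall back to a default or previously used schema.
        raise ValueError(f"refusing to derive a schema from slug {slug!r}")
    return schema


def search_path_statement(slug: str) -> str:
    """Build a transaction-scoped SET for the tenant's schema.

    SET LOCAL is reverted at commit or rollback, so a pooled connection does
    not carry one tenant's search_path into the next checkout.
    """
    schema = schema_for_tenant(slug)
    # The name was validated against a strict allowlist pattern above,
    # so embedding it as a quoted identifier is safe.
    return f'SET LOCAL search_path TO "{schema}"'
```

An isolation test for this pattern would check out a connection for tenant A, return it to the pool, check it out again for tenant B, and assert that an unqualified `SELECT` from `orders` only sees tenant B's rows.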

Row-Level Security (PostgreSQL RLS)

All tenants share tables. RLS policies enforce isolation at the database level using a session variable for tenant context.

-- Enable RLS
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

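-- Policy: users only see their tenant's rows.
-- current_setting(..., true) returns NULL instead of raising an error when the
-- variable is unset, and NULLIF maps an empty value to NULL, so a session
-- without tenant context matches no rows.
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = NULLIF(current_setting('app.current_tenant_id', true), '')::uuid);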

-- Application sets this before every query
SET app.current_tenant_id = '550e8400-e29b-41d4-a716-446655440000';

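Isolation failure mode: If app.current_tenant_id is set to the wrong value or is left over from a previous request on a pooled connection, or if a query bypasses RLS (for example through a superuser connection, a role with BYPASSRLS, or the table owner when FORCE ROW LEVEL SECURITY is not enabled), data leaks across tenants.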
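
To keep tenant context from outliving a request, the variable can be set for a single transaction. The following is a sketch under stated assumptions: `tenant_context` is a hypothetical helper, and `conn` is assumed to be a DB-API connection (such as psycopg2) that is not in autocommit mode. It uses PostgreSQL's `set_config(name, value, is_local)` function, whose third argument makes the value transaction-local.

```python
from contextlib import contextmanager


@contextmanager
def tenant_context(conn, tenant_id: str):
    """Run a block of queries with app.current_tenant_id set for one transaction.

    `conn` is assumed to be a DB-API connection that is NOT in autocommit
    mode, so the statements below share a single transaction.
    """
    cur = conn.cursor()
    try:
        # is_local=true behaves like SET LOCAL: the value is discarded
        # when the transaction commits or rolls back.
        cur.execute(
            "SELECT set_config('app.current_tenant_id', %s, true)",
            (tenant_id,),
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Usage would look like `with tenant_context(conn, tenant_id) as cur: cur.execute("SELECT ... FROM orders")`. Because the value never persists beyond the transaction, the next request that reuses the pooled connection starts without any tenant context.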

Separate Databases per Tenant

Each tenant has an entirely separate database (or database cluster). The application routes connections based on tenant identifier.

Isolation failure mode: A connection-routing bug sends tenant B's queries to tenant A's database, or a pool leak returns a connection from tenant A's pool to tenant B's pool.
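
A routing layer is easier to test when it fails closed. The sketch below is illustrative only: `TenantRouter` and `UnknownTenantError` are hypothetical names, and the DSN strings are placeholders. It refuses to start if two tenants share a DSN and raises on unknown tenants instead of falling back to a default database.

```python
class UnknownTenantError(LookupError):
    """Raised when no database is registered for a tenant."""


class TenantRouter:
    """Resolve a tenant ID to its own DSN, failing closed on anything unexpected."""

    def __init__(self, dsn_by_tenant: dict[str, str]):
        dsns = list(dsn_by_tenant.values())
        # Two tenants pointing at one database would silently defeat isolation.
        if len(set(dsns)) != len(dsns):
            raise ValueError("two tenants are mapped to the same DSN")
        self._dsn_by_tenant = dict(dsn_by_tenant)

    def dsn_for(self, tenant_id: str) -> str:
        try:
            return self._dsn_by_tenant[tenant_id]
        except KeyError:
            # No default database: an unknown tenant is an error.
            raise UnknownTenantError(tenant_id) from None
```

With a router shaped like this, a unit test can assert that every tenant resolves to a distinct DSN and that an unrecognized tenant ID never yields a connection.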

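| Pattern                | Isolation Level    | Ops Complexity | Test Complexity |
|------------------------|--------------------|----------------|-----------------|
| Separate Databases     | Strongest          | Highest        | Medium          |
| Schema-per-Tenant      | Strong             | Medium         | High            |
| Row-Level Security     | Good (DB-enforced) | Low            | High            |
| Application-Layer Only | Weakest            | Low            | Very High       |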

Setting Up Multi-Tenant Test Fixtures

The foundation of isolation testing is having at least two fully provisioned test tenants, each with known test data:

# conftest.py
import uuid
from decimal import Decimal

import pytest
import requests

BASE_URL = "http://localhost:8000"

TENANT_A_ID = "aaaaaaaa-0000-0000-0000-000000000001"
TENANT_B_ID = "bbbbbbbb-0000-0000-0000-000000000002"

# (tenant_id, name, slug, order amount, order reference)
TEST_TENANTS = [
    (TENANT_A_ID, "Test Tenant A", "tenant-a", Decimal("99.99"), "ORDER-TENANT-A-001"),
    (TENANT_B_ID, "Test Tenant B", "tenant-b", Decimal("199.99"), "ORDER-TENANT-B-001"),
]


def authenticate(email, password):
    """Log in through the API and return a bearer token."""
    response = requests.post(
        f"{BASE_URL}/auth/login",
        json={"email": email, "password": password},
        timeout=10,
    )
    assert response.status_code == 200, f"Failed to authenticate {email}"
    return response.json()["access_token"]


@pytest.fixture(scope="session", autouse=True)
def setup_test_tenants(db_connection):
    """Create two test tenants with known data before any isolation test runs.

    db_connection is assumed to be a session-scoped psycopg2 connection defined
    elsewhere. Its role must be allowed to write rows for both tenants (for
    example the table owner), because RLS would otherwise block the seeding.
    """
    with db_connection.cursor() as cur:
        for tenant_id, name, slug, amount, reference in TEST_TENANTS:
            cur.execute(
                "INSERT INTO tenants (id, name, slug) VALUES (%s, %s, %s) "
                "ON CONFLICT DO NOTHING",
                (tenant_id, name, slug),
            )
            cur.execute(
                "INSERT INTO orders (id, tenant_id, customer_id, amount, reference) "
                "VALUES (%s, %s, %s, %s, %s) ON CONFLICT DO NOTHING",
                (str(uuid.uuid4()), tenant_id, str(uuid.uuid4()), amount, reference),
            )
    db_connection.commit()

    # Authenticate as each tenant. The admin users are assumed to exist already.
    tokens = {
        "a": authenticate("admin@tenant-a.example.com", "test-password-a"),
        "b": authenticate("admin@tenant-b.example.com", "test-password-b"),
    }

    yield tokens

    # Cleanup after the test session
    with db_connection.cursor() as cur:
        cur.execute("DELETE FROM orders WHERE tenant_id IN (%s, %s)",
                    (TENANT_A_ID, TENANT_B_ID))
        cur.execute("DELETE FROM tenants WHERE id IN (%s, %s)",
                    (TENANT_A_ID, TENANT_B_ID))
    db_connection.commit()


@pytest.fixture(scope="session")
def tenant_a_token(setup_test_tenants):
    """Bearer token for Tenant A's admin user."""
    return setup_test_tenants["a"]


@pytest.fixture(scope="session")
def tenant_b_token(setup_test_tenants):
    """Bearer token for Tenant B's admin user."""
    return setup_test_tenants["b"]

Testing API-Level Isolation

Every API endpoint that returns tenant-scoped data must be tested for cross-tenant access:

# test_api_isolation.py
from decimal import Decimal

import requests

BASE_URL = "http://localhost:8000"
TENANT_B_ID = "bbbbbbbb-0000-0000-0000-000000000002"  # must match conftest.py

# tenant_a_token and tenant_b_token are assumed to be pytest fixtures from
# conftest.py that return a bearer token for each test tenant.


def get_headers(token):
    return {"Authorization": f"Bearer {token}"}


def get_first_order_id(token):
    """Return the ID of the first order visible to the given tenant token."""
    response = requests.get(
        f"{BASE_URL}/api/orders", headers=get_headers(token), timeout=10
    )
    assert response.status_code == 200
    orders = response.json()["data"]
    assert orders, "Expected at least one seeded order for this tenant"
    return orders[0]["id"]


class TestOrdersIsolation:

    def test_tenant_a_cannot_list_tenant_b_orders(self, tenant_a_token):
        """Tenant A's token must not return any of Tenant B's orders."""
        response = requests.get(
            f"{BASE_URL}/api/orders",
            headers=get_headers(tenant_a_token),
            timeout=10,
        )
        assert response.status_code == 200
        orders = response.json()["data"]

        # Guard against a vacuous pass: Tenant A's seeded order must be visible.
        assert orders, "Tenant A should see its own seeded order"

        # Verify no order belongs to tenant B
        for order in orders:
            assert order["tenant_id"] != TENANT_B_ID, \
                f"Cross-tenant data leak: Tenant A received order belonging to Tenant B: {order}"
        assert "ORDER-TENANT-B-001" not in response.text

    def test_tenant_a_cannot_access_tenant_b_order_by_id(self, tenant_a_token, tenant_b_token):
        """Tenant A must receive 404 (not 403) when accessing Tenant B's order by ID."""
        # Get a known Tenant B order ID
        tenant_b_order_id = get_first_order_id(tenant_b_token)

        # Try to access it as Tenant A
        response = requests.get(
            f"{BASE_URL}/api/orders/{tenant_b_order_id}",
            headers=get_headers(tenant_a_token),
            timeout=10,
        )

        # CRITICAL: Must be 404, not 403. 403 leaks the existence of the resource.
        # An attacker can enumerate resources via 403 vs 404 responses.
        assert response.status_code == 404, \
            f"Expected 404 (resource not found for this tenant), got {response.status_code}. " \
            f"403 leaks resource existence; 404 is the correct isolation response."

    def test_tenant_a_cannot_update_tenant_b_order(self, tenant_a_token, tenant_b_token):
        """Tenant A must not be able to update Tenant B's order."""
        tenant_b_order_id = get_first_order_id(tenant_b_token)

        response = requests.patch(
            f"{BASE_URL}/api/orders/{tenant_b_order_id}",
            json={"amount": 0.01},  # attempt to modify
            headers=get_headers(tenant_a_token),
            timeout=10,
        )

        # Either status blocks the write; 404 is preferable for the reason above.
        assert response.status_code in (403, 404), \
            f"Cross-tenant write should be blocked, got {response.status_code}"

        # Verify the order was NOT modified
        tenant_b_order = requests.get(
            f"{BASE_URL}/api/orders/{tenant_b_order_id}",
            headers=get_headers(tenant_b_token),
            timeout=10,
        ).json()
        # Compare as Decimal so the check works for JSON numbers and strings alike.
        assert Decimal(str(tenant_b_order["amount"])) == Decimal("199.99"), \
            "Tenant B's order amount was modified by Tenant A: critical isolation failure"

    def test_tenant_a_cannot_delete_tenant_b_order(self, tenant_a_token, tenant_b_token):
        """Tenant A must not be able to delete Tenant B's order."""
        tenant_b_order_id = get_first_order_id(tenant_b_token)

        response = requests.delete(
            f"{BASE_URL}/api/orders/{tenant_b_order_id}",
            headers=get_headers(tenant_a_token),
            timeout=10,
        )
        assert response.status_code in (403, 404)

        # Verify it still exists for tenant B
        response = requests.get(
            f"{BASE_URL}/api/orders/{tenant_b_order_id}",
            headers=get_headers(tenant_b_token),
            timeout=10,
        )
        assert response.status_code == 200, \
            "Tenant B's order was deleted by Tenant A: critical isolation failure"

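The 404 behavior asserted above tends to fall out naturally when the tenant is part of the lookup itself, rather than a permission check applied after the row has been loaded. The following is a minimal in-memory sketch of that idea; `Order`, `OrderStore`, and `get_order_status` are hypothetical names for this illustration and do not describe any particular framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    id: str
    tenant_id: str
    amount: str


class OrderStore:
    """In-memory stand-in for a repository; a real one would query the database."""

    def __init__(self, orders):
        self._orders = {(o.tenant_id, o.id): o for o in orders}

    def get(self, tenant_id: str, order_id: str) -> Optional[Order]:
        # The tenant is part of the lookup key, so another tenant's order is
        # indistinguishable from an order that does not exist.
        return self._orders.get((tenant_id, order_id))


def get_order_status(store: OrderStore, caller_tenant_id: str, order_id: str) -> int:
    """Return the HTTP status a handler would send for GET /api/orders/{id}."""
    return 200 if store.get(caller_tenant_id, order_id) is not None else 404
```

In SQL terms this corresponds to filtering on both the order ID and the tenant ID in the same query, so there is no code path that can tell "exists for someone else" apart from "does not exist".
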
Testing PostgreSQL RLS Policies

Test RLS policies directly at the database level to catch bypasses that might not surface through the API:

# test_rls_policies.py
import psycopg2
import psycopg2.errors
import pytest

TENANT_A_ID = "aaaaaaaa-0000-0000-0000-000000000001"  # must match conftest.py
TENANT_B_ID = "bbbbbbbb-0000-0000-0000-000000000002"

# pg_dsn is assumed to be a fixture returning a DSN for the application's
# database role. That role must not be a superuser, must not have BYPASSRLS,
# and must not own the table (unless FORCE ROW LEVEL SECURITY is enabled),
# otherwise PostgreSQL skips the policies under test.


def connect_as_tenant(pg_dsn, tenant_id):
    """Open a connection and set the RLS tenant context for the session."""
    conn = psycopg2.connect(pg_dsn)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("SET app.current_tenant_id = %s", (tenant_id,))
    return conn


@pytest.fixture
def tenant_a_connection(pg_dsn):
    """Database connection with Tenant A's RLS context set."""
    conn = connect_as_tenant(pg_dsn, TENANT_A_ID)
    yield conn
    conn.close()


@pytest.fixture
def tenant_b_connection(pg_dsn):
    """Database connection with Tenant B's RLS context set."""
    conn = connect_as_tenant(pg_dsn, TENANT_B_ID)
    yield conn
    conn.close()


def test_rls_blocks_cross_tenant_select(tenant_a_connection):
    """Direct SQL query as Tenant A should not return Tenant B's rows."""
    with tenant_a_connection.cursor() as cur:
        cur.execute("SELECT id, tenant_id, reference FROM orders")
        rows = cur.fetchall()

    # Guard against a vacuous pass: Tenant A's seeded order must be visible.
    assert rows, "Tenant A should see its own seeded rows"

    tenant_ids_seen = {str(row[1]) for row in rows}
    assert tenant_ids_seen == {TENANT_A_ID}, \
        f"RLS policy leak: Tenant A's query returned rows for other tenants. " \
        f"Tenant IDs seen: {tenant_ids_seen}"


def test_rls_blocks_cross_tenant_insert(tenant_a_connection):
    """Tenant A should not be able to insert a row claiming to be Tenant B."""
    with tenant_a_connection.cursor() as cur:
        # PostgreSQL rejects the row with SQLSTATE 42501 ("new row violates
        # row-level security policy"), which psycopg2 maps to InsufficientPrivilege.
        with pytest.raises(psycopg2.errors.InsufficientPrivilege) as exc_info:
            cur.execute("""
                INSERT INTO orders (id, tenant_id, customer_id, amount, reference)
                VALUES (gen_random_uuid(), %s, gen_random_uuid(), 1.00, 'MALICIOUS-INSERT')
            """, (TENANT_B_ID,))

    assert "row-level security" in str(exc_info.value).lower(), \
        "INSERT with wrong tenant_id should be blocked by the RLS policy"


def test_rls_not_bypassed_when_context_not_set(pg_dsn):
    """Queries without tenant context set must return no rows (fail safe)."""
    conn = psycopg2.connect(pg_dsn)
    conn.autocommit = True
    # Deliberately do NOT set app.current_tenant_id
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM orders")
            count = cur.fetchone()[0]
    finally:
        conn.close()

    # This relies on the policy reading the variable with
    # current_setting('app.current_tenant_id', true), which yields NULL when
    # the variable is unset, so no row matches. Without missing_ok the query
    # raises an error instead: still fail-closed, but not what is asserted here.
    assert count == 0, \
        f"Queries without tenant context should return 0 rows (fail safe). Got {count} rows."

Testing Cache Leakage

Shared caches (Redis, Memcached, CDN) are a common source of cross-tenant data leakage:

# test_cache_isolation.py
import pytest
import requests

BASE_URL = "http://localhost:8000"
TENANT_A_ID = "aaaaaaaa-0000-0000-0000-000000000001"  # must match conftest.py
TENANT_B_ID = "bbbbbbbb-0000-0000-0000-000000000002"

# tenant_a_token and tenant_b_token are assumed to be pytest fixtures from
# conftest.py that return a bearer token for each test tenant.


def get_summary(token, extra_headers=None):
    """Fetch the dashboard summary as the tenant identified by the token."""
    headers = {"Authorization": f"Bearer {token}", **(extra_headers or {})}
    return requests.get(f"{BASE_URL}/api/dashboard/summary", headers=headers, timeout=10)


def test_cached_response_not_shared_between_tenants(tenant_a_token, tenant_b_token):
    """A cached response for Tenant A must not be returned to Tenant B."""
    # Warm cache for Tenant A
    response_a1 = get_summary(tenant_a_token)
    assert response_a1.status_code == 200
    assert response_a1.headers.get("X-Cache") in (None, "MISS"), \
        "First request should be a cache miss"

    # A second request for Tenant A may now be a HIT. That is fine, because it
    # serves Tenant A's own data.
    assert get_summary(tenant_a_token).status_code == 200

    # Now request as Tenant B. It must NOT get Tenant A's cached response.
    response_b = get_summary(tenant_b_token)
    assert response_b.status_code == 200

    # The response body for Tenant B must contain Tenant B's data, not A's
    data_b = response_b.json()
    if "tenant_id" in data_b:
        assert data_b["tenant_id"] == TENANT_B_ID, \
            "Cache returned Tenant A's data to Tenant B: cache key isolation failure"

    # Order references specific to Tenant A must not appear in Tenant B's response
    assert "ORDER-TENANT-A-001" not in response_b.text, \
        "Tenant A's order reference appeared in Tenant B's response: cache leakage"


def test_cache_key_includes_tenant_identifier(tenant_a_token):
    """Verify cache keys are scoped to tenant (by inspecting cache directly or response headers)."""
    # If your cache layer adds an X-Cache-Key debug header:
    response = get_summary(tenant_a_token, {"X-Debug-Cache-Key": "1"})

    cache_key = response.headers.get("X-Cache-Key", "")
    if not cache_key:
        # Report a skip instead of passing silently when the header is absent.
        pytest.skip("Cache layer does not expose an X-Cache-Key debug header")

    assert TENANT_A_ID in cache_key or "tenant-a" in cache_key, \
        f"Cache key '{cache_key}' must include tenant identifier to prevent cross-tenant cache hits"

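On the application side, the property these tests check usually comes down to how the cache key is built. The sketch below is one possible shape, offered as an assumption rather than a description of any specific cache library: `tenant_cache_key` is a hypothetical helper that refuses to build a key without a tenant and puts the tenant ID in a readable prefix.

```python
import hashlib
from urllib.parse import urlencode


def tenant_cache_key(tenant_id: str, path: str, params=None) -> str:
    """Build a cache key that can never collide across tenants."""
    if not tenant_id:
        # Fail closed: a key without a tenant would be shared by everyone.
        raise ValueError("tenant_id is required to build a cache key")
    # Sort the parameters so equivalent requests map to the same key.
    query = urlencode(sorted((params or {}).items()))
    digest = hashlib.sha256(f"{path}?{query}".encode("utf-8")).hexdigest()[:16]
    return f"tenant:{tenant_id}:{digest}"
```

Keeping the tenant ID as a literal prefix, instead of hashing it together with the path, also makes it possible to inspect or purge one tenant's entries by key pattern.
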
Automating Isolation Tests in CI

Configure isolation tests as a mandatory CI gate. They should run on every PR, not just nightly:

# .github/workflows/isolation-tests.yml
name: Data Isolation Tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  isolation-tests:
    runs-on: ubuntu-latest
    env:
      DATABASE_URL: postgresql://testuser:testpass@localhost/testdb
      REDIS_URL: redis://localhost:6379
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
        # Publish the port so steps running on the runner can reach localhost:5432
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10

      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        # Assumes the project lists its dependencies in requirements.txt
        run: pip install -r requirements.txt

      - name: Run database migrations
        run: python manage.py migrate

      - name: Seed test tenants
        run: python scripts/seed_test_tenants.py

      - name: Start application
        run: |
          python app.py &
          # Wait until the app accepts connections instead of sleeping a fixed time
          timeout 30 bash -c 'until (echo > /dev/tcp/localhost/8000) 2>/dev/null; do sleep 1; done'

      - name: Run isolation tests
        # No -m filter here: the tests shown above carry no "isolation" marker,
        # so filtering on it would deselect all of them.
        run: |
          pytest tests/isolation/ -v \
            --tb=short \
            --junit-xml=isolation-results.xml

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: isolation-test-results
          path: isolation-results.xml

Isolation Test Coverage Matrix

| Data Path                 | Test Required                                    |
|---------------------------|--------------------------------------------------|
| GET list endpoint         | Cross-tenant rows must not appear                |
| GET single resource by ID | Must return 404 for other tenant's IDs           |
| POST create               | Must scope to authenticated tenant               |
| PATCH update              | Must block cross-tenant modification             |
| DELETE                    | Must block cross-tenant deletion                 |
| File/attachment access    | Signed URLs must be tenant-scoped                |
| Export/report generation  | Must include only requesting tenant's data       |
| Search/filter endpoints   | Full-text search must not cross tenant boundary  |
| Webhook delivery          | Events must go to correct tenant's endpoints     |
| Cached API responses      | Cache keys must include tenant identifier        |
| Database direct queries   | RLS must block without app.current_tenant_id set |
| Queue message consumption | Consumer must validate tenant_id on each message |
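
For the queue row in particular, the check can be a small guard that runs before any message handling. The sketch below is an assumption-laden illustration: `parse_tenant_message` and `TenantMismatchError` are hypothetical names, messages are assumed to be JSON objects carrying a `tenant_id` field, and the expected tenant is assumed to come from a trusted source such as the queue or partition the consumer is bound to.

```python
import json


class TenantMismatchError(Exception):
    """Raised when a message does not belong to the expected tenant."""


def parse_tenant_message(raw: bytes, expected_tenant_id: str) -> dict:
    """Decode a JSON message and verify its tenant before any processing."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise TenantMismatchError("message body is not a JSON object")
    actual = payload.get("tenant_id")
    # A missing tenant_id is treated the same as a wrong one.
    if actual != expected_tenant_id:
        raise TenantMismatchError(
            f"expected tenant {expected_tenant_id!r}, got {actual!r}"
        )
    return payload
```

An isolation test for a consumer can then publish a message tagged with tenant B onto tenant A's queue and assert that it is rejected without side effects.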

Multi-tenant isolation failures are high-severity incidents. The tests above are not optional quality improvements; they are the minimum bar for operating a SaaS product responsibly. Running them in CI on every PR, with real database-level isolation tests including RLS validation, is the most reliable way to catch isolation regressions before they reach production and become breach notifications.

Tools like HelpMeTest can run these isolation test suites continuously across environments, giving you permanent audit evidence that your data isolation controls were functioning at every deployment — which is exactly what SOC 2 Type II auditors ask for.
