Object Storage Testing Best Practices: Mocking vs Real Buckets, CORS, CDN Invalidation, and Encryption
Object storage tests fall into three tiers: unit tests with mocks (fast, free, no network), integration tests against local emulators (slower, but real behavior), and end-to-end tests against real cloud buckets (slowest, costs money, but catches IAM and CDN issues nothing else will). The key is knowing which tier each test belongs in and keeping them separated.
Key Takeaways
Mocks are for testing your code, not the cloud service. If the test is about whether your application handles a 403 correctly, mock it. If it's about whether your CORS configuration allows your frontend domain, you need a real bucket or a faithful emulator.
CORS policy testing must run against real endpoints. The browser enforces CORS via preflight requests — you can't verify a CORS policy works without making actual HTTP requests to a server that evaluates it.
CDN cache invalidation is notoriously hard to test. The only reliable approach is testing the invalidation API call itself, then verifying with a GET that returns a cache-miss header — never sleep and hope.
Server-side encryption testing verifies metadata, not the encrypted content. S3 decrypts SSE objects transparently on read, so a test never sees the ciphertext; what you can test is that the ServerSideEncryption value in the response matches what you configured.
Cost-aware test design matters. A test that creates 1000 objects in a real S3 bucket in CI will cost you money every day. Parameterize bucket names, tag test resources, and clean them up in teardown — every time.
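The first takeaway (mock it when the test is about how your application handles a 403) might look like the sketch below. Everything named here is hypothetical: `fetch_report` stands in for your application code, and `FakeClientError` is a stand-in that only mimics the `response["Error"]["Code"]` shape of botocore's `ClientError`, so the example runs without any AWS libraries installed.

```python
from unittest.mock import MagicMock


class FakeClientError(Exception):
    """Stand-in for botocore's ClientError; carries the same response shape."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def fetch_report(s3_client, bucket, key):
    """Hypothetical application code: return the body, or None when access is denied."""
    try:
        obj = s3_client.get_object(Bucket=bucket, Key=key)
    except FakeClientError as err:
        if err.response["Error"]["Code"] == "AccessDenied":
            return None
        raise  # anything else is unexpected and should surface
    return obj["Body"].read()


def test_fetch_report_returns_none_on_access_denied():
    client = MagicMock()
    client.get_object.side_effect = FakeClientError("AccessDenied")
    assert fetch_report(client, "reports", "q1.csv") is None


def test_fetch_report_reraises_other_errors():
    client = MagicMock()
    client.get_object.side_effect = FakeClientError("NoSuchKey")
    try:
        fetch_report(client, "reports", "q1.csv")
    except FakeClientError as err:
        assert err.response["Error"]["Code"] == "NoSuchKey"
    else:
        raise AssertionError("expected the error to propagate")
```

In real code you would catch `botocore.exceptions.ClientError` rather than the stand-in; the point is that no bucket is involved, because the behavior under test belongs to the application.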
Teams who write good unit tests for their business logic but skip integration tests for their storage configuration get burned. The CORS policy you wrote in Terraform looks fine but your frontend still gets blocked. The CDN invalidation you call in your deploy script doesn't actually clear the right paths. The SSE-KMS key you specified in your bucket policy doesn't apply to uploads from a particular role. None of these are catchable with mocks.
Here's how to test each layer correctly.
The Three-Tier Test Strategy
Before writing any test, classify it:
| Tier | Tool | Speed | Cost | What it catches |
|---|---|---|---|---|
| Unit | moto / mocks | < 1s | $0 | Application logic errors |
| Integration | Azurite / minio / LocalStack | 5–30s | $0 | SDK usage, protocol behavior |
| E2E | Real AWS/Azure/GCS bucket | 30s–2min | Low but real | IAM, CDN, CORS, SSE-KMS |
The unit tier runs on every commit. Integration runs on every PR. E2E runs on merge to main or nightly.
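One way to encode that schedule is a small helper your CI entry point calls to build the pytest arguments. This is a sketch only: the event names (`push`, `pull_request`, `merge`, `nightly`) and both function names are assumptions, not something any CI system defines.

```python
def marker_expression(ci_event):
    """Map a CI trigger to the pytest -m expression for the tier schedule.

    An empty string means "no -m filter", i.e. run every tier.
    """
    schedule = {
        "push": "unit",
        "pull_request": "unit or integration",
        "merge": "",
        "nightly": "",
    }
    try:
        return schedule[ci_event]
    except KeyError:
        raise ValueError(f"unknown CI event: {ci_event!r}") from None


def pytest_args(ci_event):
    """Build the extra command-line arguments to pass to pytest."""
    expr = marker_expression(ci_event)
    return ["-m", expr] if expr else []
```

Keeping the mapping in one place means the tier schedule is itself unit-testable, instead of being spread across CI configuration files.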
# conftest.py: separate markers for each tier
import pytest


def pytest_configure(config):
    config.addinivalue_line("markers", "unit: fast, mocked tests")
    config.addinivalue_line("markers", "integration: requires local emulator (minio/azurite)")
    config.addinivalue_line("markers", "e2e: requires real cloud bucket, runs nightly")


# Run only unit tests in CI on every push:
#   pytest -m unit
# Run integration + unit on PR:
#   pytest -m "unit or integration"
# Run everything nightly:
#   pytest (no -m filter)

Testing CORS Policies
CORS is enforced by the browser, which means it must be tested with real HTTP requests that include Origin headers — mocks won't help here. LocalStack Pro or a real bucket is required.
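The header checks in such a test can be gathered into one helper that reads a preflight response roughly the way a browser would. The sketch below is an assumption about how you might structure it (the name `preflight_allows` is hypothetical); it takes any mapping of response headers, so it works with `requests` responses as well as plain dicts. It honors `*` in both headers and deliberately ignores the extra restrictions browsers apply to wildcards on credentialed requests.

```python
def preflight_allows(headers, origin, method):
    """Return True if preflight response headers grant `method` to `origin`."""
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}

    allow_origin = normalized.get("access-control-allow-origin", "")
    if allow_origin not in (origin, "*"):
        return False

    raw_methods = normalized.get("access-control-allow-methods", "")
    allowed = {m.strip().upper() for m in raw_methods.split(",") if m.strip()}
    return method.upper() in allowed or "*" in allowed
```

A test can then assert `preflight_allows(response.headers, origin, "GET")` for the allowed origin and the negation for an unknown one, which keeps the intent visible even when the server answers a disallowed origin with a 403 and no CORS headers at all.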
import pytest
import requests

# These tests require a real S3 bucket or LocalStack Pro.
# s3_client and real_bucket are assumed to be fixtures from your conftest.py.


@pytest.mark.e2e
def test_cors_allows_frontend_origin(s3_client, real_bucket, frontend_origin="https://app.example.com"):
    """Test that the CORS policy allows requests from the frontend domain."""
    # Set the CORS configuration
    cors_config = {
        "CORSRules": [
            {
                "AllowedOrigins": [frontend_origin],
                "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag", "x-amz-request-id"],
                "MaxAgeSeconds": 3600
            }
        ]
    }
    s3_client.put_bucket_cors(
        Bucket=real_bucket,
        CORSConfiguration=cors_config
    )

    # Get the bucket's endpoint URL
    bucket_url = f"https://{real_bucket}.s3.amazonaws.com/test-object.txt"

    # Simulate a browser preflight request
    preflight_response = requests.options(
        bucket_url,
        headers={
            "Origin": frontend_origin,
            "Access-Control-Request-Method": "GET",
            "Access-Control-Request-Headers": "content-type"
        },
        timeout=10,  # fail fast instead of hanging the CI job
    )
    assert preflight_response.status_code == 200
    assert preflight_response.headers.get("Access-Control-Allow-Origin") == frontend_origin
    assert "GET" in preflight_response.headers.get("Access-Control-Allow-Methods", "")


@pytest.mark.e2e
def test_cors_blocks_unknown_origin(s3_client, real_bucket):
    """Verify that unlisted origins are blocked."""
    # Assumes the bucket's CORS rules already list only the frontend origin
    # (applied by the test above or by a fixture).
    bucket_url = f"https://{real_bucket}.s3.amazonaws.com/test-object.txt"
    preflight_response = requests.options(
        bucket_url,
        headers={
            "Origin": "https://evil-attacker.com",
            "Access-Control-Request-Method": "GET"
        },
        timeout=10,
    )
    # S3 returns 403 or omits CORS headers for disallowed origins
    allowed_origin = preflight_response.headers.get("Access-Control-Allow-Origin", "")
    assert "evil-attacker.com" not in allowed_origin

For unit tests of CORS configuration generation (e.g., your IaC code), test the JSON structure:
@pytest.mark.unit
def test_cors_config_generator_produces_correct_structure():
    """Unit test for code that generates CORS config JSON."""
    from myapp.storage_config import build_cors_config

    config = build_cors_config(
        allowed_origins=["https://app.example.com", "https://staging.example.com"],
        allowed_methods=["GET", "PUT"],
        max_age_seconds=3600
    )
    assert len(config["CORSRules"]) == 1
    rule = config["CORSRules"][0]
    assert "https://app.example.com" in rule["AllowedOrigins"]
    assert "GET" in rule["AllowedMethods"]
    assert "PUT" in rule["AllowedMethods"]
    assert rule["MaxAgeSeconds"] == 3600

Testing CDN Cache Invalidation
CDN invalidation is one of the hardest things to test reliably. Sleep-and-hope approaches ("wait 30 seconds then check") are unreliable and slow. The right approach: test the invalidation API call itself, then verify the response headers on a subsequent request show a cache miss.
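The cache-miss check can be sketched as a polling helper with an explicit deadline. CloudFront reports cache status in the `X-Cache` response header (for example `Miss from cloudfront` or `Hit from cloudfront`); the function names, the injected `fetch_headers` callable, and the default timings below are assumptions for illustration. The check is only meaningful if the object was cached before the invalidation, so warm it with a GET first.

```python
import time


def is_cache_miss(headers):
    """True if the X-Cache response header reports a miss."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            return value.strip().lower().startswith("miss")
    return False  # header absent: no evidence of a miss


def wait_for_cache_miss(fetch_headers, timeout=300.0, interval=10.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_headers() until it reports a miss or the deadline passes.

    fetch_headers is any zero-argument callable returning response headers,
    e.g. lambda: requests.get(url, timeout=10).headers
    clock and sleep are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_cache_miss(fetch_headers()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Unlike a fixed sleep, this returns as soon as the edge reports a miss and fails with a clear result when the deadline passes.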
import time

import pytest

# cloudfront_client and distribution_id are assumed to be fixtures from your conftest.py.


@pytest.mark.e2e
def test_cloudfront_invalidation_is_submitted(cloudfront_client, distribution_id):
    """Test that invalidation is created and enters the invalidation queue."""
    response = cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {
                "Quantity": 2,
                "Items": ["/images/*", "/assets/bundle.js"]
            },
            "CallerReference": f"test-{int(time.time())}"
        }
    )
    invalidation = response["Invalidation"]
    assert invalidation["Status"] in ("InProgress", "Completed")
    assert len(invalidation["InvalidationBatch"]["Paths"]["Items"]) == 2


@pytest.mark.e2e
def test_invalidation_completes_within_timeout(cloudfront_client, distribution_id):
    """Test that an invalidation completes within a reasonable time."""
    create_response = cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},
            "CallerReference": f"test-wait-{int(time.time())}"
        }
    )
    invalidation_id = create_response["Invalidation"]["Id"]

    # Poll until complete or timeout
    status = create_response["Invalidation"]["Status"]
    deadline = time.time() + 300  # 5 minute timeout
    while time.time() < deadline:
        status_response = cloudfront_client.get_invalidation(
            DistributionId=distribution_id,
            Id=invalidation_id
        )
        status = status_response["Invalidation"]["Status"]
        if status == "Completed":
            break
        time.sleep(10)
    assert status == "Completed", f"Invalidation did not complete within 5 minutes, status: {status}"

For unit tests of the code that calls invalidation, mock the boto3 call:
from unittest.mock import MagicMock

import pytest


@pytest.mark.unit
def test_deploy_triggers_cdn_invalidation():
    """Unit test: verify deploy code calls invalidation with correct paths."""
    from myapp.deploy import DeployService

    mock_cloudfront = MagicMock()
    mock_cloudfront.create_invalidation.return_value = {
        "Invalidation": {"Id": "ABCDEF", "Status": "InProgress"}
    }
    service = DeployService(cloudfront_client=mock_cloudfront, distribution_id="E123")
    service.deploy(version="2.0.0")

    mock_cloudfront.create_invalidation.assert_called_once()
    # Inspect the keyword arguments the deploy code passed to boto3
    call_kwargs = mock_cloudfront.create_invalidation.call_args.kwargs
    paths = call_kwargs["InvalidationBatch"]["Paths"]["Items"]
    assert "/*" in paths or any(p.endswith("*") for p in paths)

Testing Server-Side Encryption
SSE testing verifies that objects are encrypted with the expected algorithm and key — you can't inspect the encrypted content itself, but you can inspect the metadata AWS returns.
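Since every SSE test ends with the same metadata comparison, it can be factored into a helper that reports all mismatches at once. This is a sketch under assumptions: the name `encryption_mismatches` is hypothetical, and the input is any dict shaped like a `head_object` response, which carries `ServerSideEncryption` and, for SSE-KMS, `SSEKMSKeyId`.

```python
def encryption_mismatches(head, algorithm, kms_key_id=None):
    """Compare a head_object-style response with the expected SSE settings.

    Returns a list of human-readable problems; an empty list means a match.
    """
    problems = []

    actual_algorithm = head.get("ServerSideEncryption")
    if actual_algorithm != algorithm:
        problems.append(
            f"ServerSideEncryption is {actual_algorithm!r}, expected {algorithm!r}"
        )

    # Only check the key when the caller expects a specific one.
    if kms_key_id is not None:
        actual_key = head.get("SSEKMSKeyId")
        if actual_key != kms_key_id:
            problems.append(
                f"SSEKMSKeyId is {actual_key!r}, expected {kms_key_id!r}"
            )

    return problems
```

A test then reads `assert encryption_mismatches(head, "aws:kms", kms_key_id) == []`, and a failure message names every field that was wrong instead of stopping at the first.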
import json

import boto3
import pytest
from moto import mock_aws


@pytest.mark.unit
@mock_aws
def test_sse_s3_encryption_applied():
    """Test that SSE-S3 (AES-256) encryption is applied on upload."""
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="encrypted-bucket")

    # Set default encryption on the bucket
    s3.put_bucket_encryption(
        Bucket="encrypted-bucket",
        ServerSideEncryptionConfiguration={
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "AES256"
                    },
                    "BucketKeyEnabled": False
                }
            ]
        }
    )

    # Upload an object
    s3.put_object(Bucket="encrypted-bucket", Key="sensitive/data.json", Body=b'{"ssn": "123-45-6789"}')

    # Verify the encryption header
    head = s3.head_object(Bucket="encrypted-bucket", Key="sensitive/data.json")
    assert head.get("ServerSideEncryption") == "AES256"


@pytest.mark.unit
@mock_aws
def test_sse_kms_encryption_with_specific_key():
    """Test SSE-KMS with a specific KMS key ID."""
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="kms-bucket")
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abcd1234-ef56-7890-ab12-cd34ef567890"

    s3.put_object(
        Bucket="kms-bucket",
        Key="secret/payload.json",
        Body=b"encrypted payload",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=kms_key_id
    )

    head = s3.head_object(Bucket="kms-bucket", Key="secret/payload.json")
    assert head.get("ServerSideEncryption") == "aws:kms"
    assert head.get("SSEKMSKeyId") == kms_key_id


@pytest.mark.unit
@mock_aws
def test_deny_unencrypted_uploads_policy_is_stored():
    """Verify the policy that denies non-encrypted uploads is stored as written.

    This checks the policy document only. Whether uploads are actually
    rejected is a question for the E2E tier.
    """
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="policy-enforced-bucket")

    # Policy that denies PutObject without SSE
    deny_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::policy-enforced-bucket/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                }
            }
        ]
    }
    s3.put_bucket_policy(
        Bucket="policy-enforced-bucket",
        Policy=json.dumps(deny_policy)
    )

    # Verify the policy is saved correctly
    policy_response = s3.get_bucket_policy(Bucket="policy-enforced-bucket")
    saved_policy = json.loads(policy_response["Policy"])
    assert saved_policy["Statement"][0]["Effect"] == "Deny"
    assert "s3:x-amz-server-side-encryption" in str(saved_policy)

Cost-Aware Test Design
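Real buckets cost money. A poorly designed test suite running 50 times a day can cost more than you expect. The rules: give every run a uniquely named bucket, tag everything the tests create, and clean up in teardown every time: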
import uuid

import boto3
import pytest


@pytest.fixture(scope="session")
def e2e_bucket():
    """Create a uniquely named test bucket and clean it up after all E2E tests."""
    s3 = boto3.client("s3", region_name="us-east-1")

    # Unique name prevents collisions between parallel CI runs
    bucket_name = f"hmt-test-{uuid.uuid4().hex[:12]}"

    # us-east-1 is the default location and rejects an explicit LocationConstraint;
    # for any other region, pass CreateBucketConfiguration={"LocationConstraint": region}.
    s3.create_bucket(Bucket=bucket_name)

    # Tag for cost allocation and cleanup
    s3.put_bucket_tagging(
        Bucket=bucket_name,
        Tagging={
            "TagSet": [
                {"Key": "Environment", "Value": "test"},
                {"Key": "ManagedBy", "Value": "pytest"},
                {"Key": "AutoDelete", "Value": "true"}
            ]
        }
    )

    yield bucket_name

    # Cleanup: delete all objects then the bucket
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3.delete_objects(Bucket=bucket_name, Delete={"Objects": objects})
    s3.delete_bucket(Bucket=bucket_name)


# Helper (e.g., for a scheduled Lambda) that finds test buckets that were
# abandoned because CI failed before teardown
def find_abandoned_test_buckets(s3_client, max_age_hours=24):
    """Find test buckets that were not cleaned up."""
    from datetime import datetime, timezone, timedelta

    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    abandoned = []
    response = s3_client.list_buckets()
    for bucket in response["Buckets"]:
        if bucket["Name"].startswith("hmt-test-") and bucket["CreationDate"] < cutoff:
            abandoned.append(bucket["Name"])
    return abandoned

The teardown runs even if tests fail, because pytest runs the code after yield once the fixture's setup has completed. It does not run if the CI job is killed mid-run, which is the case the abandoned-bucket sweep covers. The unique bucket name per test session prevents concurrent CI runs from interfering with each other.
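Finding abandoned buckets is only half of the sweep; the names still have to be deleted. A minimal sketch of that step follows, reusing the same boto3 calls as the fixture teardown (`list_objects_v2` pagination, `delete_objects`, `delete_bucket`). The function names and the `dry_run` flag are assumptions, and the sketch assumes unversioned buckets, since versioned buckets also need their object versions removed.

```python
def delete_bucket_and_contents(s3_client, bucket_name):
    """Empty an unversioned bucket page by page, then delete it.

    Returns the number of objects that were deleted.
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    s3_client.delete_bucket(Bucket=bucket_name)
    return deleted


def sweep(s3_client, bucket_names, dry_run=True):
    """Delete each named bucket, or only report them when dry_run is True.

    Returns {bucket_name: objects_deleted}, with None for buckets left alone.
    """
    results = {}
    for name in bucket_names:
        results[name] = None if dry_run else delete_bucket_and_contents(s3_client, name)
    return results
```

Defaulting to `dry_run=True` is a deliberate choice here: a sweep that matches on a name prefix should have to be told explicitly before it deletes anything.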
Bucket Policy Testing
Unit tests of your bucket policies verify the policy JSON you generate, not AWS's IAM evaluation engine (exercising that requires real AWS):
import json

import boto3
import pytest
from moto import mock_aws


@pytest.mark.unit
@mock_aws
def test_bucket_policy_restricts_access_to_vpc():
    """Verify the bucket policy generator produces a VPC-condition policy."""
    from myapp.policy_generator import generate_vpc_restricted_policy

    vpc_id = "vpc-0123456789abcdef0"
    bucket_name = "production-assets"
    policy = generate_vpc_restricted_policy(bucket_name=bucket_name, vpc_id=vpc_id)

    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket_name)
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))

    # Read back and verify
    saved = json.loads(s3.get_bucket_policy(Bucket=bucket_name)["Policy"])

    # Find the VPC condition statement
    vpc_statement = next(
        (s for s in saved["Statement"] if "aws:SourceVpc" in str(s.get("Condition", {}))),
        None
    )
    assert vpc_statement is not None
    assert vpc_id in str(vpc_statement["Condition"])
    assert vpc_statement["Effect"] == "Deny"

The pattern throughout all these tests is the same: unit tests verify your code's output structure with mocks, integration tests verify the SDK behaves correctly against a local server, and E2E tests verify the cloud service enforces your configuration. Match the test tier to what you're actually testing.