Ephemeral Test Environments with Docker: Spin Up, Test, Tear Down

Flaky tests that pass locally but fail in CI. Tests that corrupt shared state and cause downstream failures. Debugging sessions that end with "works on my machine." These problems share a root cause: your tests are fighting over shared, mutable environments.

Ephemeral test environments solve this by treating each test run as a fresh start. Spin up a complete environment, run your tests, tear everything down. No leftovers, no contamination, no mystery failures caused by last Tuesday's broken test run.

Why Ephemeral Environments Matter

A persistent staging environment accumulates state. Developers push half-finished features. Someone runs a migration that wasn't ready. A test from last week left a record in the database that breaks today's assertion. Before long, your staging environment is a snowflake — unique, fragile, and impossible to reproduce.

Ephemeral environments eliminate this class of problem entirely. Each environment is:

  • Identical at start — same image, same seed data, same configuration
  • Independent — no environment shares state with another
  • Disposable — tear it down without ceremony; create another in seconds

Docker and Docker Compose make this practical even for complex multi-service applications.

Structuring Your Docker Compose Test File

Keep a separate Compose file for testing. Your production docker-compose.yml likely has volume mounts and restart policies that are wrong for test runs.

# docker-compose.test.yml
version: "3.9"

services:
  app:
    build:
      context: .
      target: test
    environment:
      - NODE_ENV=test
      - DATABASE_URL=postgres://testuser:testpass@db:5432/testdb
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    ports:
      - "3000:3000"

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: testuser
      POSTGRES_PASSWORD: testpass
      POSTGRES_DB: testdb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U testuser -d testdb"]
      interval: 2s
      timeout: 5s
      retries: 10
    tmpfs:
      - /var/lib/postgresql/data

  cache:
    image: redis:7-alpine
    tmpfs:
      - /data

Two details worth highlighting. First, tmpfs mounts for the database and cache — data lives in RAM, never touches disk, and vanishes completely when the container stops. This is faster than a volume and guarantees clean state. Second, the healthcheck on the database with depends_on using condition: service_healthy — your app won't start until Postgres is actually accepting connections, not just running.
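
The same poll-until-ready idea is sometimes useful on the host side, for example when probing a service from the lifecycle script. The following is a minimal sketch, not Docker's own healthcheck logic: `retry_probe` is a hypothetical helper name, and its two numeric arguments only loosely mirror the `retries` and `interval` settings above (there is no per-probe timeout or start period here).

```shell
#!/usr/bin/env bash
# Sketch: retry a probe command a fixed number of times.
# `retry_probe` is a hypothetical helper, not part of Docker or Compose.

# Usage: retry_probe <retries> <interval_seconds> <command> [args...]
retry_probe() {
  local retries=$1 interval=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= retries; attempt++)); do
    # Success on any attempt ends the loop immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if ((attempt < retries)); then
      sleep "$interval"
    fi
  done
  # Every attempt failed.
  return 1
}
```

Assuming `pg_isready` is installed on the host and the database port is published, a call might look like `retry_probe 10 2 pg_isready -h localhost -U testuser -d testdb`.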

The Test Lifecycle Script

Wrap your test execution in a script that handles setup, execution, and teardown as a unit:

#!/bin/bash
# Abort on the first failing command; the EXIT trap below still runs.
set -e

# Unique Compose project name per run: the CI build ID if set, else a timestamp.
PROJECT_NAME="test-${BUILD_ID:-$(date +%s)}"

cleanup() {
  echo "Tearing down environment: $PROJECT_NAME"
  docker compose -p "$PROJECT_NAME" -f docker-compose.test.yml down --volumes --remove-orphans
}

trap cleanup EXIT

echo "Starting environment: $PROJECT_NAME"
docker compose -p "$PROJECT_NAME" -f docker-compose.test.yml up -d --build

echo "Waiting for app to be ready..."
timeout 60 bash -c 'until curl -sf http://localhost:3000/health; do sleep 1; done'

echo "Running migrations and seed..."
docker compose -p "$PROJECT_NAME" -f docker-compose.test.yml exec -T app npm run db:migrate
docker compose -p "$PROJECT_NAME" -f docker-compose.test.yml exec -T app npm run db:seed:test

echo "Running tests..."
npm test -- --base-url http://localhost:3000

# cleanup runs via trap on EXIT

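The unique suffix on PROJECT_NAME is critical. It namespaces every container, network, and volume under that project, so two runs never share Compose resources. One caveat: the Compose file above publishes the app on the fixed host port 3000, so two runs on the same host would still collide on that port. Truly concurrent runs need separate hosts (as CI runners usually provide) or a per-run host port. The trap cleanup EXIT ensures teardown happens whether tests pass, fail, or the script is interrupted by a signal it can catch.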
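
CI build identifiers often contain characters that Compose rejects in a project name, which may only contain lowercase letters, digits, dashes, and underscores, and must start with a letter or digit. A minimal sketch of a sanitizing step follows; `sanitize_project_name` is a hypothetical helper, and falling back to the literal `test` for an empty result is an arbitrary choice made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: turn an arbitrary build ID into a valid Compose project name.
# `sanitize_project_name` is a hypothetical helper name.

# Usage: sanitize_project_name <raw_id>
sanitize_project_name() {
  local raw=$1 name
  # Lowercase, then replace each run of disallowed characters with one dash.
  name=$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9_-]+/-/g')
  # Strip leading characters that are not a lowercase letter or digit.
  name=$(printf '%s' "$name" | sed -E 's/^[^a-z0-9]+//')
  # Fall back to a fixed name if nothing usable is left.
  printf '%s\n' "${name:-test}"
}
```

In the lifecycle script this could be used as `PROJECT_NAME="test-$(sanitize_project_name "${BUILD_ID:-$(date +%s)}")"`.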

Parallelizing With Isolated Environments

Once your tests run in isolated environments, parallelism becomes safe. Each worker gets its own stack:

# .github/workflows/test.yml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - name: Run test shard
        env:
          BUILD_ID: "shard-${{ matrix.shard }}-${{ github.run_id }}"
          TEST_SHARD: "${{ matrix.shard }}/4"
        run: ./scripts/run-tests.sh

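Four shards, four independent Docker environments, running in parallel. On GitHub-hosted runners each matrix job gets its own virtual machine, so the stacks share nothing, not even host ports. With evenly balanced shards, wall-clock test time approaches a quarter of the serial run, plus the fixed cost of starting each environment. Tests cannot interfere with each other because there is no shared state to interfere with.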
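
The workflow only sets TEST_SHARD; something still has to turn a value like `2/4` into a subset of tests. How that happens depends on the test runner. As one possible sketch, assuming the runner accepts an explicit list of test files, a hypothetical `select_shard` helper could split a file list round-robin:

```shell
#!/usr/bin/env bash
# Sketch: pick the lines of stdin that belong to one shard.
# `select_shard` is a hypothetical helper name.

# Usage: select_shard <index>/<total> < list_of_files
# Prints every input line whose 1-based position falls in the given shard.
select_shard() {
  local spec=$1 index total
  index=${spec%%/*}
  total=${spec##*/}
  # Reject malformed specs such as "0/4", "5/4", or "abc".
  if ! [[ $index =~ ^[0-9]+$ && $total =~ ^[0-9]+$ ]] ||
     ((index < 1 || total < 1 || index > total)); then
    echo "invalid shard spec: $spec" >&2
    return 1
  fi
  # Round-robin: line N goes to shard ((N - 1) % total) + 1.
  awk -v i="$index" -v t="$total" '(NR - 1) % t + 1 == i'
}
```

A call such as `find tests -name '*.test.js' | sort | select_shard "$TEST_SHARD"` (the path and file pattern are placeholders) gives each shard a disjoint list. Sorting first keeps the assignment identical on every runner.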

Resource Management and Cleanup

Ephemeral environments fail quietly if you don't clean up after failures. A crashed CI runner never reaches the teardown step, so its containers, networks, and volumes stay behind and eventually exhaust disk space or Docker's pool of network address ranges.

Add a periodic cleanup job:

# Remove stopped containers with the test label that were created more than 2 hours ago
docker container prune -f --filter "label=purpose=test" --filter "until=2h"

# Prune unused volumes that carry the test label themselves
docker volume prune -f --filter "label=purpose=test"
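
When a cleanup job needs finer control, for instance to decide whether a container that is still running has outlived its test run, the age comparison has to be done by hand. The following is a small sketch of that check; `is_stale` is a hypothetical helper that assumes the creation time is already available as epoch seconds.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a resource is older than a maximum age.
# `is_stale` is a hypothetical helper name.

# Usage: is_stale <created_epoch> <max_age_seconds> [now_epoch]
# Returns 0 (true) if the resource is older than max_age_seconds.
is_stale() {
  local created=$1 max_age=$2
  # Default "now" to the current time; tests can pass a fixed value.
  local now=${3:-$(date +%s)}
  ((now - created > max_age))
}
```

With GNU date, the creation time of a container could plausibly be obtained with `date -d "$(docker inspect --format '{{.Created}}' "$id")" +%s`, after which `is_stale "$created" 7200` expresses the two-hour cutoff.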

Label your test containers in the Compose file:

services:
  app:
    labels:
      purpose: test
      build-id: "${BUILD_ID}"

Integrating With HelpMeTest

When you're running browser-level tests with HelpMeTest, ephemeral Docker environments give you the cleanest possible target. Start your environment, note the URL, run your HelpMeTest suite against it, then tear it down.

HelpMeTest's test runner accepts a base URL at runtime, so pointing it at http://localhost:3000 (or your CI host's equivalent) requires no test changes between environments. Your tests describe behavior — the environment is just a detail injected at runtime.

The result is a test suite that runs identically on a developer's laptop, in GitHub Actions, and in any other CI system you connect to. No "it worked in staging" conversations. No debugging environment-specific failures. Just tests that describe what your application should do, running against a fresh environment every time.
