Playwright in CI/CD: GitHub Actions, Docker, and Parallel Sharding

Getting Playwright tests to run reliably in CI is one of the most common challenges teams face. Tests that pass locally fail in CI, pipelines are slow, and flaky results erode trust in the suite. This guide covers everything you need: GitHub Actions setup, Docker configuration, parallel sharding, retries, and reporting.

The Playwright Docker Image

Playwright tests require browser binaries (Chromium, Firefox, WebKit) and their system dependencies. The easiest way to get a consistent environment is to use Playwright's official Docker image:

mcr.microsoft.com/playwright:v1.44.0-jammy

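This image includes Node.js, all three browser binaries, and their system dependencies, so nothing needs to be installed at the OS level. Pin the image tag to the same version as the @playwright/test package in your package.json; a mismatch usually means Playwright cannot find the browser builds it expects.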

GitHub Actions: Basic Setup

# .github/workflows/playwright.yml
name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy

    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Run Playwright tests
        run: npx playwright test
        env:
          BASE_URL: ${{ vars.BASE_URL }}
          API_TOKEN: ${{ secrets.API_TOKEN }}

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30

The if: always() on the upload step ensures reports are uploaded even when tests fail — which is exactly when you need them most.

Using the ubuntu-latest Runner (Without Docker)

If you prefer not to use a container, you can install browsers directly on the runner:

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      - name: Run tests
        run: npx playwright test --project=chromium

Using --with-deps installs browser binaries and their OS dependencies in one command. Specifying chromium only (instead of all browsers) speeds up the install and the run.

Caching Browser Binaries

Browser installations are slow (30-60 seconds). Cache them between runs:

- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

- name: Install Playwright browsers
  run: npx playwright install --with-deps

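When package-lock.json doesn't change, browsers are restored from cache instead of downloaded. The install step still runs on a cache hit so that the OS dependencies, which live outside the cached directory, get installed.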

Parallel Sharding

For large test suites, sharding splits tests across multiple CI jobs that run in parallel. A 400-test suite that takes 20 minutes on one runner takes roughly 5 minutes when split evenly across 4, plus whatever each job spends on setup.
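The arithmetic behind that estimate can be sketched in a few lines. This is a back-of-the-envelope model, not something Playwright computes: the function name estimateShardedMinutes and the 1.5-minute setup figure are assumptions made up for illustration, and it assumes tests divide evenly across shards.

```typescript
// Rough wall-clock estimate for a sharded run. All numbers are illustrative.
function estimateShardedMinutes(
  serialTestMinutes: number, // time spent running tests on a single runner
  shards: number,            // number of parallel CI jobs
  setupMinutes: number,      // per-job overhead: checkout, npm ci, image pull
): number {
  if (!Number.isInteger(shards) || shards < 1) {
    throw new RangeError('shards must be a positive integer');
  }
  // Test time divides across shards; every job pays the setup cost in parallel.
  return serialTestMinutes / shards + setupMinutes;
}

console.log(estimateShardedMinutes(20, 4, 0));   // 5, the ideal case
console.log(estimateShardedMinutes(20, 4, 1.5)); // 6.5 once setup is counted
```

Because setup cost does not shrink with the shard count, adding shards gives diminishing returns once test time per shard approaches setup time.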

Configure shards in the workflow:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]

    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy

    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      
      - name: Run shard ${{ matrix.shard }}/4
        run: npx playwright test --shard=${{ matrix.shard }}/4

      - name: Upload shard blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report/
          retention-days: 1

Merge shard reports into one:

  merge-reports:
    needs: test
    runs-on: ubuntu-latest
    if: always()

    steps:
      - uses: actions/checkout@v4
      - run: npm ci

      - name: Download all blob reports
        uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true

      - name: Merge reports
        run: npx playwright merge-reports --reporter html ./all-blob-reports

      - name: Upload combined report
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30

This gives you one unified HTML report combining results from all shards.

Playwright Config for CI

Tune your playwright.config.ts with CI-specific settings:

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  
  // Fail the CI run if a test.only was left in the code
  forbidOnly: !!process.env.CI,
  
  // Retry flaky tests in CI
  retries: process.env.CI ? 2 : 0,
  
  // Limit parallelism per shard
  workers: process.env.CI ? 4 : undefined,
  
  reporter: [
    ['html'],
    ['blob'],              // for sharded merging
    ['github'],            // GitHub PR annotations
    ['junit', { outputFile: 'results.xml' }],
  ],

  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',    // Capture trace on retry
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },

  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 13'] } },
  ],
});

Key settings:

  • forbidOnly fails the CI run if a test.only() was committed by accident
  • retries: 2 reruns a failing test up to two more times; a test that passes on a retry is reported as flaky
  • trace: 'on-first-retry' captures a full trace the first time a test is retried
  • reporter: ['github'] adds inline failure annotations to GitHub pull requests
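To make the retry semantics concrete, here is a minimal sketch of how a test's attempts map to a final label. It is a simplified model for illustration, not Playwright's internal code; the Outcome type and the classify function are hypothetical names.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// attempts[i] is true if attempt i passed. Index 0 is the initial run,
// index 1 the first retry, and so on.
function classify(attempts: boolean[], retries: number): Outcome {
  // Only the initial run plus `retries` reruns are allowed to count.
  const allowed = attempts.slice(0, retries + 1);
  const firstPass = allowed.indexOf(true);
  if (firstPass === 0) return 'passed'; // green on the first try
  if (firstPass > 0) return 'flaky';    // failed first, passed on a retry
  return 'failed';                      // never passed within the budget
}

console.log(classify([true], 2));                // passed
console.log(classify([false, true], 2));         // flaky
console.log(classify([false, false, false], 2)); // failed
console.log(classify([false, true], 0));         // failed: no retries allowed
```

The last call shows why retries: 0 locally is useful: a test that only passes on its second attempt fails outright instead of hiding behind a retry.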

Handling Flaky Tests

Flaky tests undermine CI trust more than failing tests do. Playwright has built-in tools to deal with them.

Identify flaky tests:

npx playwright test --reporter=json > results.json
# Look for tests with status "flaky"
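Scanning results.json by hand gets tedious, so a small script can do the lookup. The sketch below assumes the report nests suites, specs, and tests, with a status field on each test; the interfaces cover only the fields used here and should be checked against the output of your Playwright version. The collectFlaky name is hypothetical.

```typescript
// Minimal slice of the JSON report shape (an assumption; verify locally).
interface ReportTest { projectName?: string; status: string }
interface ReportSpec { title: string; tests: ReportTest[] }
interface ReportSuite { title: string; specs?: ReportSpec[]; suites?: ReportSuite[] }
interface Report { suites: ReportSuite[] }

// Returns one "suite > spec" path per test entry whose status is "flaky".
function collectFlaky(report: Report): string[] {
  const found: string[] = [];
  const walk = (suite: ReportSuite, path: string[]): void => {
    const here = [...path, suite.title];
    for (const spec of suite.specs ?? []) {
      for (const t of spec.tests) {
        if (t.status === 'flaky') {
          const project = t.projectName ? `[${t.projectName}] ` : '';
          found.push(project + [...here, spec.title].join(' > '));
        }
      }
    }
    // Nested describe blocks appear as child suites.
    for (const child of suite.suites ?? []) walk(child, here);
  };
  for (const suite of report.suites) walk(suite, []);
  return found;
}

// Hand-written sample standing in for a parsed results.json.
const sampleReport: Report = {
  suites: [{
    title: 'auth.spec.ts',
    specs: [{ title: 'logs in', tests: [{ projectName: 'chromium', status: 'flaky' }] }],
    suites: [{
      title: 'logout',
      specs: [{ title: 'clears session', tests: [{ projectName: 'chromium', status: 'expected' }] }],
    }],
  }],
};

console.log(collectFlaky(sampleReport)); // [ '[chromium] auth.spec.ts > logs in' ]
```

In a real pipeline the input would come from parsing results.json (for example with JSON.parse on the file contents) rather than from an inline sample.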

Quarantine known flaky tests:

test.fixme('known flaky — JIRA-123', async ({ page }) => {
  // test body
});

test.fixme skips the test and reports it as needing a fix (use test.fail for a test that is expected to fail). CI stays green, and the ticket ID in the title keeps the flaky test tracked.

Find flakiness before it hits CI:

npx playwright test --repeat-each 5

Runs every test 5 times. Any test that doesn't pass all 5 runs is flaky.

Docker Compose for Local CI Parity

Reproduce the exact CI environment locally with Docker Compose:

# docker-compose.test.yml
version: '3'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: test
      DATABASE_URL: postgres://test:test@db:5432/testdb

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: testdb

  playwright:
    image: mcr.microsoft.com/playwright:v1.44.0-jammy
    depends_on:
      - app
    volumes:
      - .:/app
      - /app/node_modules
    working_dir: /app
    environment:
      BASE_URL: http://app:3000
    # node_modules starts as an empty anonymous volume, so install before running
    command: sh -c "npm ci && npx playwright test"

Run everything locally:

docker compose -f docker-compose.test.yml up --abort-on-container-exit

Test Artifacts and Debugging Failures

When a test fails in CI, you need three things: the error message, a screenshot, and a trace.

Playwright trace viewer: After downloading the artifact from CI, open it locally:

npx playwright show-trace trace.zip

The trace shows every action, network request, screenshot, and console log — in timeline view. It's the fastest way to understand why a test failed.

Automatic trace on failure:

use: {
  trace: 'on-first-retry',
  screenshot: 'only-on-failure',
  video: 'on-first-retry',
}

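These settings capture artifacts only when needed, keeping storage usage low. The 'on-first-retry' modes record nothing unless retries is at least 1, so pair them with the retries setting from the CI config.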
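How the trace modes relate to retries is easier to see as a small decision function. This is an illustrative model of a subset of the documented modes, not Playwright's implementation; recordsTrace and TraceMode are hypothetical names.

```typescript
type TraceMode = 'off' | 'on' | 'retain-on-failure' | 'on-first-retry' | 'on-all-retries';

// retryIndex is 0 for the initial run, 1 for the first retry, and so on.
// Returns whether a trace is recorded during that attempt.
function recordsTrace(mode: TraceMode, retryIndex: number): boolean {
  switch (mode) {
    case 'off':
      return false;
    case 'on':
      return true;
    case 'retain-on-failure':
      return true; // always recorded, then discarded if the attempt passes
    case 'on-first-retry':
      return retryIndex === 1;
    case 'on-all-retries':
      return retryIndex >= 1;
  }
}

console.log(recordsTrace('on-first-retry', 0)); // false: the initial run is not traced
console.log(recordsTrace('on-first-retry', 1)); // true
console.log(recordsTrace('on-first-retry', 2)); // false
```

With retries: 0 there is never an attempt with retryIndex 1, which is why 'on-first-retry' produces no trace at all in that configuration.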

Running Specific Tests in CI

Sometimes you want to run only a subset of the suite, such as the tests most relevant to a change:

# Run only tests tagged @critical
npx playwright test --grep @critical

# Run only tests in the auth/ directory
npx playwright test tests/auth/

# Skip slow tests tagged @slow
npx playwright test --grep-invert @slow

Add tags in your tests:

test('@critical user can log in', async ({ page }) => { ... });
test('@slow bulk import 1000 records', async ({ page }) => { ... });
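The effect of --grep and --grep-invert on tagged titles can be modelled as a plain regular-expression filter. This is a simplified sketch: selectTests is a hypothetical helper, and it matches only the titles it is given rather than the fuller test name that Playwright matches against.

```typescript
// Keep titles that match `grep` (if given) and do not match `grepInvert` (if given).
// Pass regexes without the global flag, since /g makes .test() stateful.
function selectTests(titles: string[], grep?: RegExp, grepInvert?: RegExp): string[] {
  return titles.filter(
    (title) => (!grep || grep.test(title)) && !(grepInvert && grepInvert.test(title)),
  );
}

const titles = [
  '@critical user can log in',
  '@slow bulk import 1000 records',
  'user can update profile',
];

console.log(selectTests(titles, /@critical/));
// [ '@critical user can log in' ]
console.log(selectTests(titles, undefined, /@slow/));
// [ '@critical user can log in', 'user can update profile' ]
```

Both filters can be combined, for example to run @critical tests while still excluding anything tagged @slow.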

Sending Results to a Dashboard

Storing reports in GitHub artifacts works, but it's hard to spot trends or track flakiness over time. For continuous monitoring — running your Playwright tests every 5 minutes and alerting on failure — HelpMeTest integrates with your test suite without changing your workflow.

You can also publish Playwright's built-in HTML report to your own static hosting for self-hosted reporting, or use third-party services like Currents or GitHub Actions test summaries.

Summary

A production-grade Playwright CI setup:

  1. Use the official Docker image: eliminates "works on my machine" browser issues
  2. Cache browser binaries: saves 30-60 seconds per run
  3. Shard across parallel jobs: reduces wall-clock time roughly in proportion to the shard count
  4. Set retries: 2 in CI: absorbs intermittent failures without failing the pipeline, while still reporting the test as flaky
  5. Capture traces on the first retry: makes debugging CI failures fast
  6. Merge shard reports: one unified report across all parallel jobs

The full config is maybe 30 lines. Don't over-engineer it — start with a single runner, add sharding only when the suite takes more than 5 minutes.
