Playwright in CI/CD: GitHub Actions, Docker, and Parallel Sharding
Getting Playwright tests to run reliably in CI is one of the most common challenges teams face. Tests that pass locally fail in CI, pipelines are slow, and flaky results erode trust in the suite. This guide covers everything you need: GitHub Actions setup, Docker configuration, parallel sharding, retries, and reporting.
The Playwright Docker Image
Playwright tests require browser binaries (Chromium, Firefox, WebKit) and their system dependencies. The easiest way to get a consistent environment is to use Playwright's official Docker image:
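mcr.microsoft.com/playwright:v1.44.0-jammy
This image includes Node.js, all three browser binaries, and every system dependency, so there is nothing to install manually. Keep the image tag in step with the @playwright/test version in your package.json so the browsers match the test runner.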
GitHub Actions: Basic Setup
# .github/workflows/playwright.yml
name: Playwright Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run Playwright tests
        run: npx playwright test
        env:
          BASE_URL: ${{ vars.BASE_URL }}
          API_TOKEN: ${{ secrets.API_TOKEN }}
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30
The if: always() on the upload step ensures reports are uploaded even when tests fail, which is exactly when you need them most.
Using the ubuntu-latest Runner (Without Docker)
If you prefer not to use a container, you can install browsers directly on the runner:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run tests
        run: npx playwright test --project=chromium
Using --with-deps installs browser binaries and their OS dependencies in one command. Specifying chromium only (instead of all browsers) speeds up both the install and the run.
Caching Browser Binaries
Browser installations are slow (30-60 seconds). Cache them between runs:
- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
- name: Install Playwright browsers
  run: npx playwright install --with-deps
When package-lock.json doesn't change, browsers are restored from cache instead of downloaded.
Parallel Sharding
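For large test suites, sharding splits tests across multiple CI jobs that run in parallel. A 400-test suite that takes 20 minutes on one runner takes roughly 5 minutes split across 4, assuming the tests distribute evenly. Per-job setup such as checkout and npm ci does not shrink with the shard count.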
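A rough sketch of that arithmetic follows. The function names and the one-minute setup overhead are illustrative assumptions, and the contiguous split shown here is not necessarily how Playwright assigns tests to shards.

```typescript
// Illustrative only: splits an ordered test list into contiguous shards
// and estimates wall-clock time. Not Playwright's actual algorithm.

/** Returns the tests assigned to shard `index` (1-based) out of `total`. */
function testsForShard<T>(tests: T[], index: number, total: number): T[] {
  if (index < 1 || index > total) {
    throw new RangeError(`shard index must be between 1 and ${total}`);
  }
  const start = Math.floor(((index - 1) * tests.length) / total);
  const end = Math.floor((index * tests.length) / total);
  return tests.slice(start, end);
}

/**
 * Estimated wall-clock minutes when the suite splits evenly, plus a fixed
 * per-job overhead (checkout, npm ci) that every shard pays in full.
 */
function estimateWallClockMinutes(
  suiteMinutes: number,
  shards: number,
  perJobOverheadMinutes = 0,
): number {
  return suiteMinutes / shards + perJobOverheadMinutes;
}

const allTests = Array.from({ length: 400 }, (_, i) => `test-${i + 1}`);
const shardSizes = [1, 2, 3, 4].map((i) => testsForShard(allTests, i, 4).length);
console.log(shardSizes); // [ 100, 100, 100, 100 ]
console.log(estimateWallClockMinutes(20, 4)); // 5
console.log(estimateWallClockMinutes(20, 4, 1)); // 6
```

The last line shows why doubling the shard count rarely halves the pipeline time: the fixed overhead stays constant while only the test portion shrinks.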
Configure shards in the workflow:
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Run shard ${{ matrix.shard }}/4
        run: npx playwright test --shard=${{ matrix.shard }}/4
      - name: Upload shard blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report/
          retention-days: 1
Merge shard reports into one:
  merge-reports:
    needs: test
    runs-on: ubuntu-latest
    if: always()
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Download all blob reports
        uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - name: Merge reports
        run: npx playwright merge-reports --reporter html ./all-blob-reports
      - name: Upload combined report
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30
This gives you one unified HTML report combining results from all shards.
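Conceptually, the merge step is an aggregation over per-shard results. The sketch below uses hypothetical `ShardSummary` and `MergedSummary` types to show the idea. It is not Playwright's blob format, which `merge-reports` reads for you, and the numbers are made up for illustration.

```typescript
// Hypothetical per-shard summary; real blob reports carry far more detail.
interface ShardSummary {
  shard: number;
  passed: number;
  failed: number;
  flaky: number;
  durationMs: number;
}

interface MergedSummary {
  passed: number;
  failed: number;
  flaky: number;
  totalRunnerTimeMs: number; // sum across shards (runner time consumed)
  wallClockMs: number; // slowest shard, since shards run in parallel
}

function mergeSummaries(shards: ShardSummary[]): MergedSummary {
  return shards.reduce<MergedSummary>(
    (acc, s) => ({
      passed: acc.passed + s.passed,
      failed: acc.failed + s.failed,
      flaky: acc.flaky + s.flaky,
      totalRunnerTimeMs: acc.totalRunnerTimeMs + s.durationMs,
      wallClockMs: Math.max(acc.wallClockMs, s.durationMs),
    }),
    { passed: 0, failed: 0, flaky: 0, totalRunnerTimeMs: 0, wallClockMs: 0 },
  );
}

// Made-up sample data for four shards.
const merged = mergeSummaries([
  { shard: 1, passed: 98, failed: 1, flaky: 1, durationMs: 290_000 },
  { shard: 2, passed: 100, failed: 0, flaky: 0, durationMs: 310_000 },
  { shard: 3, passed: 99, failed: 0, flaky: 1, durationMs: 275_000 },
  { shard: 4, passed: 97, failed: 3, flaky: 0, durationMs: 305_000 },
]);
console.log(merged);
```

Tracking the slowest shard separately from the summed time is a useful habit: a shard that consistently runs long is a sign the split is uneven.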
Playwright Config for CI
Tune your playwright.config.ts with CI-specific settings:
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
  testDir: './tests',
  // Fail the CI run if a test.only() was left in the source
  forbidOnly: !!process.env.CI,
  // Retry failing tests in CI only; no retries locally
  retries: process.env.CI ? 2 : 0,
  // Limit parallelism per shard
  workers: process.env.CI ? 4 : undefined,
  reporter: [
    ['html'],
    ['blob'], // for sharded merging
    ['github'], // GitHub PR annotations
    ['junit', { outputFile: 'results.xml' }],
  ],
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry', // Capture trace on retry
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 13'] } },
  ],
});
Key settings:
- forbidOnly fails the CI run if a test.only() slipped into a commit
- retries: 2 reruns a failing test up to two more times before marking it failed
- trace: 'on-first-retry' captures a full trace the first time a test is retried
- reporter: ['github'] adds inline test failure annotations to GitHub PRs
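How retries turn individual attempts into a final status can be sketched as follows. This is a simplified model with hypothetical names, not Playwright's implementation: a test gets up to `retries + 1` attempts, stops at the first pass, and counts as flaky when it passes only after an earlier failure.

```typescript
type FinalStatus = "passed" | "flaky" | "failed";

/**
 * Runs `attempt` up to `retries + 1` times and stops at the first pass.
 * `attempt` receives the retry index (0 for the first run).
 */
function runWithRetries(
  attempt: (retryIndex: number) => boolean,
  retries: number,
): { status: FinalStatus; attempts: number } {
  for (let retryIndex = 0; retryIndex <= retries; retryIndex++) {
    if (attempt(retryIndex)) {
      return {
        status: retryIndex === 0 ? "passed" : "flaky",
        attempts: retryIndex + 1,
      };
    }
  }
  return { status: "failed", attempts: retries + 1 };
}

// A test that fails once, then passes: reported as flaky, not failed.
const flakyResult = runWithRetries((i) => i >= 1, 2);
console.log(flakyResult); // { status: 'flaky', attempts: 2 }

// A test that never passes uses all three attempts.
const failedResult = runWithRetries(() => false, 2);
console.log(failedResult); // { status: 'failed', attempts: 3 }
```

The flaky outcome keeps the pipeline green, which is why it is worth reviewing flaky counts separately rather than relying on the overall pass/fail signal.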
Handling Flaky Tests
Flaky tests undermine CI trust more than failing tests do. Playwright has built-in tools to deal with them.
Identify flaky tests:
npx playwright test --reporter=json > results.json
# Look for tests with status "flaky"
Quarantine known flaky tests:
test.fixme('known flaky — JIRA-123', async ({ page }) => {
  // test body
});
test.fixme marks the test as needing a fix and skips it, so Playwright no longer runs it. CI stays green, and the flaky test stays tracked through the ticket reference in its title.
Find flakiness before it hits CI:
npx playwright test --repeat-each 5
Runs every test 5 times. Any test that doesn't pass all 5 runs is flaky.
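To pull flaky tests out of the results.json written by the --reporter=json command earlier in this section, a small script can walk the report. The sketch below assumes the report nests `suites`, which contain `specs`, which contain `tests` with a `status` field. Verify that shape against the output of your Playwright version. The `sampleReport` object is made up and stands in for the parsed file.

```typescript
// Assumed, simplified shape of the JSON reporter output.
interface ReportTest {
  status: string; // "flaky" is the value this sketch looks for
}
interface ReportSpec {
  title: string;
  tests: ReportTest[];
}
interface ReportSuite {
  title: string;
  specs?: ReportSpec[];
  suites?: ReportSuite[];
}
interface Report {
  suites: ReportSuite[];
}

/** Returns "suite > nested suite > spec" paths for every flaky spec. */
function collectFlaky(report: Report): string[] {
  const flaky: string[] = [];
  const visit = (suite: ReportSuite, path: string[]): void => {
    const here = [...path, suite.title];
    for (const spec of suite.specs ?? []) {
      if (spec.tests.some((t) => t.status === "flaky")) {
        flaky.push([...here, spec.title].join(" > "));
      }
    }
    for (const child of suite.suites ?? []) visit(child, here);
  };
  for (const suite of report.suites) visit(suite, []);
  return flaky;
}

// Stand-in for JSON.parse of the results.json file.
const sampleReport: Report = {
  suites: [
    {
      title: "auth.spec.ts",
      specs: [{ title: "user can log in", tests: [{ status: "expected" }] }],
      suites: [
        {
          title: "password reset",
          specs: [{ title: "sends email", tests: [{ status: "flaky" }] }],
        },
      ],
    },
  ],
};

const flakyTests = collectFlaky(sampleReport);
console.log(flakyTests); // [ 'auth.spec.ts > password reset > sends email' ]
```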
Docker Compose for Local CI Parity
Reproduce the exact CI environment locally with Docker Compose:
# docker-compose.test.yml
version: '3'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: test
      DATABASE_URL: postgres://test:test@db:5432/testdb
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: testdb
  playwright:
    image: mcr.microsoft.com/playwright:v1.44.0-jammy
    depends_on:
      - app
    volumes:
      - .:/app
      - /app/node_modules
    working_dir: /app
    environment:
      BASE_URL: http://app:3000
    # node_modules is an empty anonymous volume here, so install before running
    command: sh -c "npm ci && npx playwright test"
Run everything locally:
docker compose -f docker-compose.test.yml up --abort-on-container-exit
Test Artifacts and Debugging Failures
When a test fails in CI, you need three things: the error message, a screenshot, and a trace.
Playwright trace viewer: After downloading the artifact from CI, open it locally:
npx playwright show-trace trace.zip
The trace shows every action, network request, screenshot, and console log in a timeline view. It's the fastest way to understand why a test failed.
Automatic trace on failure:
use: {
  trace: 'on-first-retry',
  screenshot: 'only-on-failure',
  video: 'on-first-retry',
}
These settings capture artifacts only when needed, keeping storage usage low.
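The modes used in this guide differ in which attempts leave an artifact behind. A simplified decision sketch follows, with hypothetical names and covering only the modes mentioned here. It models what ends up on disk, not how Playwright records internally.

```typescript
// Union of the modes mentioned in this guide. Not every mode is valid for
// every option (for example, 'only-on-failure' applies to screenshots).
type ArtifactMode =
  | "off"
  | "on"
  | "on-first-retry"
  | "only-on-failure"
  | "retain-on-failure";

interface Attempt {
  retryIndex: number; // 0 for the first run, 1 for the first retry
  passed: boolean;
}

/** Simplified model: is an artifact left on disk after this attempt? */
function isArtifactKept(mode: ArtifactMode, attempt: Attempt): boolean {
  switch (mode) {
    case "off":
      return false;
    case "on":
      return true;
    case "on-first-retry":
      // Only the first retry is recorded, whatever its outcome.
      return attempt.retryIndex === 1;
    case "only-on-failure":
    case "retain-on-failure":
      // Kept only when the attempt failed.
      return !attempt.passed;
  }
}

const firstRunFailed: Attempt = { retryIndex: 0, passed: false };
const firstRetryPassed: Attempt = { retryIndex: 1, passed: true };

console.log(isArtifactKept("on-first-retry", firstRunFailed)); // false
console.log(isArtifactKept("on-first-retry", firstRetryPassed)); // true
console.log(isArtifactKept("only-on-failure", firstRunFailed)); // true
console.log(isArtifactKept("retain-on-failure", firstRetryPassed)); // false
```

One consequence worth noting: 'on-first-retry' produces nothing unless retries is at least 1, so it pairs naturally with the CI-only retries setting.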
Running Specific Tests in CI
Sometimes you want to run only a subset of the suite, selected by tag or directory:
# Run only tests tagged @critical
npx playwright test --grep @critical
# Run only tests in the auth/ directory
npx playwright test tests/auth/
# Skip slow tests tagged @slow
npx playwright test --grep-invert @slow
Add tags in your tests:
test('@critical user can log in', async ({ page }) => { ... });
test('@slow bulk import 1000 records', async ({ page }) => { ... });
Sending Results to a Dashboard
Storing reports in GitHub artifacts works, but it's hard to spot trends or track flakiness over time. For continuous monitoring — running your Playwright tests every 5 minutes and alerting on failure — HelpMeTest integrates with your test suite without changing your workflow.
You can also host Playwright's built-in HTML report yourself, or use third-party services like Currents or GitHub Actions job summaries.
Summary
A production-grade Playwright CI setup:
- Use the official Docker image: eliminates "works on my machine" browser issues
- Cache browser binaries: saves 30-60 seconds per run
- Shard across parallel jobs: reduces wall-clock time roughly in proportion to the shard count
- Set retries: 2 in CI: absorbs flakiness without failing the pipeline
- Capture traces on failure: makes debugging CI failures fast
- Merge shard reports: one unified report across all parallel jobs
The full config is maybe 30 lines. Don't over-engineer it — start with a single runner, add sharding only when the suite takes more than 5 minutes.