Buildkite Testing: Parallel Test Runs and Test Analytics

Buildkite takes a different approach to CI/CD. Rather than running builds on Buildkite's infrastructure, you run Buildkite Agents on your own machines or cloud. Buildkite coordinates the work; your infrastructure executes it. This gives you full control over the build environment while keeping the pipeline management in Buildkite's cloud.

The other major feature that sets Buildkite apart is Test Analytics, a built-in dashboard that tracks test performance over time, surfaces flaky tests, and shows which tests are slowest. For teams with large test suites, this makes it easier to decide which tests to fix, speed up, or quarantine.

How Buildkite Works

Buildkite Agents are lightweight daemons that run on your machines. You install agents on your servers, cloud VMs, or Kubernetes pods. Agents poll Buildkite for work, execute pipeline steps, and report results back.

Pipelines are defined in .buildkite/pipeline.yml. They can also be dynamically generated using scripts — a powerful feature for generating parallelized test configurations programmatically.

Plugins extend pipeline steps with pre-built functionality, like Docker integration, test uploading, and caching.

Installing Buildkite Agents

On Ubuntu/Debian:

# Add Buildkite's signing key, then register the apt repository against it
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://keys.openpgp.org/vks/v1/by-fingerprint/32A37959C2FA5C3C99EFBC32A79206BE8F8979C9 \
  | sudo gpg --dearmor -o /etc/apt/keyrings/buildkite-agent.gpg
echo "deb [signed-by=/etc/apt/keyrings/buildkite-agent.gpg] https://apt.buildkite.com/buildkite-agent stable main" \
  | sudo tee /etc/apt/sources.list.d/buildkite-agent.list

sudo apt-get update && sudo apt-get install buildkite-agent

# Configure with your agent token
sudo sed -i "s/xxx/$BUILDKITE_AGENT_TOKEN/g" /etc/buildkite-agent/buildkite-agent.cfg
sudo systemctl enable buildkite-agent && sudo systemctl start buildkite-agent
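
The sed command above replaces every occurrence of xxx in the config file. A more targeted variant is sketched below, assuming the config contains a line beginning with token= (as the stock placeholder suggests). The function name set_agent_token is hypothetical and not part of the Buildkite tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite only the token= line of an agent config file.
# Usage: set_agent_token <config-file> <token>
set_agent_token() {
  local cfg="$1" token="$2" tmp
  tmp="$(mktemp)" || return 1
  # Replace the token= line; copy every other line through unchanged.
  awk -v t="$token" '/^token=/ { print "token=\"" t "\""; next } { print }' "$cfg" > "$tmp" \
    && cat "$tmp" > "$cfg"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

It would be called as set_agent_token /etc/buildkite-agent/buildkite-agent.cfg "$BUILDKITE_AGENT_TOKEN" by a user who can write to that file. Writing through cat keeps the file's existing owner and permissions.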

On macOS:

brew install buildkite/buildkite/buildkite-agent
buildkite-agent start

With Docker:

docker run -d \
  -e BUILDKITE_AGENT_TOKEN=$BUILDKITE_AGENT_TOKEN \
  buildkite/agent

Basic Pipeline Configuration

# .buildkite/pipeline.yml
steps:
  - label: ":jest: Unit Tests"
    command: |
      npm ci
      npm test
    agents:
      queue: default

  - label: ":python: Python Tests"
    command: |
      pip install -r requirements.txt
      pytest --junitxml=junit.xml
    agents:
      queue: default

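Pipeline files conventionally live in .buildkite/pipeline.yml. An agent uploads them at the start of a build with buildkite-agent pipeline upload; alternatively, you can define the steps directly in the pipeline settings in the Buildkite UI.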

Running Tests with Docker

Using the Docker plugin for isolated environments:

steps:
  - label: "Unit Tests"
    plugins:
      - docker#v5.10.0:
          image: node:20
          command: ["sh", "-c", "npm ci && npm test"]
          environment:
            - NODE_ENV=test

Docker Compose plugin for services:

steps:
  - label: "Integration Tests"
    plugins:
      - docker-compose#v5.1.0:
          run: app
          config: docker-compose.test.yml

# docker-compose.test.yml
version: '3.8'
services:
  app:
    build: .
    command: npm run test:integration
    environment:
      DATABASE_URL: postgresql://test:test@postgres/testdb
    depends_on:
      - postgres
  
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: testdb
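
The short depends_on form above only orders container startup. It does not wait until Postgres is accepting connections, so integration tests can start too early. One common workaround is a small retry loop in the test entrypoint. The sketch below uses a hypothetical retry helper.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <max-attempts> <delay-seconds> <command...>
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1   # give up after the last allowed attempt
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

Inside the app container this might be used as retry 30 1 pg_isready -h postgres -U test before running the tests. This assumes the PostgreSQL client tools, which provide pg_isready, are installed in the image.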

Parallel Test Execution

Buildkite's parallel feature is clean and explicit:

steps:
  - label: "Tests (shard %n)"
    command: |
      npm ci
      # Jest shards are 1-based; the doubled dollar signs defer expansion to job runtime
      npx jest --shard=$$((BUILDKITE_PARALLEL_JOB_INDEX + 1))/$$BUILDKITE_PARALLEL_JOB_COUNT
    parallelism: 4

Setting parallelism: 4 creates four jobs from the same step, which run simultaneously when enough agents are available. Each job receives:

  • $BUILDKITE_PARALLEL_JOB_INDEX — 0-based index of the current job (0, 1, 2, 3)
  • $BUILDKITE_PARALLEL_JOB_COUNT — total number of parallel jobs (4)
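
Because the index is 0-based while Jest and Playwright expect 1-based shard numbers, the conversion is easy to get wrong. A minimal sketch of that conversion follows, using a hypothetical shard_arg helper.

```shell
# Hypothetical helper: turn a 0-based job index into a 1-based shard argument.
# Usage: shard_arg <zero-based-index> <total>  ->  prints "<one-based-index>/<total>"
shard_arg() {
  local index="$1" total="$2"
  if [ "$index" -lt 0 ] || [ "$index" -ge "$total" ]; then
    echo "shard_arg: index $index out of range for $total jobs" >&2
    return 1
  fi
  echo "$((index + 1))/$total"
}
```

From a script checked into the repository, a step could then run npx jest --shard="$(shard_arg "$BUILDKITE_PARALLEL_JOB_INDEX" "$BUILDKITE_PARALLEL_JOB_COUNT")".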

For Playwright:

steps:
  - label: "E2E Tests (shard %n)"
    command: |
      npm ci
      npx playwright install --with-deps
      npx playwright test --shard=$$((BUILDKITE_PARALLEL_JOB_INDEX + 1))/$$BUILDKITE_PARALLEL_JOB_COUNT
    parallelism: 4
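
For test runners without a built-in shard flag, the same two variables can drive a manual split. The sketch below assigns files round-robin by line number; files_for_shard is a hypothetical helper, not something Buildkite provides.

```shell
# Hypothetical helper: read file names on stdin and print those for one shard.
# Usage: files_for_shard <zero-based-index> <total>
files_for_shard() {
  # Line N (counting from 0) goes to shard N modulo total.
  awk -v idx="$1" -v total="$2" '(NR - 1) % total == idx'
}
```

A step might use it as find tests -name '*_test.py' | sort | files_for_shard "$BUILDKITE_PARALLEL_JOB_INDEX" "$BUILDKITE_PARALLEL_JOB_COUNT" | xargs pytest, where the tests path and file pattern are placeholders. The sort matters: every agent must see the files in the same order, or shards will overlap or skip files.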

Dynamic parallelism with pipeline upload

For truly dynamic test splitting, generate the pipeline at runtime:

#!/bin/bash
# scripts/generate-pipeline.sh

# Count test files (skipping node_modules) and generate appropriate shards
TEST_COUNT=$(find . -name "*.test.js" -not -path "./node_modules/*" | wc -l)
SHARD_COUNT=$(( (TEST_COUNT + 9) / 10 ))  # 10 test files per shard, rounded up

cat <<EOF
steps:
EOF

for i in $(seq 0 $((SHARD_COUNT - 1))); do
  cat <<EOF
  - label: "Tests shard $i"
    command: npx jest --shard=$((i + 1))/$SHARD_COUNT
EOF
done

In your pipeline:

steps:
  - label: "Generate pipeline"
    command: "bash scripts/generate-pipeline.sh | buildkite-agent pipeline upload"
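
The shard count in the generator script is a ceiling division: adding 9 before dividing by 10 rounds partial shards up. The sketch below generalizes that arithmetic. The optional cap is an addition not present in the original script, assumed here so that generated jobs do not exceed the available agents.

```shell
# Hypothetical helper: ceiling division with an optional upper bound.
# Usage: shard_count <test-file-count> <files-per-shard> [max-shards]
shard_count() {
  local count="$1" per_shard="$2" max="${3:-0}"
  local shards=$(( (count + per_shard - 1) / per_shard ))
  # A max of 0 (the default) means "no cap".
  if [ "$max" -gt 0 ] && [ "$shards" -gt "$max" ]; then
    shards="$max"
  fi
  echo "$shards"
}
```

With 25 test files and 10 files per shard this yields 3 shards. With 250 files and a cap of 8 it yields 8.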

Buildkite Test Analytics

Test Analytics is Buildkite's built-in service for tracking test performance. It ingests test results and provides:

  • Historical pass/fail rates per test
  • Execution time trends
  • Flaky test detection
  • Slowest tests by duration

Setting up Test Analytics

  1. Go to Buildkite → Test Analytics → Create a test suite
  2. Get your API token
  3. Upload test results from your pipeline

Upload JUnit XML results:

steps:
  - label: "Tests"
    command: |
      npm ci
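      # jest-junit writes junit.xml to the working directory by default;
      # point it at test-results/ so the files glob below matches
      JEST_JUNIT_OUTPUT_DIR=test-results npx jest --ci --reporters=default --reporters=jest-junit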
    plugins:
      - test-collector#v1.10.2:
          files: "test-results/**/*.xml"
          format: "junit"
          api-token-env-var: BUILDKITE_ANALYTICS_TOKEN

The plugin reads your JUnit XML and sends results to Buildkite Test Analytics.
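
A mismatch between where the reporter writes and what the files glob matches is an easy mistake to make. A small sanity check at the end of the test command can surface it early. The sketch below uses a hypothetical junit_report_count helper.

```shell
# Hypothetical helper: count non-empty .xml files under a directory.
# Usage: junit_report_count <dir>
junit_report_count() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo 0
    return 0
  fi
  # -size +0 skips zero-byte files left behind by a crashed reporter.
  find "$dir" -type f -name '*.xml' -size +0 | wc -l | tr -d ' '
}
```

A step could end with [ "$(junit_report_count test-results)" -gt 0 ] || echo "no JUnit reports found in test-results/" >&2 to make a missing report visible in the job log.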

Ruby (RSpec)

steps:
  - label: "RSpec"
    command: |
      bundle install
      bundle exec rspec --format progress --format RspecJunitFormatter --out test-results/rspec.xml
    plugins:
      - test-collector#v1.10.2:
          files: "test-results/rspec.xml"
          format: "junit"
          api-token-env-var: BUILDKITE_ANALYTICS_TOKEN

Python (pytest)

steps:
  - label: "pytest"
    command: |
      pip install -r requirements.txt
      pytest --junitxml=test-results/junit.xml
    plugins:
      - test-collector#v1.10.2:
          files: "test-results/junit.xml"
          format: "junit"
          api-token-env-var: BUILDKITE_ANALYTICS_TOKEN

Go

steps:
  - label: "Go tests"
    command: |
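      mkdir -p test-results
      # -set-exit-code makes the step fail when tests fail, despite the pipe
      go test ./... -v 2>&1 | go-junit-report -set-exit-code > test-results/junit.xml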
    plugins:
      - test-collector#v1.10.2:
          files: "test-results/junit.xml"
          format: "junit"
          api-token-env-var: BUILDKITE_ANALYTICS_TOKEN

Caching with the Cache Plugin

steps:
  - label: "Tests"
    plugins:
      - gencer/cache#v2.4.9:
          backend: s3
          bucket: your-buildkite-cache-bucket
          region: us-east-1
          key: "v1-npm-{{ checksum 'package-lock.json' }}"
          paths:
            - .npm/
    command: |
      # npm ci deletes node_modules first, so cache npm's download cache instead
      npm ci --cache .npm --prefer-offline
      npm test
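
The checksum in the cache key ties a cache entry to the exact contents of the lockfile: any dependency change produces a new key and a fresh cache. The sketch below illustrates the idea only and is not the plugin's actual implementation. It assumes GNU coreutils sha256sum is available, and cache_key is a hypothetical name.

```shell
# Hypothetical helper: derive a cache key from a lockfile's contents.
# Usage: cache_key <prefix> <lockfile>  ->  "<prefix>-<sha256 of lockfile>"
cache_key() {
  local prefix="$1" lockfile="$2" sum
  if [ ! -f "$lockfile" ]; then
    echo "cache_key: $lockfile not found" >&2
    return 1
  fi
  sum="$(sha256sum "$lockfile" | cut -d' ' -f1)"
  echo "${prefix}-${sum}"
}
```

Bumping the prefix (v1 to v2) is a simple way to invalidate every existing cache entry at once.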

Or using the file system on self-hosted agents:

steps:
  - label: "Tests"
    command: |
      # npm ci deletes node_modules before installing, so persist npm's
      # download cache on the agent instead of copying node_modules around
      npm ci --cache /cache/npm --prefer-offline
      npm test

Environment Variables and Secrets

Set environment variables at the pipeline level in Buildkite settings or pass them to steps:

steps:
  - label: "Integration Tests"
    command: npm run test:integration
    env:
      NODE_ENV: test
      LOG_LEVEL: error

For secrets, use the Buildkite Elastic CI Stack's secret store or configure environment hooks:

# ~/.buildkite/hooks/environment
export DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id prod/database/url \
  --query SecretString \
  --output text)
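
An agent-level environment hook runs for every job on that agent, so it is worth limiting which pipelines receive a given secret. The sketch below gates on the pipeline slug. The slugs and the function name secrets_allowed_for are hypothetical placeholders.

```shell
# Hypothetical helper: succeed only for pipelines allowed to see the secret.
# Usage: secrets_allowed_for <pipeline-slug>
secrets_allowed_for() {
  case "$1" in
    backend-integration|payments-*) return 0 ;;  # hypothetical allow-list
    *) return 1 ;;
  esac
}
```

In the hook, the export could then be wrapped as if secrets_allowed_for "$BUILDKITE_PIPELINE_SLUG"; then ... fi, so that unrelated pipelines never have the value in their environment.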

Pipeline Step Dependencies

steps:
  - label: "Unit Tests"
    key: unit-tests
    command: npm run test:unit

  - label: "Build"
    key: build
    command: npm run build
    depends_on:
      - unit-tests  # only run after unit tests pass

  - label: "E2E Tests"
    depends_on:
      - build
    command: npx playwright test

  - wait: ~  # wait for all parallel steps above
  
  - label: "Deploy"
    command: npm run deploy
    branches: main

Soft Fails

Allow a step to fail without failing the entire build:

steps:
  - label: "Flaky E2E Tests"
    command: npx playwright test
    soft_fail: true  # build continues even if this fails

  - label: "Deploy"
    command: npm run deploy

Or soft-fail on specific exit codes:

steps:
  - label: "Linting"
    command: npm run lint
    soft_fail:
      - exit_status: 1
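
Soft-failing on a specific exit status only helps if the command uses distinct statuses for distinct outcomes. The sketch below shows one hypothetical convention a lint wrapper script could follow, so that status 1 (warnings only) is soft-failed while status 2 (real errors) still fails the build.

```shell
# Hypothetical convention: 0 = clean, 1 = warnings only, 2 = at least one error.
# Usage: lint_exit_status <warning-count> <error-count>
lint_exit_status() {
  local warnings="$1" errors="$2"
  if [ "$errors" -gt 0 ]; then
    echo 2
  elif [ "$warnings" -gt 0 ]; then
    echo 1
  else
    echo 0
  fi
}
```

A wrapper would finish with exit "$(lint_exit_status "$warnings" "$errors")", where the two counts come from however the linter reports its results.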

Agent Tags for Specialized Hardware

Route steps to specific agent types:

steps:
  - label: "Mac E2E Tests"
    command: npx playwright test --project=safari
    agents:
      os: macOS

  - label: "GPU Tests"
    command: python test_ml_model.py
    agents:
      gpu: "true"
      
  - label: "High Memory Tests"
    command: npm run test:large-dataset
    agents:
      memory: high

Configure agent tags when starting the agent:

buildkite-agent start --tags "os=macOS,memory=high"
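
On a mixed fleet, the tag string is often assembled by a provisioning script rather than typed by hand. The sketch below is a minimal example with a hypothetical agent_tags helper. The 64 GiB threshold for memory=high is an arbitrary assumption.

```shell
# Hypothetical helper: build a --tags value from machine properties.
# Usage: agent_tags <os-name> <memory-gib> <has-gpu: true|false>
agent_tags() {
  local os="$1" mem_gib="$2" gpu="$3"
  local tags="os=${os}"
  if [ "$mem_gib" -ge 64 ]; then   # assumed threshold for "high" memory
    tags="${tags},memory=high"
  fi
  if [ "$gpu" = "true" ]; then
    tags="${tags},gpu=true"
  fi
  echo "$tags"
}
```

The agent would then be started with buildkite-agent start --tags "$(agent_tags macOS 128 false)", which expands to the same tag string as the command above.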

Buildkite Secrets with AWS

The official Buildkite Elastic CI Stack uses AWS S3 for secret storage:

steps:
  - label: "Tests"
    command: |
      # Secrets are automatically available via the environment hook
      npm run test:integration

The environment hook fetches secrets from S3 and exports them before each step runs.

Conclusion

Buildkite's agent architecture means your builds run on infrastructure you control — your network, your hardware, your security boundaries. The pipeline YAML is clean and the parallel test execution with built-in sharding variables makes distributing tests across agents straightforward.

Test Analytics is the standout feature for larger teams. Getting visibility into which tests are flaky, which are slow, and how test health trends over time is valuable for maintaining a reliable test suite.

Start with a basic pipeline, add the Docker plugin for environment isolation, configure parallelism for your largest test suites, and set up Test Analytics to gain visibility. The agent model requires more upfront infrastructure setup than fully-hosted CI, but the control and performance benefits pay off at scale.
