Bazel for Testing at Scale: Hermetic Tests, Remote Execution, and Caching

Bazel is Google's open-source build system, designed from the ground up for very large monorepos. It powers testing at companies where a single repository contains millions of lines of code across dozens of languages. Its approach to testing is fundamentally different from other tools: Bazel treats tests as build targets with declared inputs and outputs, enforces hermeticity through sandboxing, and enables remote execution across a cluster of machines.

This guide covers Bazel's test model, how hermetic tests work, and how to use remote execution and caching.

The Bazel Test Model

In Bazel, tests are declared in BUILD files alongside the code they test:

# lib/calculator/BUILD

load("@rules_python//python:defs.bzl", "py_library", "py_test")

py_library(
    name = "calculator",
    srcs = ["calculator.py"],
    visibility = ["//visibility:public"],
)

py_test(
    name = "calculator_test",
    srcs = ["calculator_test.py"],
    deps = [":calculator"],
)

Running the test:

bazel test //lib/calculator:calculator_test

Bazel resolves the full dependency graph, builds everything the test needs, runs it in a sandbox, and caches the result. The next time you run the same test with no file changes, it's a cache hit.

Hermeticity and Sandboxing

The most important property of Bazel tests is hermeticity — a test can only access what it explicitly declares as a dependency. Bazel achieves this through sandboxing:

  • Test processes run in an isolated directory
  • Only files listed in deps, data, or srcs are available
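  • Network access can be blocked on supported platforms, per test with the block-network tag or globally with --sandbox_default_allow_network=false; by default the sandbox allows it
  • Environment variables are reduced to a small, controlled set, so values from your shell don't leak into tests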

A test that reads a workspace file it didn't declare will fail, even though that file exists in the source tree, because undeclared files are never staged into the sandbox. This sounds strict, but it eliminates an entire class of CI failures caused by environmental assumptions.

# Correct: data files declared explicitly
py_test(
    name = "config_test",
    srcs = ["config_test.py"],
    data = ["//testdata:sample_config.json"],
    deps = [":config_reader"],
)

If config_test.py tries to read a file that isn't listed in data, the test fails with a file-not-found error — in CI and locally. This enforces that tests are reproducible everywhere.
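As a rough illustration of the test side of this contract, the sketch below resolves a declared data file from the runfiles tree using the TEST_SRCDIR and TEST_WORKSPACE environment variables that Bazel sets for tests. The helper names resolve_data_path and load_text are hypothetical, and the runfiles library that ships with rules_python is likely the more robust choice in a real project.

```python
import os
from pathlib import Path


def resolve_data_path(relative_path, env=None):
    """Build the path of a declared data file inside the runfiles tree.

    TEST_SRCDIR is the runfiles root and TEST_WORKSPACE is the workspace
    directory under it; Bazel sets both when it runs a test.
    """
    env = os.environ if env is None else env
    runfiles_root = env.get("TEST_SRCDIR")
    if runfiles_root is None:
        raise RuntimeError("TEST_SRCDIR is not set; run this through `bazel test`")
    workspace = env.get("TEST_WORKSPACE", "")
    return Path(runfiles_root) / workspace / relative_path


def load_text(relative_path, env=None):
    """Read a data file, with a clear error if it was never declared."""
    path = resolve_data_path(relative_path, env)
    if not path.is_file():
        raise FileNotFoundError(
            f"{relative_path} is not in runfiles; is it listed in `data`?"
        )
    return path.read_text(encoding="utf-8")
```

Inside config_test.py this might be called as load_text("testdata/sample_config.json"); the env parameter exists only so the helper can be exercised outside Bazel.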

Running Tests

# Run a single test
bazel test //lib/calculator:calculator_test

# Run all tests under a package
bazel test //lib/...

# Run all tests in the entire repo
bazel test //...

# Run with verbose output
bazel test //lib/calculator:calculator_test --test_output=all

# Stream test output while running
bazel test //lib/... --test_output=streamed

# Run only tests declared with size = "small" (fast unit tests)
bazel test //... --test_size_filters=small

# Exclude slow tests
bazel test //... --test_tag_filters=-slow,-integration

Test Sizes and Tags

Bazel classifies tests by size: small, medium, large, enormous. Size implies a timeout:

Size      Default timeout  Intended use
small     60s              Unit tests
medium    300s             Integration tests
large     900s             System tests
enormous  3600s            End-to-end tests

Declare size in the test target:

py_test(
    name = "unit_test",
    srcs = ["unit_test.py"],
    size = "small",
    tags = ["unit"],
    deps = [":calculator"],
)

py_test(
    name = "integration_test",
    srcs = ["integration_test.py"],
    size = "medium",
    tags = ["integration", "requires-network"],
    deps = [":calculator"],
)

Use tags to control which tests run in which contexts. In CI, run all tests. Locally, skip slow ones:

bazel test //... --test_tag_filters=-slow,-requires-network
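The filter semantics can be sketched in a few lines of Python: a target is selected when it carries none of the excluded tags and, if any tags are included, at least one of those. This is an illustration of the documented behavior, not Bazel's own code, and the function names are hypothetical.

```python
def parse_tag_filters(spec):
    """Split a --test_tag_filters value into (included, excluded) tag sets."""
    included, excluded = set(), set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith("-"):
            excluded.add(token[1:])  # "-slow" excludes targets tagged "slow"
        else:
            included.add(token)
    return included, excluded


def is_selected(target_tags, spec):
    """Return True if a target with `target_tags` passes the filter `spec`."""
    included, excluded = parse_tag_filters(spec)
    tags = set(target_tags)
    if tags & excluded:
        return False
    # With no included tags, everything that is not excluded is selected.
    return not included or bool(tags & included)
```

With the two targets declared above, the filter -slow,-requires-network keeps unit_test and drops integration_test.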

Caching

Bazel's local cache stores test results keyed by the full input hash. If inputs haven't changed, the cached result is replayed:

INFO: Analyzed 47 targets (0 packages loaded, 0 targets configured).
INFO: Found 12 test targets...
//lib/calculator:calculator_test             (cached) PASSED in 0.1s
//lib/parser:parser_test                     (cached) PASSED in 0.1s
//app/server:server_test                     PASSED in 3.2s

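On Linux, Bazel keeps its output tree, including local action results, under ~/.cache/bazel by default; other platforms use a different location. A persistent on-disk cache that survives clean builds can be enabled with --disk_cache=<path>.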
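To make "keyed by the full input hash" concrete, here is a deliberately simplified sketch of a content-addressed key. It is not Bazel's actual algorithm, which hashes a structured description of each action; the function name action_key and the choice of fields are assumptions made for illustration.

```python
import hashlib


def action_key(command, input_files, env):
    """Derive a cache key from a command, its input contents, and its env.

    command:     list of argument strings
    input_files: dict mapping a path to the file's bytes
    env:         dict of environment variables visible to the action
    """
    h = hashlib.sha256()
    for arg in command:
        h.update(b"arg\0" + arg.encode("utf-8") + b"\0")
    # Sort so the key does not depend on dictionary insertion order.
    for path in sorted(input_files):
        digest = hashlib.sha256(input_files[path]).hexdigest()
        h.update(b"file\0" + path.encode("utf-8") + b"\0" + digest.encode("ascii") + b"\0")
    for name in sorted(env):
        h.update(b"env\0" + f"{name}={env[name]}".encode("utf-8") + b"\0")
    return h.hexdigest()
```

Changing a single byte in any input, any argument, or any environment value yields a different key, which is why an unchanged test is replayed from cache and an edited one is rerun.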

Remote Cache

For teams, configure a shared remote cache. Bazel supports HTTP, gRPC, and cloud storage backends:

# Using an HTTP remote cache
bazel test //... \
  --remote_cache=http://cache.mycompany.com:9090

# Using Google Cloud Storage
bazel test //... \
  --remote_cache=https://storage.googleapis.com/my-bazel-cache

Configure it permanently in .bazelrc:

# .bazelrc
build:ci --remote_cache=grpc://cache.mycompany.com:9090
build:ci --remote_cache_compression=true
build:ci --jobs=200

# `test` inherits `build` options, so these also apply to `bazel test --config=ci`

Apply the CI config in your CI pipeline:

bazel test //... --config=ci

Remote Execution

Remote execution (RBE) goes beyond caching — it executes build and test actions on a remote cluster. This enables:

  • Parallelism far beyond what one machine can provide
  • Isolation from local machine differences
  • Consistent environments (no "works on my machine" issues)

# Connect to a remote execution service
bazel test //... \
  --remote_executor=grpc://rbe.mycompany.com:9090 \
  --remote_cache=grpc://rbe.mycompany.com:9090 \
  --jobs=500

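With RBE and 500 workers, a suite of many short, independent tests that takes 30 minutes sequentially can finish in a couple of minutes. The floor is set by the slowest single test (or shard) plus scheduling and transfer overhead, not by the worker count.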
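A back-of-the-envelope estimate makes the limits visible. The sketch below computes an idealized lower bound on wall time, assuming perfectly balanced scheduling and ignoring queueing, upload, and download overhead; the function name and the sample durations are hypothetical.

```python
def estimated_wall_time(test_durations_s, workers):
    """Idealized lower bound on wall time, in seconds.

    The suite can never finish faster than its slowest test, nor faster
    than the total work divided evenly across all workers.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not test_durations_s:
        return 0.0
    total = sum(test_durations_s)
    return max(max(test_durations_s), total / workers)


# 600 tests of 3 s each is 30 minutes of sequential work.
uniform = [3.0] * 600
# The same suite plus one 90 s test.
skewed = uniform + [90.0]
```

Under these assumptions the uniform suite is bounded by 1800 / 500 = 3.6 seconds, while the skewed suite is bounded by its 90 second test regardless of how many workers are added. Real runs land well above either bound, but the shape of the result explains why long tests are worth sharding.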

Setting Up RBE

Popular RBE backends:

  • Google Cloud Build (BuildBazel remote execution API)
  • EngFlow — commercial RBE service
  • BuildBuddy — open-source and commercial option
  • Self-hosted, using any server that implements the open Remote Execution API

For most teams, a managed RBE service is preferable to self-hosting. BuildBuddy has a free tier and provides a UI for build and test results.

JavaScript and TypeScript Tests with Bazel

Bazel supports JavaScript/TypeScript through rules_js:

# apps/web/BUILD

load("@aspect_rules_js//js:defs.bzl", "js_test")
load("@aspect_rules_ts//ts:defs.bzl", "ts_project")

ts_project(
    name = "lib",
    srcs = glob(["src/**/*.ts"]),
    declaration = True,
)

js_test(
    name = "lib_test",
    entry_point = "lib.test.js",
    data = [":lib"],
    node_modules = "//:node_modules",
)

For Jest specifically, use @aspect_rules_jest:

load("@aspect_rules_jest//jest:defs.bzl", "jest_test")

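jest_test(
    name = "jest",
    config = "jest.config.js",
    data = [
        ":lib",
        # Packages linked by rules_js live under the :node_modules target tree
        "//:node_modules/jest-circus",
    ],
    node_modules = "//:node_modules",
    snapshots = True,
)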

Sharding Tests

For test targets with many test cases, Bazel supports test sharding to split work across multiple workers:

py_test(
    name = "big_test_suite",
    srcs = ["big_test_suite.py"],
    shard_count = 4,
    size = "large",
)

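Bazel runs the target as 4 shards, passing each its position through the TEST_SHARD_INDEX and TEST_TOTAL_SHARDS environment variables; the test runner is responsible for running only its share of the test cases. Jest and pytest can both be driven this way, but only if the ruleset or a wrapper maps these variables onto the runner. A runner that ignores them runs every test case in every shard.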
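A runner-side sketch of the protocol, assuming a simple round-robin split over sorted test names. The function select_shard is hypothetical; the three environment variables are the ones Bazel sets, and touching TEST_SHARD_STATUS_FILE is how a runner signals that it honors sharding.

```python
import os
from pathlib import Path


def select_shard(test_names, env=None):
    """Return the subset of test names this shard should run."""
    env = os.environ if env is None else env
    total = int(env.get("TEST_TOTAL_SHARDS", "1"))
    index = int(env.get("TEST_SHARD_INDEX", "0"))

    # Tell Bazel that this runner understands the sharding protocol.
    status_file = env.get("TEST_SHARD_STATUS_FILE")
    if status_file:
        Path(status_file).touch()

    # Sort first so every shard sees the same order and the split is stable.
    ordered = sorted(test_names)
    return [name for i, name in enumerate(ordered) if i % total == index]
```

Every shard computes the same ordering, so each test case lands in exactly one shard. Without the variables set, the function returns the full list, which keeps the runner usable outside Bazel.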

CI Configuration

A GitHub Actions workflow using Bazel:

name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Bazel
        uses: bazel-contrib/setup-bazel@0.8.5
        with:
          bazelisk-cache: true
          disk-cache: ${{ github.workflow }}
          repository-cache: true

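      - name: Run tests
        env:
          # Assumes this secret holds the JSON contents of a service account key
          GOOGLE_CREDENTIALS: ${{ secrets.GOOGLE_CREDENTIALS }}
        run: |
          # --google_credentials expects a file path, so write the key to disk first
          printf '%s' "$GOOGLE_CREDENTIALS" > "$RUNNER_TEMP/gcp-key.json"
          bazel test //... \
            --config=ci \
            --remote_cache=${{ secrets.BAZEL_CACHE_URL }} \
            --google_credentials="$RUNNER_TEMP/gcp-key.json" \
            --test_output=errors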

The setup-bazel action installs Bazelisk and persists the Bazelisk download, the disk cache, and the repository cache in the GitHub Actions cache, which cuts download and rebuild time on later runs.

When to Use Bazel

Bazel has a steep learning curve. It's worth the investment when:

  • Your monorepo has 50+ packages and sequential testing is painfully slow
  • You need reproducible builds across languages (Go, Python, Java, TypeScript in one repo)
  • You're investing in remote execution infrastructure for the team
  • Test flakiness from environmental differences is a recurring problem

For smaller monorepos, Nx or Turborepo with remote caching delivers much of the caching benefit with far less setup.

Key Takeaways

  • Bazel tests are build targets with declared inputs: an undeclared dependency fails the same way locally and in CI instead of passing on one machine by accident
  • Sandboxing enforces hermeticity: tests can only see what they explicitly declare
  • Remote caching shares test results across the team and CI — the same hash = the same result
  • Remote execution distributes test work across a cluster, enabling massive parallelism
  • Test sizes (small, medium, large, enormous) set default timeouts and resource estimates
  • Use tags to filter which tests run in which contexts
