Bazel for Testing at Scale: Hermetic Tests, Remote Execution, and Caching
Bazel is Google's open-source build system, designed from the ground up for very large monorepos. It powers testing at companies where a single repository contains millions of lines of code across dozens of languages. Its approach to testing is fundamentally different from other tools: Bazel treats tests as build targets with declared inputs and outputs, enforces hermeticity through sandboxing, and enables remote execution across a cluster of machines.
This guide covers Bazel's test model, how hermetic tests work, and how to use remote execution and caching.
The Bazel Test Model
In Bazel, tests are declared in BUILD files alongside the code they test:
# lib/calculator/BUILD
load("@rules_python//python:defs.bzl", "py_library", "py_test")
py_library(
    name = "calculator",
    srcs = ["calculator.py"],
    visibility = ["//visibility:public"],
)
py_test(
    name = "calculator_test",
    srcs = ["calculator_test.py"],
    deps = [":calculator"],
)
Running the test:
bazel test //lib/calculator:calculator_test
Bazel resolves the full dependency graph, builds everything the test needs, runs it in a sandbox, and caches the result. The next time you run the same test with no file changes, it's a cache hit.
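The BUILD file names calculator.py and calculator_test.py without showing them. As a minimal sketch, assuming a hypothetical calculator module with add and divide functions (neither appears in the original), the test file might look like this. The functions are inlined so the block runs on its own; in the layout above they would live in calculator.py and be imported by the test.

```python
import unittest


# Hypothetical contents of calculator.py, inlined so the sketch is self-contained.
def add(a: float, b: float) -> float:
    return a + b


def divide(a: float, b: float) -> float:
    if b == 0:
        raise ZeroDivisionError("division by zero")
    return a / b


# Hypothetical contents of calculator_test.py.
class CalculatorTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)

    def test_divide(self):
        self.assertAlmostEqual(divide(1, 4), 0.25)

    def test_divide_by_zero(self):
        with self.assertRaises(ZeroDivisionError):
            divide(1, 0)


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CalculatorTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    # py_test runs this file as a script; a non-zero exit code marks the test as failed.
    if not run_suite().wasSuccessful():
        raise SystemExit(1)
```

Bazel only looks at the exit code, so any test framework works as long as the script exits non-zero on failure.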
Hermeticity and Sandboxing
The most important property of Bazel tests is hermeticity — a test can only access what it explicitly declares as a dependency. Bazel achieves this through sandboxing:
- Test processes run in an isolated directory
- Only files listed in deps, data, or srcs are available
- Network access can be blocked on supported platforms (for example with the block-network tag or --sandbox_default_allow_network=false)
- Environment variables are limited to a controlled set, so values from a developer's shell do not leak in by accident
A test that reads an undeclared workspace file through its runfiles path will fail, even if that file exists in the source tree. This sounds strict, but it eliminates an entire class of CI failures caused by environmental assumptions.
# Correct: data files declared explicitly
py_test(
    name = "config_test",
    srcs = ["config_test.py"],
    data = ["//testdata:sample_config.json"],
    deps = [":config_reader"],
)
If config_test.py tries to read a file that isn't listed in data, the test fails with a file-not-found error, both in CI and locally. This keeps tests reproducible everywhere.
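On the Python side, the declared data file has to be opened through the runfiles tree rather than through a path relative to the source checkout. The following is a sketch under stated assumptions: it relies on the TEST_SRCDIR and TEST_WORKSPACE environment variables that Bazel sets for tests, it assumes a platform where runfiles are laid out as a directory tree, and the helper names runfiles_path and load_sample_config are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def runfiles_path(relative: str, env: Mapping[str, str] = os.environ) -> Path:
    """Resolve a declared data file inside the test's runfiles tree.

    Bazel sets TEST_SRCDIR (runfiles root) and TEST_WORKSPACE (name of the
    main repository) for every test it runs.
    """
    root = env.get("TEST_SRCDIR")
    workspace = env.get("TEST_WORKSPACE")
    if not root or not workspace:
        raise RuntimeError("TEST_SRCDIR/TEST_WORKSPACE not set; run under `bazel test`")
    return Path(root) / workspace / relative


def load_sample_config(env: Mapping[str, str] = os.environ) -> dict:
    # Corresponds to the label //testdata:sample_config.json in the BUILD file.
    path = runfiles_path("testdata/sample_config.json", env)
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

If the data attribute is removed from the target, the file is no longer staged into the runfiles tree and the open call raises FileNotFoundError. A dedicated runfiles library, such as the one shipped with rules_python, is likely the more portable choice for real projects.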
Running Tests
# Run a single test
bazel test //lib/calculator:calculator_test
# Run all tests under a package
bazel test //lib/...
# Run all tests in the entire repo
bazel test //...
# Run with verbose output
bazel test //lib/calculator:calculator_test --test_output=all
# Stream test output while running
bazel test //lib/... --test_output=streamed
# Run only tests tagged as "small" (fast unit tests)
bazel test //... --test_tag_filters=small
# Exclude slow tests
bazel test //... --test_tag_filters=-slow,-integration
Test Sizes and Tags
Bazel classifies tests by size: small, medium, large, enormous. Size implies a timeout:
| Size | Timeout | Intended use |
|---|---|---|
| small | 60s | Unit tests |
| medium | 300s | Integration tests |
| large | 900s | System tests |
| enormous | 3600s | End-to-end tests |
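The table above can be turned into a small helper for choosing a size from a measured runtime. This is only a sketch: the function name smallest_fitting_size and the headroom safety factor are assumptions for illustration, not part of Bazel.

```python
# Default timeouts per test size, in seconds, copied from the table above.
DEFAULT_TIMEOUTS = {"small": 60, "medium": 300, "large": 900, "enormous": 3600}


def smallest_fitting_size(observed_seconds: float, headroom: float = 2.0) -> str:
    """Return the smallest size whose timeout covers the observed runtime.

    `headroom` is an assumed safety factor for slower or busier CI machines.
    """
    needed = observed_seconds * headroom
    for size, timeout in DEFAULT_TIMEOUTS.items():  # dicts keep insertion order
        if needed <= timeout:
            return size
    raise ValueError(f"{observed_seconds}s does not fit any size; consider splitting the test")
```

For example, a test that takes 45 seconds locally needs 90 seconds with the default headroom, so it no longer fits in small and would be declared medium.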
Declare size in the test target:
py_test(
    name = "unit_test",
    srcs = ["unit_test.py"],
    size = "small",
    tags = ["unit"],
    deps = [":calculator"],
)
py_test(
    name = "integration_test",
    srcs = ["integration_test.py"],
    size = "medium",
    tags = ["integration", "requires-network"],
    deps = [":calculator"],
)
Use tags to control which tests run in which contexts. In CI, run all tests. Locally, skip slow ones:
bazel test //... --test_tag_filters=-slow,-requires-network
Caching
Bazel's local cache stores test results keyed by the full input hash. If inputs haven't changed, the cached result is replayed:
INFO: Analyzed 47 targets (0 packages loaded, 0 targets configured).
INFO: Found 12 test targets...
//lib/calculator:calculator_test (cached) PASSED in 0.1s
//lib/parser:parser_test (cached) PASSED in 0.1s
//app/server:server_test PASSED in 3.2s
By default on Linux, this cache lives under ~/.cache/bazel.
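The idea of keying results by the full input hash can be illustrated with a toy model. This is a conceptual sketch, not Bazel's actual key format or cache implementation; action_key and ResultCache are hypothetical names.

```python
import hashlib
from typing import Callable, Mapping, Sequence


def action_key(
    input_files: Mapping[str, bytes],
    command: Sequence[str],
    env: Mapping[str, str],
) -> str:
    """Hash everything that can influence a test's outcome into one key."""
    h = hashlib.sha256()
    for path in sorted(input_files):  # sort so iteration order never changes the key
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(input_files[path]).digest())
    for arg in command:
        h.update(b"\0arg:" + arg.encode())
    for name in sorted(env):
        h.update(b"\0env:" + f"{name}={env[name]}".encode())
    return h.hexdigest()


class ResultCache:
    """Toy in-memory cache: replay the stored result when the key matches."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def run(self, key: str, execute: Callable[[], str]) -> tuple[str, bool]:
        if key in self._results:
            return self._results[key], True  # cache hit, nothing is executed
        result = execute()
        self._results[key] = result
        return result, False
```

Changing a single byte in any input file, any command-line argument, or any environment value produces a different key, which is why a test reruns after an edit and replays from cache otherwise.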
Remote Cache
For teams, configure a shared remote cache. Bazel supports HTTP, gRPC, and cloud storage backends:
# Using an HTTP remote cache
bazel test //... \
    --remote_cache=http://cache.mycompany.com:9090
# Using Google Cloud Storage
bazel test //... \
    --remote_cache=https://storage.googleapis.com/my-bazel-cache
Configure it permanently in .bazelrc:
# .bazelrc
build:ci --remote_cache=grpc://cache.mycompany.com:9090
build:ci --remote_cache_compression=true
build:ci --jobs=200
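# The test command inherits build options, so build:ci also applies to bazel test.
# (A test:ci line that sets --config=ci would expand to itself and fail.)
test:ci --test_output=errors
Apply the CI config in your CI pipeline:
bazel test //... --config=ci
Remote Execution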
Remote execution (RBE) goes beyond caching — it executes build and test actions on a remote cluster. This enables:
- Parallelism far beyond what one machine can provide
- Isolation from local machine differences
- Consistent environments (no "works on my machine" issues)
# Connect to a remote execution service
bazel test //... \
    --remote_executor=grpc://rbe.mycompany.com:9090 \
    --remote_cache=grpc://rbe.mycompany.com:9090 \
    --jobs=500
With RBE and 500 workers, a test suite that takes 30 minutes sequentially can complete in under 2 minutes, provided the work splits into enough independent test actions and no single test dominates the runtime.
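The arithmetic behind that kind of speed-up can be sketched with a simple scheduling model. The function below is an illustrative estimate, not how Bazel or any RBE service schedules work: it assumes tests are independent and ignores build actions, queueing, and transfer overhead. The name estimate_wall_time is hypothetical.

```python
import heapq
from typing import Sequence


def estimate_wall_time(durations: Sequence[float], workers: int) -> float:
    """Estimate wall-clock seconds using greedy longest-first scheduling.

    Each test is assigned to the currently least-loaded worker.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    loads = [0.0] * min(workers, max(len(durations), 1))
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)
```

With 1800 one-second tests (30 minutes sequentially), 500 workers bring the pure test time down to about 4 seconds, so the rest of a 2-minute budget goes to overhead. If one of those tests takes 120 seconds on its own, the estimate can never drop below 120 seconds no matter how many workers are added, which is one reason to shard or split long tests.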
Setting Up RBE
Popular RBE backends:
- Google Cloud Build (BuildBazel remote execution API)
- EngFlow — commercial RBE service
- BuildBuddy — open-source and commercial option
- Self-hosted servers that implement the open Remote Execution API
For most teams, a managed RBE service is preferable to self-hosting. BuildBuddy has a free tier and provides a UI for build and test results.
JavaScript and TypeScript Tests with Bazel
Bazel supports JavaScript/TypeScript through rules_js:
# apps/web/BUILD
load("@aspect_rules_js//js:defs.bzl", "js_test")
load("@aspect_rules_ts//ts:defs.bzl", "ts_project")
ts_project(
    name = "lib",
    srcs = glob(["src/**/*.ts"]),
    declaration = True,
)
js_test(
    name = "lib_test",
    entry_point = "lib.test.js",
    # js_test receives npm packages through data; the linked //:node_modules target is assumed here
    data = [
        ":lib",
        "//:node_modules",
    ],
)
For Jest specifically, use @aspect_rules_jest:
load("@aspect_rules_jest//jest:defs.bzl", "jest_test")
jest_test(
    name = "jest",
    config = "jest.config.js",
    data = [
        ":lib",
        "//:node_modules/jest-circus",
    ],
    node_modules = "//:node_modules",
    snapshots = True,
)
Sharding Tests
For test targets with many test cases, Bazel supports test sharding to split work across multiple workers:
py_test(
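    name = "big_test_suite",
    srcs = ["big_test_suite.py"],
    shard_count = 4,
    size = "large",
)
Bazel runs the target as 4 shards, each in its own process, and tells each one which slice it owns through the TEST_SHARD_INDEX and TEST_TOTAL_SHARDS environment variables. The test runner is responsible for distributing test cases accordingly. Runner support varies: Jest has a built-in --shard option that a rule can map these variables onto, while a plain py_test needs a runner or wrapper that reads them.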
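For a runner without built-in support, the shard selection can be done by hand. The sketch below assumes the runner can enumerate its test case names up front; select_shard is a hypothetical helper, and the round-robin split is one reasonable choice rather than a required one. The TEST_SHARD_STATUS_FILE variable is the path Bazel asks a runner to touch to signal that it understands sharding.

```python
import os
from pathlib import Path
from typing import Mapping, Sequence


def select_shard(test_names: Sequence[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the test cases this shard should run (round-robin by index)."""
    total = int(env.get("TEST_TOTAL_SHARDS", "1"))
    index = int(env.get("TEST_SHARD_INDEX", "0"))

    # Touching this file tells Bazel the runner supports sharding.
    status_file = env.get("TEST_SHARD_STATUS_FILE")
    if status_file:
        Path(status_file).touch()

    ordered = sorted(test_names)  # every shard must see the same order
    return [name for i, name in enumerate(ordered) if i % total == index]
```

Because every shard sorts the same list and takes every fourth entry starting at its own index, each test case runs in exactly one shard. When the variables are unset, the function returns the full list, so the same runner also works outside Bazel.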
CI Configuration
A GitHub Actions workflow using Bazel:
name: CI
on:
  pull_request:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Bazel
        uses: bazel-contrib/setup-bazel@0.8.5
        with:
          bazelisk-cache: true
          disk-cache: ${{ github.workflow }}
          repository-cache: true
      - name: Run tests
        # --google_credentials expects a path to a JSON key file, so the secret must resolve to a file path
        run: |
          bazel test //... \
            --config=ci \
            --remote_cache=${{ secrets.BAZEL_CACHE_URL }} \
            --google_credentials=${{ secrets.GOOGLE_CREDENTIALS }} \
            --test_output=errors
The setup-bazel action configures Bazelisk and uses the GitHub Actions cache for the Bazelisk download, the disk cache, and the repository cache, which reduces download and rebuild times.
When to Use Bazel
Bazel has a steep learning curve. It's worth the investment when:
- Your monorepo has 50+ packages and sequential testing is painfully slow
- You need reproducible builds across languages (Go, Python, Java, TypeScript in one repo)
- You're investing in remote execution infrastructure for the team
- Test flakiness from environmental differences is a recurring problem
For smaller monorepos, Nx or Turborepo with remote caching often delivers most of the benefit with far less setup.
Key Takeaways
- Bazel tests are build targets with declared inputs: an undeclared dependency surfaces as a build or test failure on every machine, not only on some
- Sandboxing enforces hermeticity: tests can only see what they explicitly declare
- Remote caching shares test results across the team and CI — the same hash = the same result
- Remote execution distributes test work across a cluster, enabling massive parallelism
- Test sizes (small, medium, large) control timeouts and resource allocation
- Use tags to filter which tests run in which contexts