Testing Bulkhead Patterns and Circuit Breakers with Resilience4j
Testing bulkhead patterns and circuit breakers is something most teams skip until an outage forces the conversation. The patterns themselves — isolating thread pools so one slow dependency can't starve another, tripping a circuit breaker so you stop hammering a dead service — are well understood conceptually. But verifying that they actually work in your system under realistic failure conditions requires deliberate test design. This guide covers how to write those tests in Java/Kotlin using Resilience4j and WireMock.
Why Resilience Patterns Need Their Own Tests
Resilience patterns are meta-behavior: they govern how your service behaves when dependencies misbehave. Unit tests cover the happy path. Integration tests cover normal request flows. Neither covers "what happens when the payment service is at 100% CPU and responding in 30 seconds?" Without explicit resilience tests, you only find out the answer in production, at 2 AM.
The failure mode is subtle: your service may appear healthy (it's accepting requests, returning 200s for cached or non-critical paths) while one slow dependency is quietly saturating your shared thread pool. Eventually, the thread pool is full, your healthy operations start queuing, and the service falls over entirely. The bulkhead pattern prevents this. But only if it's configured correctly — and only if you've tested that it is.
Resilience4j Core Concepts
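Resilience4j is a lightweight fault-tolerance library for Java. The 2.x line used in this guide requires Java 17; the older 1.x line runs on Java 8+. Unlike Hystrix (no longer actively developed), it is designed around functional composition, decorating plain lambdas and suppliers, and doesn't require any framework integration. Key components: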
| Component | Purpose | Key Config |
|---|---|---|
| CircuitBreaker | Stop calling a failing service | failureRateThreshold, waitDurationInOpenState, slidingWindowSize |
| Bulkhead (Semaphore) | Limit concurrent calls | maxConcurrentCalls, maxWaitDuration |
| ThreadPoolBulkhead | Isolate with dedicated thread pool | maxThreadPoolSize, coreThreadPoolSize, queueCapacity |
| Retry | Retry failed calls with backoff | maxAttempts, waitDuration, exponentialBackoffMultiplier |
| RateLimiter | Limit call rate | limitForPeriod, limitRefreshPeriod, timeoutDuration |
Add the dependencies:
<!-- pom.xml -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-all</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>com.github.tomakehurst</groupId>
    <artifactId>wiremock-jre8</artifactId>
    <version>2.35.0</version>
    <scope>test</scope>
</dependency>
Testing the Semaphore Bulkhead
The semaphore bulkhead limits concurrent calls to a resource. When the limit is reached, additional calls are rejected immediately (or after a configurable wait) rather than blocking indefinitely.
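Before looking at the Resilience4j tests, it can help to see the mechanism in isolation. The following is a minimal sketch built on `java.util.concurrent.Semaphore`; the class name `TinyBulkhead` and its methods are hypothetical and exist only for illustration, and this is not how Resilience4j is implemented internally.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration only; not Resilience4j's implementation.
final class TinyBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    TinyBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the call if a permit frees up within maxWaitMillis; otherwise rejects it. */
    <T> T execute(Supplier<T> call) {
        if (!acquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always return the permit, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    private boolean acquire() {
        try {
            // A wait of 0 means "reject immediately when no permit is free"
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The two behaviours worth testing fall straight out of this shape: a call beyond the limit is rejected rather than left hanging, and a permit is returned when a call finishes, whether it succeeds or throws.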
// BulkheadServiceTest.java
import io.github.resilience4j.bulkhead.*;
import org.junit.jupiter.api.*;
import java.time.Duration;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
public class BulkheadServiceTest {
    private Bulkhead bulkhead;
    @BeforeEach
    void setUp() {
        BulkheadConfig config = BulkheadConfig.custom()
            .maxConcurrentCalls(3) // only 3 concurrent calls allowed
            .maxWaitDuration(Duration.ofMillis(100)) // wait up to 100ms before rejecting
            .build();
        bulkhead = Bulkhead.of("payment-service", config);
    }
    @Test
    void shouldAllowConcurrentCallsUpToLimit() throws InterruptedException {
        CountDownLatch inFlightLatch = new CountDownLatch(3);
        CountDownLatch releaseLatch = new CountDownLatch(1);
        AtomicInteger successCount = new AtomicInteger(0);
        ExecutorService executor = Executors.newFixedThreadPool(5);
        // Start 3 long-running calls that hold the bulkhead permits
        for (int i = 0; i < 3; i++) {
            executor.submit(() -> {
                Bulkhead.decorateRunnable(bulkhead, () -> {
                    inFlightLatch.countDown();
                    try {
                        releaseLatch.await(5, TimeUnit.SECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    successCount.incrementAndGet();
                }).run();
            });
        }
        // Wait until all 3 are in flight; fail fast if they never start
        Assertions.assertTrue(inFlightLatch.await(2, TimeUnit.SECONDS),
            "All 3 calls should have entered the bulkhead");
        // getMetrics() returns the nested Bulkhead.Metrics interface
        Bulkhead.Metrics metrics = bulkhead.getMetrics();
        Assertions.assertEquals(0, metrics.getAvailableConcurrentCalls(),
            "All permits should be consumed");
        Assertions.assertEquals(3, metrics.getMaxAllowedConcurrentCalls());
        // Release the in-flight calls
        releaseLatch.countDown();
        executor.shutdown();
        Assertions.assertTrue(executor.awaitTermination(5, TimeUnit.SECONDS));
        Assertions.assertEquals(3, successCount.get());
    }
    @Test
    void shouldRejectCallsExceedingBulkheadLimit() throws InterruptedException {
        CountDownLatch holdLatch = new CountDownLatch(1);
        CountDownLatch startedLatch = new CountDownLatch(3);
        AtomicInteger rejectedCount = new AtomicInteger(0);
        ExecutorService executor = Executors.newFixedThreadPool(10);
        // Saturate the bulkhead with 3 long-running calls
        for (int i = 0; i < 3; i++) {
            executor.submit(() -> {
                Bulkhead.decorateRunnable(bulkhead, () -> {
                    startedLatch.countDown();
                    try { holdLatch.await(5, TimeUnit.SECONDS); }
                    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }).run();
            });
        }
        Assertions.assertTrue(startedLatch.await(2, TimeUnit.SECONDS),
            "Bulkhead should be saturated before probing it");
        // Now try 3 more calls: each waits up to maxWaitDuration (100ms), then is rejected
        for (int i = 0; i < 3; i++) {
            try {
                Bulkhead.decorateRunnable(bulkhead, () -> {}).run();
            } catch (BulkheadFullException e) {
                rejectedCount.incrementAndGet();
            }
        }
        Assertions.assertEquals(3, rejectedCount.get(),
            "Calls exceeding bulkhead limit should be rejected with BulkheadFullException");
        holdLatch.countDown();
        executor.shutdown();
        Assertions.assertTrue(executor.awaitTermination(5, TimeUnit.SECONDS));
    }
}
Testing Circuit Breaker State Transitions
The circuit breaker has three states: CLOSED (normal operation), OPEN (rejecting all calls), and HALF_OPEN (allowing a limited number of probe calls to test whether the service has recovered). Testing the state transitions is the most critical part of circuit breaker validation.
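The three states can be sketched as a small state machine. This is a simplified illustration using only the JDK, with a count-based window and an injectable clock so that no sleeps are needed; `TinyCircuitBreaker` and its method names are hypothetical, and Resilience4j's real state machine does more (for example, it caps the number of in-flight probe calls, which this sketch does not).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical illustration only; Resilience4j's real state machine is more elaborate.
final class TinyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent, e.g. 50.0
    private final long openWaitMillis;
    private final int halfOpenCalls;
    private final LongSupplier clock;          // injectable so tests need no sleeps
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;

    TinyCircuitBreaker(int windowSize, double failureRateThreshold,
                       long openWaitMillis, int halfOpenCalls, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMillis = openWaitMillis;
        this.halfOpenCalls = halfOpenCalls;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Returns true if a call may proceed. An expired OPEN wait flips to HALF_OPEN here. */
    synchronized boolean tryAcquire() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openWaitMillis) {
            state = State.HALF_OPEN;
            window.clear();
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a permitted call and re-evaluates the state. */
    synchronized void record(boolean failed) {
        if (state == State.OPEN) {
            return;
        }
        int needed = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        window.addLast(failed);
        if (window.size() > needed) {
            window.removeFirst();
        }
        if (window.size() < needed) {
            return; // not enough outcomes recorded yet
        }
        long failures = window.stream().filter(f -> f).count();
        double rate = 100.0 * failures / window.size();
        if (rate >= failureRateThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
            window.clear();
        } else if (state == State.HALF_OPEN) {
            state = State.CLOSED;
            window.clear();
        }
    }
}
```

Note that in this sketch the move from OPEN to HALF_OPEN happens lazily, when the next call asks for permission after the wait has elapsed, not on a timer.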
// CircuitBreakerStateTest.java
import com.github.tomakehurst.wiremock.WireMockServer;
import io.github.resilience4j.circuitbreaker.*;
import org.junit.jupiter.api.*;
import java.time.Duration;
import static com.github.tomakehurst.wiremock.client.WireMock.*;
public class CircuitBreakerStateTest {
    private WireMockServer wireMock;
    private CircuitBreaker circuitBreaker;
    // PaymentServiceClient is the application's own HTTP client. It is assumed to run
    // each request through the circuit breaker and to throw on 5xx responses.
    private PaymentServiceClient client;
    @BeforeEach
    void setUp() {
        wireMock = new WireMockServer(8089);
        wireMock.start();
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50) // trip at 50% failure rate
            .slidingWindowSize(4) // evaluate last 4 calls
            .minimumNumberOfCalls(4) // compute the failure rate once 4 calls are recorded
            .waitDurationInOpenState(Duration.ofMillis(500)) // stay open for 500ms
            .permittedNumberOfCallsInHalfOpenState(2)
            .build();
        circuitBreaker = CircuitBreaker.of("payment", config);
        client = new PaymentServiceClient("http://localhost:8089", circuitBreaker);
    }
    @AfterEach
    void tearDown() {
        wireMock.stop();
    }
    @Test
    void circuitBreakerShouldTransitionFromClosedToOpen() {
        // Stub: 3 out of 4 calls fail (75% failure rate > 50% threshold)
        // "Started" is WireMock's initial scenario state (the value of Scenario.STARTED)
        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("Started")
            .willReturn(serverError())
            .willSetStateTo("FAIL_2"));
        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("FAIL_2")
            .willReturn(serverError())
            .willSetStateTo("FAIL_3"));
        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("FAIL_3")
            .willReturn(serverError())
            .willSetStateTo("SUCCESS"));
        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("SUCCESS")
            .willReturn(ok()));
        // Make 4 calls through the circuit breaker
        for (int i = 0; i < 4; i++) {
            try { client.processPayment("100.00"); } catch (Exception ignored) {}
        }
        // Circuit breaker should now be OPEN
        Assertions.assertEquals(CircuitBreaker.State.OPEN, circuitBreaker.getState(),
            "Circuit breaker should be OPEN after 75% failure rate");
    }
    @Test
    void openCircuitBreakerShouldRejectCallsImmediately() {
        // Force the circuit breaker into OPEN state
        circuitBreaker.transitionToOpenState();
        wireMock.stubFor(post(urlEqualTo("/payments")).willReturn(ok()));
        long start = System.currentTimeMillis();
        Assertions.assertThrows(CallNotPermittedException.class,
            () -> client.processPayment("100.00"));
        long elapsed = System.currentTimeMillis() - start;
        // Rejection should be immediate: no network call is made.
        // The 50ms bound is a loose sanity check; relax it if CI machines are slow.
        Assertions.assertTrue(elapsed < 50, "Open circuit should reject immediately, not wait");
        wireMock.verify(0, postRequestedFor(urlEqualTo("/payments")));
    }
    @Test
    void halfOpenCircuitBreakerShouldTransitionToClosedOnSuccess() throws InterruptedException {
        circuitBreaker.transitionToOpenState();
        Thread.sleep(600); // Wait for waitDurationInOpenState (500ms) to expire
        // Unless automaticTransitionFromOpenToHalfOpenEnabled(true) is configured, the
        // breaker stays OPEN until the next call arrives; that call moves it to HALF_OPEN
        // Stub successful response for the probe calls
        wireMock.stubFor(post(urlEqualTo("/payments")).willReturn(ok().withBody("{\"id\":\"123\"}")));
        // First probe call: breaker moves to HALF_OPEN and stays there (1 of 2 probes done)
        client.processPayment("50.00");
        Assertions.assertEquals(CircuitBreaker.State.HALF_OPEN, circuitBreaker.getState());
        // Second probe call completes the permitted set (2 in our config)
        client.processPayment("50.00");
        // After 2 successful probe calls, CB should be CLOSED
        Assertions.assertEquals(CircuitBreaker.State.CLOSED, circuitBreaker.getState(),
            "Circuit breaker should close after successful probe calls");
    }
}
Testing Retry with Exponential Backoff
Retry logic needs to be tested for both the success case (eventual success after retries) and the failure case (exhausted retries with the correct exception propagation):
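The tests in this section use a fixed `waitDuration` to stay fast, so the backoff schedule itself is worth pinning down separately. Below is a minimal JDK-only sketch of a retry loop with exponential backoff; `BackoffRetry`, `delayMillis`, and the injected `sleeper` are hypothetical names chosen for illustration, not Resilience4j APIs. In Resilience4j itself, a comparable schedule can be configured with `IntervalFunction.ofExponentialBackoff(...)` passed to `RetryConfig.custom().intervalFunction(...)`.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical illustration only; not Resilience4j's Retry implementation.
final class BackoffRetry {

    /** Delay before retry number `retry` (1-based): initial * multiplier^(retry - 1). */
    static long delayMillis(long initialMillis, double multiplier, int retry) {
        return (long) (initialMillis * Math.pow(multiplier, retry - 1));
    }

    /**
     * Calls `call` up to maxAttempts times. Between failed attempts it hands the
     * computed delay to `sleeper`, so tests can record delays instead of sleeping.
     */
    static <T> T execute(Supplier<T> call, int maxAttempts,
                         long initialMillis, double multiplier, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleeper.accept(delayMillis(initialMillis, multiplier, attempt));
                }
            }
        }
        throw last; // retries exhausted: propagate the final failure
    }
}
```

Injecting the sleeper keeps the test deterministic: it can assert on both the number of attempts and the exact sequence of waits without any real delay.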
@Test
void retryShouldSucceedOnThirdAttempt() throws Throwable {
    // First 2 calls fail, 3rd succeeds
    // "Started" is WireMock's initial scenario state (the value of Scenario.STARTED)
    wireMock.stubFor(get(urlEqualTo("/inventory/sku-123"))
        .inScenario("retry")
        .whenScenarioStateIs("Started")
        .willReturn(serverError())
        .willSetStateTo("FAIL_2"));
    wireMock.stubFor(get(urlEqualTo("/inventory/sku-123"))
        .inScenario("retry")
        .whenScenarioStateIs("FAIL_2")
        .willReturn(serverError())
        .willSetStateTo("SUCCESS"));
    wireMock.stubFor(get(urlEqualTo("/inventory/sku-123"))
        .inScenario("retry")
        .whenScenarioStateIs("SUCCESS")
        .willReturn(okJson("{\"sku\":\"sku-123\",\"quantity\":10}")));
    RetryConfig retryConfig = RetryConfig.custom()
        .maxAttempts(3)
        .waitDuration(Duration.ofMillis(10)) // short for tests
        .build();
    Retry retry = Retry.of("inventory", retryConfig);
    // The decorated CheckedSupplier's get() declares Throwable, hence the throws clause
    String result = Retry.decorateCheckedSupplier(retry,
        () -> inventoryClient.getStock("sku-123")).get();
    Assertions.assertNotNull(result);
    wireMock.verify(3, getRequestedFor(urlEqualTo("/inventory/sku-123")));
}
@Test
void retryShouldNotRetryOnClientErrors() {
    // 400 Bad Request should NOT be retried
    wireMock.stubFor(post(urlEqualTo("/orders"))
        .willReturn(badRequest().withBody("{\"error\":\"invalid_item\"}")));
    RetryConfig retryConfig = RetryConfig.custom()
        .maxAttempts(3)
        .retryOnException(e -> e instanceof ServerErrorException) // only retry 5xx
        .build();
    Retry retry = Retry.of("orders", retryConfig);
    Assertions.assertThrows(ClientErrorException.class,
        () -> Retry.decorateCheckedSupplier(retry,
            () -> orderClient.createOrder(payload)).get());
    // Should have called exactly once: no retries for client errors
    wireMock.verify(1, postRequestedFor(urlEqualTo("/orders")));
}
Measuring Isolation Effectiveness Under Load
Bulkhead testing should include a load test that proves isolation holds: a slow dependency in one bulkhead does not affect throughput on a different bulkhead.
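The property under test can be shown deterministically with plain JDK executors before involving HTTP stubs. The sketch below blocks every thread of a "payment" pool on a latch and then checks whether a fast "inventory" task still completes; `PoolIsolationDemo` and `fastCallSucceeds` are hypothetical names, and the pool sizes and one-second timeout are arbitrary choices for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; uses plain executors instead of ThreadPoolBulkhead.
final class PoolIsolationDemo {

    /**
     * Blocks both threads of the payment pool, then submits a fast inventory task.
     * With sharePool = true the inventory task lands in the same saturated pool.
     * Returns true if the inventory task completed within one second.
     */
    static boolean fastCallSucceeds(boolean sharePool) {
        ExecutorService paymentPool = Executors.newFixedThreadPool(2);
        ExecutorService inventoryPool = sharePool ? paymentPool : Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch paymentBusy = new CountDownLatch(2);
        try {
            for (int i = 0; i < 2; i++) {
                paymentPool.submit(() -> {
                    paymentBusy.countDown();
                    try {
                        release.await(); // simulates a dependency that never answers
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            if (!paymentBusy.await(2, TimeUnit.SECONDS)) {
                return false; // the slow calls never started
            }
            Future<String> fast = inventoryPool.submit(() -> "quantity:10");
            return "quantity:10".equals(fast.get(1, TimeUnit.SECONDS));
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            return false; // includes TimeoutException when the fast task is starved
        } finally {
            release.countDown();
            paymentPool.shutdownNow();
            inventoryPool.shutdownNow();
        }
    }
}
```

Calling `fastCallSucceeds(false)` models two bulkheads and should return `true`; `fastCallSucceeds(true)` models the shared-pool failure mode and should return `false` because the fast task sits in the queue behind the blocked calls.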
@Test
void slowPaymentServiceShouldNotDegradeInventoryService() {
    // Stub: payment service is slow (2 second response)
    wireMock.stubFor(post(urlEqualTo("/payments"))
        .willReturn(ok().withFixedDelay(2000)));
    // Stub: inventory service is fast
    wireMock.stubFor(get(urlPathMatching("/inventory/.*"))
        .willReturn(okJson("{\"quantity\":10}")));
    // Two separate thread pool bulkheads
    ThreadPoolBulkhead paymentBulkhead = ThreadPoolBulkhead.of("payment",
        ThreadPoolBulkheadConfig.custom()
            .maxThreadPoolSize(2).coreThreadPoolSize(2).queueCapacity(5).build());
    // Queue sized to hold all 20 inventory calls so none of them is rejected
    ThreadPoolBulkhead inventoryBulkhead = ThreadPoolBulkhead.of("inventory",
        ThreadPoolBulkheadConfig.custom()
            .maxThreadPoolSize(5).coreThreadPoolSize(5).queueCapacity(20).build());
    // Saturate the payment bulkhead: 2 calls run, 5 queue, the rest are rejected
    for (int i = 0; i < 10; i++) {
        try {
            ThreadPoolBulkhead.decorateSupplier(paymentBulkhead,
                () -> paymentClient.charge("10.00")).get();
        } catch (BulkheadFullException expected) {
            // A full payment bulkhead is exactly the condition under test
        }
    }
    // Inventory calls should complete quickly despite payment bulkhead being saturated
    long start = System.currentTimeMillis();
    List<CompletableFuture<String>> inventoryFutures = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
        String sku = "sku-" + i; // a lambda may only capture effectively final variables
        inventoryFutures.add(
            ThreadPoolBulkhead.decorateSupplier(inventoryBulkhead,
                () -> inventoryClient.getStock(sku)).get()
                .toCompletableFuture());
    }
    CompletableFuture.allOf(inventoryFutures.toArray(new CompletableFuture[0])).join();
    long inventoryElapsed = System.currentTimeMillis() - start;
    // All 20 inventory calls should complete in well under 2 seconds
    Assertions.assertTrue(inventoryElapsed < 1500,
        "Inventory calls took " + inventoryElapsed + "ms: payment bulkhead leaked");
}
Configuration Validation Tests
Don't forget to test that your Resilience4j configuration is actually applied correctly. Configuration bugs (wrong threshold, wrong window size) are silent and only manifest under specific load patterns.
@Test
void circuitBreakerConfigShouldMatchProductionSpec() {
    CircuitBreakerRegistry registry = applicationContext.getBean(CircuitBreakerRegistry.class);
    CircuitBreaker cb = registry.circuitBreaker("payment-service");
    CircuitBreakerConfig config = cb.getCircuitBreakerConfig();
    // Assert against your architecture decision records
    Assertions.assertEquals(50.0f, config.getFailureRateThreshold(),
        "Payment CB failure threshold should be 50%");
    Assertions.assertEquals(10, config.getSlidingWindowSize(),
        "Payment CB sliding window should be 10 calls");
    // The open-state wait is exposed as an interval function; evaluate it for attempt 1
    Assertions.assertEquals(Duration.ofSeconds(30),
        Duration.ofMillis(config.getWaitIntervalFunctionInOpenState().apply(1)),
        "Payment CB open wait should be 30 seconds in production");
}
Key Testing Checklist
| Test Case | What to Verify |
|---|---|
| Bulkhead at capacity | Extra calls get BulkheadFullException, not a hang |
| CB CLOSED → OPEN | Correct failure threshold triggers the transition |
| CB OPEN → rejects | No network calls made while OPEN |
| CB HALF_OPEN probe | Correct number of probe calls permitted |
| CB HALF_OPEN → CLOSED | Successful probes close the breaker |
| CB HALF_OPEN → OPEN | Failed probes re-open the breaker |
| Retry success | Eventual success after transient failures |
| Retry exhaustion | Correct exception propagated, correct call count |
| Retry no-retry cases | Client errors (4xx) not retried |
| Thread pool isolation | Slow bulkhead doesn't degrade separate bulkhead |
Resilience patterns are only as good as the tests that verify them. By combining WireMock for dependency simulation with Resilience4j's built-in metrics and state inspection APIs, you can write deterministic tests for every state transition and failure mode — before they happen in production.