Testing Bulkhead Patterns and Circuit Breakers with Resilience4j

Most teams skip testing bulkhead patterns and circuit breakers until an outage forces the conversation. The patterns themselves are well understood conceptually: isolate thread pools so one slow dependency can't starve another, and trip a circuit breaker so you stop hammering a dead service. But verifying that they actually work in your system under realistic failure conditions requires deliberate test design. This guide covers how to write those tests on the JVM using Resilience4j and WireMock; the examples are in Java and carry over to Kotlin.

Why Resilience Patterns Need Their Own Tests

Resilience patterns are meta-behavior: they govern how your service behaves when dependencies misbehave. Unit tests cover the happy path. Integration tests cover normal request flows. Neither covers "what happens when the payment service is at 100% CPU and responding in 30 seconds?" Without explicit resilience tests, you only find out the answer in production, at 2 AM.

The failure mode is subtle: your service may appear healthy (it's accepting requests, returning 200s for cached or non-critical paths) while one slow dependency is quietly saturating your shared thread pool. Eventually, the thread pool is full, your healthy operations start queuing, and the service falls over entirely. The bulkhead pattern prevents this. But only if it's configured correctly — and only if you've tested that it is.

Resilience4j Core Concepts

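Resilience4j is a lightweight fault-tolerance library for Java. The 1.x line targets Java 8, while 2.x (used in this guide) requires Java 17. Unlike Hystrix (now in maintenance mode), it is built around functional decorators and works without any framework integration. Key components: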

Component             Purpose                              Key Config
CircuitBreaker        Stop calling a failing service       failureRateThreshold, waitDurationInOpenState, slidingWindowSize
Bulkhead (Semaphore)  Limit concurrent calls               maxConcurrentCalls, maxWaitDuration
ThreadPoolBulkhead    Isolate with dedicated thread pool   maxThreadPoolSize, coreThreadPoolSize, queueCapacity
Retry                 Retry failed calls with backoff      maxAttempts, waitDuration, exponentialBackoffMultiplier
RateLimiter           Limit call rate                      limitForPeriod, limitRefreshPeriod, timeoutDuration

Add the dependencies:

<!-- pom.xml -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-all</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>com.github.tomakehurst</groupId>
    <artifactId>wiremock-jre8</artifactId>
    <version>2.35.0</version>
    <scope>test</scope>
</dependency>

Testing the Semaphore Bulkhead

The semaphore bulkhead limits concurrent calls to a resource. When the limit is reached, additional calls are rejected immediately (or after a configurable wait) rather than blocking indefinitely.
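To make the mechanism concrete, here is a minimal sketch of a semaphore bulkhead built only on the JDK. It is an illustration of the idea, not Resilience4j's implementation: the class name `SimpleSemaphoreBulkhead` and its methods are hypothetical, and rejection is signalled with a plain `IllegalStateException` rather than `BulkheadFullException`.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration class; not part of Resilience4j.
final class SimpleSemaphoreBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    SimpleSemaphoreBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the call if a permit frees up within maxWaitMillis, otherwise rejects it. */
    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a permit", e);
        }
        if (!acquired) {
            // Fail fast instead of letting callers pile up behind a slow dependency
            throw new IllegalStateException("Bulkhead is full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always give the permit back, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With a limit of 1 and no wait, a second call that arrives while the first is still running is rejected at once, and the permit count returns to 1 after the first call finishes. The tests in this section check the same two properties against the real library.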

// BulkheadServiceTest.java
import io.github.resilience4j.bulkhead.*;
import org.junit.jupiter.api.*;
import java.time.Duration;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BulkheadServiceTest {

    private Bulkhead bulkhead;

    @BeforeEach
    void setUp() {
        BulkheadConfig config = BulkheadConfig.custom()
            .maxConcurrentCalls(3)          // only 3 concurrent calls allowed
            .maxWaitDuration(Duration.ofMillis(100)) // wait up to 100ms before rejecting
            .build();
        bulkhead = Bulkhead.of("payment-service", config);
    }

    @Test
    void shouldAllowConcurrentCallsUpToLimit() throws InterruptedException {
        CountDownLatch inFlightLatch = new CountDownLatch(3);
        CountDownLatch releaseLatch = new CountDownLatch(1);
        AtomicInteger successCount = new AtomicInteger(0);
        ExecutorService executor = Executors.newFixedThreadPool(5);

        // Start 3 long-running calls that hold the bulkhead permits
        for (int i = 0; i < 3; i++) {
            executor.submit(() -> {
                Bulkhead.decorateRunnable(bulkhead, () -> {
                    inFlightLatch.countDown();
                    try {
                        releaseLatch.await(5, TimeUnit.SECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    successCount.incrementAndGet();
                }).run();
            });
        }

        // Wait until all 3 are in-flight; fail fast if they never start
        Assertions.assertTrue(inFlightLatch.await(2, TimeUnit.SECONDS),
            "All 3 calls should have entered the bulkhead");

        // Verify bulkhead metrics show every permit is in use
        Bulkhead.Metrics metrics = bulkhead.getMetrics();
        Assertions.assertEquals(3, metrics.getMaxAllowedConcurrentCalls());
        Assertions.assertEquals(0, metrics.getAvailableConcurrentCalls(),
            "All permits should be consumed");

        // Release the in-flight calls
        releaseLatch.countDown();
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        Assertions.assertEquals(3, successCount.get());
    }

    @Test
    void shouldRejectCallsExceedingBulkheadLimit() throws InterruptedException {
        CountDownLatch holdLatch = new CountDownLatch(1);
        CountDownLatch startedLatch = new CountDownLatch(3);
        AtomicInteger rejectedCount = new AtomicInteger(0);
        ExecutorService executor = Executors.newFixedThreadPool(10);

        // Saturate the bulkhead with 3 long-running calls
        for (int i = 0; i < 3; i++) {
            executor.submit(() -> {
                Bulkhead.decorateRunnable(bulkhead, () -> {
                    startedLatch.countDown();
                    try { holdLatch.await(5, TimeUnit.SECONDS); }
                    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }).run();
            });
        }
        startedLatch.await(2, TimeUnit.SECONDS);

        // Now try 3 more calls — these should be rejected
        for (int i = 0; i < 3; i++) {
            try {
                Bulkhead.decorateRunnable(bulkhead, () -> {}).run();
            } catch (BulkheadFullException e) {
                rejectedCount.incrementAndGet();
            }
        }

        Assertions.assertEquals(3, rejectedCount.get(),
            "Calls exceeding bulkhead limit should be rejected with BulkheadFullException");

        holdLatch.countDown();
        executor.shutdown();
    }
}

Testing Circuit Breaker State Transitions

The circuit breaker has three states: CLOSED (normal operation), OPEN (rejecting all calls), and HALF_OPEN (allowing a limited number of probe calls to test whether the service has recovered). Testing the state transitions is the most critical part of circuit breaker validation.
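The transitions are easier to test once the state machine is clear. The following is a simplified, JDK-only sketch of a count-based breaker with an injectable clock. The class `TinyCircuitBreaker` and its method names are hypothetical and it is not how Resilience4j is implemented internally; it only mirrors the three states and the thresholds described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical illustration class; not Resilience4j's implementation.
final class TinyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent, e.g. 50.0
    private final long openWaitMillis;
    private final int halfOpenCalls;
    private final LongSupplier clock;          // injectable so tests never sleep

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failure
    private State state = State.CLOSED;
    private long openedAt;
    private int probesStarted;

    TinyCircuitBreaker(int windowSize, double failureRateThreshold,
                       long openWaitMillis, int halfOpenCalls, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMillis = openWaitMillis;
        this.halfOpenCalls = halfOpenCalls;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Returns true if a call may proceed; moves OPEN to HALF_OPEN once the wait has elapsed. */
    synchronized boolean tryAcquirePermission() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openWaitMillis) {
            transitionTo(State.HALF_OPEN);
        }
        if (state == State.OPEN) {
            return false;
        }
        if (state == State.HALF_OPEN) {
            if (probesStarted >= halfOpenCalls) {
                return false; // probe budget already used
            }
            probesStarted++;
        }
        return true;
    }

    /** Records the outcome of a permitted call and re-evaluates the state. */
    synchronized void record(boolean failure) {
        if (state == State.OPEN) {
            return; // late result from a call admitted before the trip
        }
        int required = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        window.addLast(failure);
        if (window.size() > required) {
            window.removeFirst();
        }
        if (window.size() < required) {
            return; // not enough calls recorded yet
        }
        long failures = window.stream().filter(f -> f).count();
        double rate = 100.0 * failures / window.size();
        transitionTo(rate >= failureRateThreshold ? State.OPEN : State.CLOSED);
    }

    private void transitionTo(State next) {
        if (next == state) {
            return; // staying CLOSED keeps the sliding window intact
        }
        state = next;
        window.clear();
        probesStarted = 0;
        if (next == State.OPEN) {
            openedAt = clock.getAsLong();
        }
    }
}
```

Note that in this sketch the move from OPEN to HALF_OPEN happens only when a caller asks for permission after the wait has elapsed, not on a background timer. Injecting the clock is the design choice worth copying: it lets a test advance time by assignment instead of calling `Thread.sleep`.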

// CircuitBreakerStateTest.java
import com.github.tomakehurst.wiremock.WireMockServer;
import io.github.resilience4j.circuitbreaker.*;
import org.junit.jupiter.api.*;
import java.time.Duration;

import static com.github.tomakehurst.wiremock.client.WireMock.*;

public class CircuitBreakerStateTest {

    private WireMockServer wireMock;
    private CircuitBreaker circuitBreaker;
    private PaymentServiceClient client;

    @BeforeEach
    void setUp() {
        wireMock = new WireMockServer(8089);
        wireMock.start();

        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)               // trip at 50% failure rate
            .slidingWindowSize(4)                   // evaluate last 4 calls
            .waitDurationInOpenState(Duration.ofMillis(500)) // stay open for 500ms
            .permittedNumberOfCallsInHalfOpenState(2)
            .build();

        circuitBreaker = CircuitBreaker.of("payment", config);
        client = new PaymentServiceClient("http://localhost:8089", circuitBreaker);
    }

    @AfterEach
    void tearDown() {
        wireMock.stop();
    }

    @Test
    void circuitBreakerShouldTransitionFromClosedToOpen() {
        // Stub: 3 out of 4 calls fail (75% failure rate > 50% threshold)
        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("STARTED")
            .willReturn(serverError())
            .willSetStateTo("FAIL_2"));

        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("FAIL_2")
            .willReturn(serverError())
            .willSetStateTo("FAIL_3"));

        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("FAIL_3")
            .willReturn(serverError())
            .willSetStateTo("SUCCESS"));

        wireMock.stubFor(post(urlEqualTo("/payments"))
            .inScenario("circuit-breaker")
            .whenScenarioStateIs("SUCCESS")
            .willReturn(ok()));

        // Make 4 calls through the circuit breaker
        for (int i = 0; i < 4; i++) {
            try { client.processPayment("100.00"); } catch (Exception ignored) {}
        }

        // Circuit breaker should now be OPEN
        Assertions.assertEquals(CircuitBreaker.State.OPEN, circuitBreaker.getState(),
            "Circuit breaker should be OPEN after 75% failure rate");
    }

    @Test
    void openCircuitBreakerShouldRejectCallsImmediately() throws InterruptedException {
        // Force the circuit breaker into OPEN state
        circuitBreaker.transitionToOpenState();

        wireMock.stubFor(post(urlEqualTo("/payments")).willReturn(ok()));

        long start = System.currentTimeMillis();
        Assertions.assertThrows(CallNotPermittedException.class,
            () -> client.processPayment("100.00"));
        long elapsed = System.currentTimeMillis() - start;

        // Rejection should be immediate — no network call made
        Assertions.assertTrue(elapsed < 50, "Open circuit should reject immediately, not wait");
        wireMock.verify(0, postRequestedFor(urlEqualTo("/payments")));
    }

    @Test
    void halfOpenCircuitBreakerShouldTransitionToClosedOnSuccess() {
        circuitBreaker.transitionToOpenState();
        // By default the breaker leaves OPEN only when a call arrives after
        // waitDurationInOpenState has elapsed (automatic transition is off by default),
        // so move it to HALF_OPEN explicitly instead of sleeping.
        circuitBreaker.transitionToHalfOpenState();

        Assertions.assertEquals(CircuitBreaker.State.HALF_OPEN, circuitBreaker.getState());

        // Stub successful response for the probe calls
        wireMock.stubFor(post(urlEqualTo("/payments")).willReturn(ok().withBody("{\"id\":\"123\"}")));

        // Make the permitted probe calls (2 in our config)
        client.processPayment("50.00");
        client.processPayment("50.00");

        // After 2 successful probe calls, CB should be CLOSED
        Assertions.assertEquals(CircuitBreaker.State.CLOSED, circuitBreaker.getState(),
            "Circuit breaker should close after successful probe calls");
    }
}

Testing Retry with Exponential Backoff

Retry logic needs tests for three cases: eventual success after transient failures, exhausted retries that propagate the correct exception, and errors that must not be retried at all. The two tests in this section cover the first and the last case:
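Before looking at the library-based tests, it helps to see the backoff arithmetic on its own. The sketch below is a JDK-only illustration with hypothetical names (`BackoffRetry`, `Sleeper`); Resilience4j exposes exponential intervals through its own `IntervalFunction`, and this code is not its implementation. The sleeper is injected so a test can record the delays instead of waiting for them.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical illustration class; not Resilience4j's implementation.
final class BackoffRetry {

    /** Abstraction over Thread.sleep so tests can record delays instead of waiting. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Delay before retry number {@code retry} (1-based): initial * multiplier^(retry - 1). */
    static long delayMillis(long initialMillis, double multiplier, int retry) {
        return Math.round(initialMillis * Math.pow(multiplier, retry - 1));
    }

    /**
     * Calls the action up to maxAttempts times. Only exceptions accepted by
     * {@code retryable} trigger another attempt; anything else propagates at once.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long initialMillis,
                      double multiplier, Predicate<Exception> retryable,
                      Sleeper sleeper) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // attempts exhausted, or an error that must not be retried
                }
                sleeper.sleep(delayMillis(initialMillis, multiplier, attempt));
            }
        }
    }
}
```

In production code the sleeper would be `Thread::sleep`. With an initial delay of 10 ms and a multiplier of 2.0, an action that fails twice and then succeeds waits 10 ms and then 20 ms, which is the same call-count and ordering property the WireMock scenario tests assert from the outside.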

@Test
void retryShouldSucceedOnThirdAttempt() throws Throwable { // CheckedSupplier.get() declares Throwable
    // First 2 calls fail, 3rd succeeds
    wireMock.stubFor(get(urlEqualTo("/inventory/sku-123"))
        .inScenario("retry")
        .whenScenarioStateIs("STARTED")
        .willReturn(serverError())
        .willSetStateTo("FAIL_2"));

    wireMock.stubFor(get(urlEqualTo("/inventory/sku-123"))
        .inScenario("retry")
        .whenScenarioStateIs("FAIL_2")
        .willReturn(serverError())
        .willSetStateTo("SUCCESS"));

    wireMock.stubFor(get(urlEqualTo("/inventory/sku-123"))
        .inScenario("retry")
        .whenScenarioStateIs("SUCCESS")
        .willReturn(okJson("{\"sku\":\"sku-123\",\"quantity\":10}")));

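    RetryConfig retryConfig = RetryConfig.custom()
        .maxAttempts(3)
        // exponential backoff: 10ms, then 20ms (kept short for tests);
        // IntervalFunction lives in io.github.resilience4j.core
        .intervalFunction(IntervalFunction.ofExponentialBackoff(10, 2.0))
        .build();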
    Retry retry = Retry.of("inventory", retryConfig);

    String result = Retry.decorateCheckedSupplier(retry,
        () -> inventoryClient.getStock("sku-123")).get();

    Assertions.assertNotNull(result);
    wireMock.verify(3, getRequestedFor(urlEqualTo("/inventory/sku-123")));
}

@Test
void retryShouldNotRetryOnClientErrors() {
    // 400 Bad Request should NOT be retried
    wireMock.stubFor(post(urlEqualTo("/orders"))
        .willReturn(badRequest().withBody("{\"error\":\"invalid_item\"}")));

    RetryConfig retryConfig = RetryConfig.custom()
        .maxAttempts(3)
        .retryOnException(e -> e instanceof ServerErrorException) // only retry 5xx
        .build();
    Retry retry = Retry.of("orders", retryConfig);

    Assertions.assertThrows(ClientErrorException.class,
        () -> Retry.decorateCheckedSupplier(retry,
            () -> orderClient.createOrder(payload)).get());

    // Should have called exactly once — no retries for client errors
    wireMock.verify(1, postRequestedFor(urlEqualTo("/orders")));
}

Measuring Isolation Effectiveness Under Load

Bulkhead testing should include a load test that proves isolation holds: a slow dependency in one bulkhead does not affect throughput on a different bulkhead.
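The property under test can be shown in isolation with two plain JDK thread pools, which is the idea a thread pool bulkhead builds on. This is a minimal sketch with hypothetical names (`PoolIsolationDemo`, `fastCompletedWhileSlowBlocked`), not Resilience4j code. It uses a latch rather than real delays, so the "slow" tasks stay blocked for the whole measurement and the result does not depend on timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Resilience4j.
final class PoolIsolationDemo {

    /** Returns how many fast tasks finished while every slow task was still blocked. */
    static int fastCompletedWhileSlowBlocked(int slowTasks, int fastTasks) throws Exception {
        ExecutorService slowPool = Executors.newFixedThreadPool(2); // stands in for "payment"
        ExecutorService fastPool = Executors.newFixedThreadPool(2); // stands in for "inventory"
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Slow tasks block until released, saturating only their own pool
            for (int i = 0; i < slowTasks; i++) {
                slowPool.submit(() -> {
                    release.await();
                    return null;
                });
            }
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < fastTasks; i++) {
                final int id = i;
                results.add(fastPool.submit(() -> id));
            }
            int completed = 0;
            for (Future<Integer> result : results) {
                result.get(2, TimeUnit.SECONDS); // throws TimeoutException if starved
                completed++;
            }
            return completed;
        } finally {
            release.countDown(); // unblock the slow tasks so the pools can shut down
            slowPool.shutdown();
            fastPool.shutdown();
        }
    }
}
```

If both kinds of task shared one two-thread pool, the fast tasks would sit in the queue behind the blocked ones and the `get` call would time out. That starvation is exactly what the load test in this section is meant to rule out.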

@Test
void slowPaymentServiceShouldNotDegradeInventoryService() throws InterruptedException {
    // Stub: payment service is slow (2 second response)
    wireMock.stubFor(post(urlEqualTo("/payments"))
        .willReturn(ok().withFixedDelay(2000)));

    // Stub: inventory service is fast
    wireMock.stubFor(get(urlPathMatching("/inventory/.*"))
        .willReturn(okJson("{\"quantity\":10}")));

    // Two separate thread pool bulkheads
    ThreadPoolBulkhead paymentBulkhead = ThreadPoolBulkhead.of("payment",
        ThreadPoolBulkheadConfig.custom()
            .maxThreadPoolSize(2).coreThreadPoolSize(2).queueCapacity(5).build());

    // Queue must hold the whole burst of 20 calls, or the surplus is rejected
    ThreadPoolBulkhead inventoryBulkhead = ThreadPoolBulkhead.of("inventory",
        ThreadPoolBulkheadConfig.custom()
            .maxThreadPoolSize(5).coreThreadPoolSize(5).queueCapacity(20).build());

    // Saturate the payment bulkhead (slow calls): 2 run, 5 queue, the rest are rejected
    for (int i = 0; i < 10; i++) {
        try {
            ThreadPoolBulkhead.decorateSupplier(paymentBulkhead,
                () -> paymentClient.charge("10.00")).get();
        } catch (BulkheadFullException expected) {
            // threads and queue are full; the bulkhead sheds the extra load
        }
    }

    // Time only the inventory calls
    long start = System.currentTimeMillis();

    // Inventory calls should complete quickly despite payment bulkhead being saturated
    List<CompletableFuture<String>> inventoryFutures = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
        final String sku = "sku-" + i; // lambdas can only capture effectively final variables
        inventoryFutures.add(
            ThreadPoolBulkhead.decorateSupplier(inventoryBulkhead,
                () -> inventoryClient.getStock(sku)).get()
                .toCompletableFuture()
        );
    }

    CompletableFuture.allOf(inventoryFutures.toArray(new CompletableFuture[0])).join();
    long inventoryElapsed = System.currentTimeMillis() - start;

    // All 20 inventory calls should complete in well under 2 seconds
    Assertions.assertTrue(inventoryElapsed < 1500,
        "Inventory calls took " + inventoryElapsed + "ms — payment bulkhead leaked");
}

Configuration Validation Tests

Don't forget to test that your Resilience4j configuration is actually applied correctly. Configuration bugs (wrong threshold, wrong window size) are silent and only manifest under specific load patterns.

@Test
void circuitBreakerConfigShouldMatchProductionSpec() {
    CircuitBreakerRegistry registry = applicationContext.getBean(CircuitBreakerRegistry.class);
    CircuitBreaker cb = registry.circuitBreaker("payment-service");
    CircuitBreakerConfig config = cb.getCircuitBreakerConfig();

    // Assert against your architecture decision records
    Assertions.assertEquals(50.0f, config.getFailureRateThreshold(),
        "Payment CB failure threshold should be 50%");
    Assertions.assertEquals(10, config.getSlidingWindowSize(),
        "Payment CB sliding window should be 10 calls");
    Assertions.assertEquals(Duration.ofSeconds(30), config.getWaitDurationInOpenState(),
        "Payment CB open wait should be 30 seconds in production");
}

Key Testing Checklist

Test Case               What to Verify
Bulkhead at capacity    Extra calls get BulkheadFullException, not a hang
CB CLOSED → OPEN        Correct failure threshold triggers the transition
CB OPEN → rejects       No network calls made while OPEN
CB HALF_OPEN probe      Correct number of probe calls permitted
CB HALF_OPEN → CLOSED   Successful probes close the breaker
CB HALF_OPEN → OPEN     Failed probes re-open the breaker
Retry success           Eventual success after transient failures
Retry exhaustion        Correct exception propagated, correct call count
Retry no-retry cases    Client errors (4xx) not retried
Thread pool isolation   Slow bulkhead doesn't degrade separate bulkhead

Resilience patterns are only as good as the tests that verify them. By combining WireMock for dependency simulation with Resilience4j's built-in metrics and state inspection APIs, you can write deterministic tests for every state transition and failure mode — before they happen in production.
