Spike Testing with Gatling: Simulating Sudden Traffic Surges

Spike testing simulates sudden, dramatic traffic increases — a product launch going viral, a flash sale, or a marketing email hitting millions of inboxes simultaneously. Unlike stress testing (which ramps gradually), spike testing applies an immediate, extreme load increase to test instantaneous system response.

Gatling is well suited to spike testing. Its injection profiles, expressed in a Scala DSL, give precise control over traffic patterns, and its HTML reports make it easy to see when and where a system buckles.

Why Spike Testing Matters

Gradual load increases give systems time to adapt — connection pools grow, caches warm, JVM JIT kicks in. Spike testing removes that luxury.

Real traffic spikes don't warn you. A tweet from a celebrity, a Product Hunt launch, or a surprise media mention can send traffic from normal to 10x in seconds. Spike testing answers:

  • Does the system survive sudden overload?
  • How long does recovery take?
  • Does auto-scaling kick in fast enough?
  • Do circuit breakers fire before cascading failures occur?

Gatling Basics

Gatling simulates HTTP traffic using virtual users (VUs). Simulations are written in Scala, Java, or Kotlin; the examples here use Scala. A simulation defines:

  1. Protocol: base URL, headers, authentication
  2. Scenario: the sequence of requests a user makes
  3. Injection profile: how users are introduced over time

Install Gatling:

# Using the bundle
wget https://repo1.maven.org/maven2/io/gatling/highcharts/gatling-charts-highcharts-bundle/3.10.3/gatling-charts-highcharts-bundle-3.10.3-bundle.zip
unzip gatling-charts-highcharts-bundle-3.10.3-bundle.zip

Or via Maven/Gradle for project integration.

Basic Spike Test Simulation

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class SpikeTestSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://api.example.com")
    .acceptHeader("application/json")
    .contentTypeHeader("application/json")

  val scn = scenario("API Spike Test")
    .exec(
      http("Get Items")
        .get("/items")
        .check(status.is(200))
    )
    .pause(1)

  setUp(
    scn.inject(
      // Normal baseline
      constantUsersPerSec(10).during(2.minutes),
      // Spike: a burst of 1,000 users arriving at once
      atOnceUsers(1000),
      // Sustain elevated load at 10x the baseline arrival rate
      constantUsersPerSec(100).during(3.minutes),
      // Return to normal
      constantUsersPerSec(10).during(2.minutes)
    ).protocols(httpProtocol)
  ).assertions(
    global.responseTime.percentile(95).lt(2000),
    global.failedRequests.percent.lt(5)
  )
}

The atOnceUsers(1000) injection fires 1,000 virtual users simultaneously — this is the spike. The subsequent constantUsersPerSec profile simulates sustained elevated load.
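To get a feel for the volume this profile generates, the arithmetic can be sketched in plain Scala. The `Phase` type and its subclasses are illustrative helpers, not Gatling types; the rates and durations are copied from the simulation above.

```scala
// Illustrative only: Phase, ConstantRate and Burst are hypothetical helpers, not Gatling types.
sealed trait Phase { def users: Long }

final case class ConstantRate(usersPerSec: Int, seconds: Int) extends Phase {
  def users: Long = usersPerSec.toLong * seconds
}

final case class Burst(count: Int) extends Phase {
  def users: Long = count.toLong
}

object SpikeProfileMath {
  // Mirrors the injection steps of SpikeTestSimulation.
  val profile: Seq[Phase] = Seq(
    ConstantRate(10, 120),  // baseline: 10 users/sec for 2 minutes
    Burst(1000),            // spike: 1,000 users at once
    ConstantRate(100, 180), // elevated load: 100 users/sec for 3 minutes
    ConstantRate(10, 120)   // return to baseline
  )

  // Total number of virtual users started over the whole run.
  val totalUsers: Long = profile.map(_.users).sum

  def main(args: Array[String]): Unit =
    println(s"Total virtual users injected: $totalUsers")
}
```

With these numbers the run starts 21,400 virtual users, 18,000 of them during the sustained phase. The burst defines the shape of the spike, but the sustained phase dominates total request volume, which matters when sizing the load generator.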

Injection Profiles for Spike Testing

Gatling provides several injection types useful for spike scenarios:

Immediate Spike

atOnceUsers(500)

All 500 users hit simultaneously. Maximum instantaneous stress.

Ramp Then Spike

rampUsers(100).during(1.minute),
atOnceUsers(2000),
rampUsers(100).during(5.minutes)

Establish baseline first, then spike. Tests whether the system state before the spike affects recovery.

Multiple Spikes

constantUsersPerSec(10).during(2.minutes),
atOnceUsers(500),
nothingFor(30.seconds),
atOnceUsers(500),
nothingFor(30.seconds),
atOnceUsers(500),
constantUsersPerSec(10).during(2.minutes)

Tests whether systems recover between spikes, or whether repeated spikes cause cumulative degradation.
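When the number of spikes needs to vary between runs, the step list can be built programmatically instead of written out by hand. The following is a minimal sketch, assuming the `inject` overload that accepts a collection of open injection steps; the class name, `spikeCount`, `usersPerSpike`, and `gap` are illustrative values, and the base URL is a placeholder.

```scala
import io.gatling.core.Predef._
import io.gatling.core.controller.inject.open.OpenInjectionStep
import io.gatling.http.Predef._
import scala.concurrent.duration._

class RepeatedSpikeSimulation extends Simulation {

  // Placeholder target; replace with your own base URL.
  val httpProtocol = http.baseUrl("https://api.example.com")

  val scn = scenario("Repeated Spikes")
    .exec(http("Get Items").get("/items").check(status.is(200)))

  // Hypothetical parameters for this sketch.
  val spikeCount = 3
  val usersPerSpike = 500
  val gap = 30.seconds

  // Each spike is followed by a quiet gap, except the last one.
  val spikes: Seq[OpenInjectionStep] =
    (1 to spikeCount).flatMap { i =>
      val burst: OpenInjectionStep = atOnceUsers(usersPerSpike)
      if (i < spikeCount) Seq(burst, nothingFor(gap)) else Seq(burst)
    }

  // Baseline before and after the spikes, as in the hand-written profile.
  val steps: Seq[OpenInjectionStep] =
    (constantUsersPerSec(10).during(2.minutes) +: spikes) :+
      constantUsersPerSec(10).during(2.minutes)

  setUp(scn.inject(steps).protocols(httpProtocol))
}
```

With `spikeCount = 3` this produces the same sequence of steps as the profile listed above. The explicit `Seq[OpenInjectionStep]` annotations keep type inference from settling on a narrower element type.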

Heaviside Step Function

heavisideUsers(1000).during(10.seconds)

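Injects 1,000 users over 10 seconds following a smooth approximation of the Heaviside step function, so the cumulative user count traces an S-curve rather than a vertical jump. Less brutal than atOnceUsers, more realistic than a pure ramp. Newer Gatling releases name this profile stressPeakUsers, so check the documentation for the version you run.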

Modeling Realistic Spike Scenarios

A spike rarely hits a single endpoint. Model realistic traffic distribution:

val browsing = scenario("Browse")
  .exec(http("Home").get("/").check(status.is(200)))
  .pause(2)
  .exec(http("Products").get("/products").check(status.is(200)))
  .pause(1)

val checkout = scenario("Checkout")
  .exec(http("Cart").get("/cart").check(status.is(200)))
  .pause(1)
  .exec(
    http("Place Order")
      .post("/orders")
      .body(StringBody("""{"product_id": 1, "qty": 1}"""))
      .check(status.is(201))
  )

setUp(
  browsing.inject(
    nothingFor(30.seconds),
    atOnceUsers(800)  // 80% browse
  ),
  checkout.inject(
    nothingFor(30.seconds),
    atOnceUsers(200)  // 20% checkout
  )
).protocols(httpProtocol)

This models a realistic spike where most users browse and a minority transact.
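An alternative way to express the same mix is a single population that is split at runtime with `randomSwitch`. This is a sketch under the same assumptions as the example above (placeholder base URL, same endpoints); the class name and the `browse`, `checkout`, and `visitors` names are illustrative.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class MixedSpikeSimulation extends Simulation {

  // Placeholder target; replace with your own base URL.
  val httpProtocol = http
    .baseUrl("https://api.example.com")
    .contentTypeHeader("application/json")

  // Reusable request chains, one per user journey.
  val browse = exec(http("Home").get("/").check(status.is(200)))
    .pause(2.seconds)
    .exec(http("Products").get("/products").check(status.is(200)))

  val checkout = exec(http("Cart").get("/cart").check(status.is(200)))
    .pause(1.second)
    .exec(
      http("Place Order")
        .post("/orders")
        .body(StringBody("""{"product_id": 1, "qty": 1}"""))
        .check(status.is(201))
    )

  // Each virtual user is routed to one journey; weights are percentages.
  val visitors = scenario("Visitors")
    .randomSwitch(
      80.0 -> browse,
      20.0 -> checkout
    )

  setUp(
    visitors.inject(
      nothingFor(30.seconds),
      atOnceUsers(1000)
    )
  ).protocols(httpProtocol)
}
```

The trade-off: two separate scenarios give exact counts (800 and 200) and separate active-user curves in the report, while `randomSwitch` assigns journeys randomly, so the split is only approximately 80/20 in any single run.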

Adding Assertions

Assertions define pass/fail criteria:

.assertions(
  // Global assertions
  global.responseTime.percentile(95).lt(2000),     // p95 < 2s
  global.failedRequests.percent.lt(1),              // < 1% errors

  // Per-request assertions: details() takes a request name, not a scenario name
  details("Place Order").responseTime.percentile(99).lt(5000),
  details("Products").failedRequests.percent.lt(0.1)
)

Gatling exits with a non-zero code if assertions fail, making CI integration straightforward.

Interpreting Gatling Reports

After each run, Gatling generates an HTML report: under results/ when using the bundle, or under target/gatling/ with the Maven plugin. Key sections:

Statistics summary: shows min, max, mean, p50, p75, p95, p99 for each request. Focus on p95 and p99 — if these are orders of magnitude above p50, you have tail latency problems.

Response time over time: the graph should stabilize during the spike, not continuously increase. A continuously rising line indicates queuing — requests aren't completing fast enough.

Active users over time: compare this with response time. If response time spikes immediately when users arrive, you have a capacity problem. If it spikes with a delay, you have a buffering/queue problem.

Percentiles distribution: see what percentage of requests fall within each latency bucket.
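The "orders of magnitude above p50" check from the statistics summary can be made concrete with a small calculation. The sketch below uses a simple nearest-rank percentile over raw samples; it is not how Gatling computes its own percentiles, and the sample data is invented for illustration.

```scala
object TailLatency {
  // Nearest-rank percentile over a non-empty sample; p must be in (0, 100].
  def percentile(samplesMs: Seq[Long], p: Double): Long = {
    require(samplesMs.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val sorted = samplesMs.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(rank - 1)
  }

  // Ratio of p99 to p50; a large value points to a long latency tail.
  def tailRatio(samplesMs: Seq[Long]): Double =
    percentile(samplesMs, 99).toDouble / percentile(samplesMs, 50).toDouble

  def main(args: Array[String]): Unit = {
    // Hypothetical response times in milliseconds: mostly fast, a few very slow.
    val samples = Seq.fill(95)(100L) ++ Seq.fill(5)(4000L)
    println(s"p50 = ${percentile(samples, 50)} ms")
    println(s"p99 = ${percentile(samples, 99)} ms")
    println(s"p99/p50 ratio = ${tailRatio(samples)}")
  }
}
```

In this invented sample the median is 100 ms while p99 is 4,000 ms, a ratio of 40. A mean-only view would hide that 5% of requests took 40 times longer than the typical one.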

Testing Auto-Scaling

If your infrastructure auto-scales (Kubernetes HPA, AWS ECS, etc.), spike tests validate the scaling response time:

  1. Start with baseline load
  2. Spike to 10x — watch for initial latency increase as new pods spin up
  3. Monitor how long until performance normalizes
  4. Verify the system handles load during the scaling window

The key question: is your scaling fast enough that users experience degradation for less than 30 seconds? Less than 5 minutes? Define the acceptable window before testing.
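Once an acceptable window is defined, the observed window can be measured from per-second latency data. The following sketch assumes you can export one p95 value per second from your monitoring or from Gatling's logs; the `Sample` type, the threshold, and the data are hypothetical.

```scala
object ScalingWindow {
  // One observation: seconds since test start and the p95 latency for that second.
  final case class Sample(second: Int, p95Ms: Long)

  // First and last second in which p95 exceeded the threshold, or None if it never did.
  def degradedWindow(samples: Seq[Sample], thresholdMs: Long): Option[(Int, Int)] = {
    val breaches = samples.filter(_.p95Ms > thresholdMs).map(_.second)
    if (breaches.isEmpty) None else Some((breaches.min, breaches.max))
  }

  // Length of the degraded window in seconds, counting both endpoints.
  def degradedSeconds(samples: Seq[Sample], thresholdMs: Long): Int =
    degradedWindow(samples, thresholdMs)
      .map { case (from, to) => to - from + 1 }
      .getOrElse(0)

  // Hypothetical run: spike at t = 120 s, latency recovers once new capacity is ready.
  val exampleSamples: Seq[Sample] = (0 until 300).map { s =>
    Sample(s, if (s >= 120 && s < 165) 3500L else 400L)
  }

  def main(args: Array[String]): Unit = {
    val acceptableSeconds = 60 // assumed target; define your own before testing
    val observed = degradedSeconds(exampleSamples, thresholdMs = 2000L)
    val verdict = if (observed <= acceptableSeconds) "within" else "outside"
    println(s"Degraded for $observed s, $verdict the acceptable window")
  }
}
```

For the invented data, latency stays above the 2,000 ms threshold from second 120 through second 164, a 45-second window. Note that the function measures from first to last breach, so intermittent recovery in between still counts as degraded time.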

CI/CD Integration

Run spike tests via Maven:

mvn gatling:test -Dgatling.simulationClass=SpikeTestSimulation

Or via the Gatling bundle:

$GATLING_HOME/bin/gatling.sh -s SpikeTestSimulation

In GitHub Actions:

- name: Run spike test
  run: mvn gatling:test -Dgatling.simulationClass=SpikeTestSimulation
  
- name: Upload Gatling report
  uses: actions/upload-artifact@v4
  if: always()
  with:
    name: gatling-report
    path: target/gatling/

Run spike tests on a schedule or before major releases — they're too slow and expensive for every commit.

Common Pitfalls

Testing in production without a kill switch: always have a way to abort the test immediately if something starts failing for real users.

Insufficient warm-up: if your system has no baseline load, JVM JIT and connection pools aren't initialized. Add a brief warm-up phase before the spike.

Ignoring the cooldown: recovery behavior is part of the test. Watch whether performance returns to baseline after the spike subsides, and how long it takes.

Not correlating with infrastructure metrics: Gatling shows client-side behavior. Combine it with server-side metrics (CPU, memory, DB connections) to identify the root cause of degradation.
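Several of these pitfalls can be addressed in the simulation itself. The sketch below combines a warm-up phase, a cooldown phase, a hard cap on run time, and a simple file-based kill switch. The sentinel path, class name, and durations are assumptions for illustration; a real kill switch should also cover stopping the load generator process.

```scala
import java.nio.file.{Files, Paths}

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class GuardedSpikeSimulation extends Simulation {

  // Placeholder target and sentinel path; adjust both for your environment.
  val httpProtocol = http.baseUrl("https://api.example.com")
  val abortFile = Paths.get("/tmp/abort-spike-test")

  val scn = scenario("Guarded Spike")
    // Checked once per user at start: if the sentinel file exists,
    // the user exits without sending any request.
    .exitHereIf(_ => Files.exists(abortFile))
    .exec(http("Get Items").get("/items").check(status.is(200)))

  setUp(
    scn.inject(
      // Warm-up: let JIT compilation and connection pools settle.
      rampUsersPerSec(1).to(10).during(1.minute),
      constantUsersPerSec(10).during(1.minute),
      // Spike.
      atOnceUsers(1000),
      // Cooldown: keep baseline traffic so recovery is visible in the report.
      constantUsersPerSec(10).during(3.minutes)
    ).protocols(httpProtocol)
  ).maxDuration(10.minutes) // hard cap on total run time
}
```

Creating the sentinel file (for example with `touch /tmp/abort-spike-test`) makes every subsequently started user a no-op. Users already in flight finish their current request, so the effect is a fast drain of new load, not an instant stop.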

Conclusion

Spike testing with Gatling gives you precise control over traffic injection patterns and detailed reports for analysis. The combination of atOnceUsers for instantaneous spikes and Gatling's assertion framework makes it straightforward to validate auto-scaling behavior and define automated pass/fail criteria.

Pair spike test results with continuous monitoring. Tools like HelpMeTest verify that your application's features stay correct under and after load. Functional and performance testing together give a fuller picture of production readiness than either does alone.
