k6 Load Testing: A Practical Guide for Developers

Your app works fine with five users. You've tested it manually, the happy paths are green, and everything looks good. Then traffic spikes — a Product Hunt launch, a newsletter hit, a mention on Hacker News — and your API starts returning 503s. Response times climb from 200ms to 8 seconds. The database connection pool exhausts itself. Users leave.

Load testing exists to find that breaking point before your users do. And k6 is one of the best tools for the job.

What Is k6?

k6 is an open source load testing tool built by Grafana Labs. It lets you write load tests in JavaScript, run them from the command line, and get detailed metrics about how your system behaves under pressure. Unlike older tools like JMeter that rely on XML configuration or GUI interfaces, k6 uses a clean scripting model — you write real JavaScript, import libraries, and describe exactly what virtual users should do.

A few things make k6 stand out:

JavaScript API, not config files. Your test is code. You can use variables, loops, conditionals, and helper functions. You can import data from JSON files, generate random payloads, and organize complex scenarios cleanly.

Built-in metrics. k6 tracks response times, error rates, throughput, and more — automatically. You don't have to instrument anything.

Native CI integration. k6 runs as a single binary with no external dependencies. Drop it into any CI pipeline and it works.

Thresholds as pass/fail criteria. You define what "acceptable" means — 95th percentile response time under 500ms, error rate below 1% — and k6 exits with a non-zero code if you miss them. This makes it suitable for automated quality gates.

Installation

On macOS with Homebrew:

brew install k6

On Linux (Debian/Ubuntu):

sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

On Windows with Chocolatey:

choco install k6

Verify the install:

k6 version

Writing Your First Script

A k6 script exports a default function. That function is what each virtual user runs, over and over, for the duration of the test.

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,        // 10 virtual users
  duration: '30s', // run for 30 seconds
};

export default function () {
  http.get('https://api.example.com/products');
  sleep(1); // wait 1 second between iterations
}

Run it:

k6 run script.js

k6 spins up 10 virtual users, each calling the endpoint repeatedly for 30 seconds with a 1-second pause between calls, then prints a summary. You'll see metrics like http_req_duration (response time percentiles), http_req_failed (error rate), and http_reqs (total request count).

The sleep() call is important. Real users don't hammer endpoints as fast as possible — they pause between actions. Without sleep, your test simulates unrealistic traffic patterns and can produce misleading results.
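A common refinement is randomized think time, so virtual users don't all pause and fire in lockstep. A small helper like the one below (the `jitter` name is my own, not a k6 API) produces a value you can pass straight to `sleep()`:

```javascript
// Random think time between min and max seconds
// (inclusive of min, exclusive of max).
function jitter(min, max) {
  return min + Math.random() * (max - min);
}

// Inside the default function you would call: sleep(jitter(1, 3));
```

This spreads each VU's requests out unevenly, which is closer to how real users behave than a fixed one-second pause.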

Virtual Users and Iterations

Virtual users (VUs) in k6 are lightweight goroutines, not OS threads or browser instances. You can run thousands of them on a single machine without the overhead that browser-based load testing tools carry.

Each VU runs the default function in a loop. The number of times it runs depends on your configuration — either a fixed duration or a fixed iteration count.

Duration-based (run until time expires):

export const options = {
  vus: 50,
  duration: '2m',
};

Iteration-based (run exactly N times total):

export const options = {
  vus: 10,
  iterations: 100, // 100 total iterations split across 10 VUs
};

For API testing, duration-based runs are more useful — they tell you how the system behaves under sustained load, not just how long it takes to complete a fixed workload.

Thresholds: Defining Pass/Fail

This is where k6 becomes genuinely useful for CI pipelines. Thresholds let you declare what "acceptable performance" means, and k6 fails the test if you don't meet them.

export const options = {
  vus: 50,
  duration: '1m',
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1500'], // p95 under 500ms, p99 under 1.5s
    http_req_failed: ['rate<0.01'],                 // error rate under 1%
  },
};

When you add thresholds to CI, a regression that degrades performance will fail your build — the same way a failing unit test would. This makes load testing something you run continuously, not just before a release.

You can also set per-URL thresholds using groups or custom metrics, which is useful when you have a mix of fast and slow endpoints in the same test run.
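As a sketch, a per-endpoint threshold keyed on a tag might look like this (the `login` and `search` tag values are illustrative, not k6 built-ins; you assign them yourself by passing `{ tags: { name: 'login' } }` as request params):

```javascript
// Thresholds filtered by tag: each applies only to requests
// carrying the matching name tag, e.g.
//   http.get(url, { tags: { name: 'login' } });
export const options = {
  thresholds: {
    'http_req_duration{name:login}': ['p(95)<300'],
    'http_req_duration{name:search}': ['p(95)<800'],
  },
};
```

This way a deliberately slow reporting endpoint doesn't drag down the budget for your latency-critical paths.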

Scenarios: Constant VUs, Ramping, and Spikes

Real traffic doesn't arrive at a constant rate. k6 scenarios let you model realistic load shapes.

Constant Load

The simplest case — a fixed number of users for a fixed duration. Good for establishing a baseline.

export const options = {
  scenarios: {
    constant_load: {
      executor: 'constant-vus',
      vus: 20,
      duration: '2m',
    },
  },
};

Ramping VUs

Gradually increase load to find where the system starts to degrade.

export const options = {
  scenarios: {
    ramp_up: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },  // ramp to 50 users over 2 minutes
        { duration: '5m', target: 50 },  // hold at 50 users for 5 minutes
        { duration: '2m', target: 0 },   // ramp back down
      ],
    },
  },
};

Watching response times climb as VU count increases tells you exactly where your system's capacity ceiling sits.

Spike Test

Sudden traffic bursts — launch announcements, marketing emails going out — are the scenarios that actually take down production systems.

export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '10s', target: 5 },   // normal traffic
        { duration: '1m', target: 200 },  // sudden spike
        { duration: '10s', target: 5 },   // back to normal
        { duration: '30s', target: 0 },   // wind down
      ],
    },
  },
};

A spike test that breaks your API is a good discovery. A spike test that passes gives you confidence your architecture can absorb real-world bursts.

Key Metrics

k6 collects a rich set of built-in metrics. These are the ones you'll use most:

http_req_duration — end-to-end request time including DNS, TCP, TLS, sending, waiting, and receiving. This is your primary latency metric. Watch the p95 and p99 values, not just the average — averages hide tail latency problems.

http_req_failed — the rate of failed requests (non-2xx responses or network errors). This should be zero under normal load; if it's not, you have a reliability problem.

http_reqs — total number of requests and the throughput rate (requests/second). Tells you what load your system actually absorbed.

vus — current active virtual users. Useful when paired with response time to see the correlation between concurrency and latency.

iteration_duration — time to complete one full iteration of the default function, including sleep time. Different from http_req_duration.
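To see concretely why averages hide tail latency, here's a small standalone illustration (plain JavaScript, not a k6 script) comparing the mean against a p95 computed with the nearest-rank method; k6 uses its own percentile calculation, so treat this as a sketch of the idea:

```javascript
// Nine fast responses and one slow outlier, in milliseconds.
const samples = [200, 210, 190, 205, 195, 200, 210, 190, 200, 2000];

const avg = samples.reduce((a, b) => a + b, 0) / samples.length;

// Nearest-rank p95: sort ascending, take the value at ceil(0.95 * n) - 1.
const sorted = [...samples].sort((a, b) => a - b);
const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];

console.log(avg); // 380 — looks mediocre but plausible
console.log(p95); // 2000 — reveals the outlier the average smooths over
```

One request in ten took two full seconds, yet the average barely hints at it. That's why thresholds are usually written against p95 or p99.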

You can also define custom metrics:

import http from 'k6/http';
import { Trend, Counter } from 'k6/metrics';

const loginDuration = new Trend('login_duration');
const checkoutErrors = new Counter('checkout_errors');

export default function () {
  const start = Date.now();
  const res = http.post('https://api.example.com/api/login', { username: 'test', password: 'test' });
  loginDuration.add(Date.now() - start);

  if (res.status !== 200) {
    checkoutErrors.add(1);
  }
}

Custom metrics let you track business-meaningful measurements alongside the technical ones.

Testing Multi-Step Flows

Real user journeys involve multiple requests. A checkout flow isn't one API call — it's browse, add to cart, enter shipping, pay. k6 handles this naturally because your test is just JavaScript.

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '3m',
  thresholds: {
    http_req_duration: ['p(95)<800'],
    http_req_failed: ['rate<0.005'],
  },
};

export default function () {
  // Step 1: Browse products
  let res = http.get('https://api.example.com/products?page=1');
  check(res, { 'products listed': (r) => r.status === 200 });

  sleep(1);

  // Step 2: View a product
  res = http.get('https://api.example.com/products/42');
  check(res, { 'product loaded': (r) => r.status === 200 });

  sleep(2);

  // Step 3: Add to cart
  res = http.post(
    'https://api.example.com/cart',
    JSON.stringify({ product_id: 42, quantity: 1 }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(res, { 'added to cart': (r) => r.status === 201 });

  sleep(1);
}

The check() function works like an assertion — it records pass/fail results in the summary but doesn't stop the test. Failed checks show up in the checks metric at the end.

Running k6 in CI

k6 exits with code 0 on success and a non-zero code (99) when any threshold is missed. That's all you need for CI integration.

GitHub Actions:

name: Load Test

on:
  push:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install k6
        run: |
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
            --keyserver hkp://keyserver.ubuntu.com:80 \
            --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
            | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update && sudo apt-get install k6

      - name: Run load test
        run: k6 run --out json=results.json load-test.js

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: results.json

A few practical notes for CI load tests:

Don't run the same load profile in CI as you do in pre-release testing. CI tests should be short (30–60 seconds), run against a staging environment, and use fewer VUs. Save the heavy artillery for scheduled performance regression runs or pre-release gates.

Use environment variables for base URLs:

const BASE_URL = __ENV.BASE_URL || 'https://api.example.com';

export default function () {
  http.get(`${BASE_URL}/health`);
}

Then in CI: k6 run --env BASE_URL=https://staging.example.com script.js

Reading the Output

After a run, k6 prints a summary table. Here's how to read it:

     http_req_duration.............: avg=234ms  min=89ms   med=201ms  max=2.1s   p(90)=412ms  p(95)=618ms
     http_req_failed...............: 0.43%  ✓ 8 ✗ 1856
     http_reqs.....................: 1864   31.07/s

  • avg is misleading on its own — a 234ms average with a 2.1s max means some users are experiencing very slow responses.
  • p(95)=618ms means 5% of requests took longer than 618ms — important if you've set a threshold at 500ms.
  • http_req_failed: 0.43% — even a sub-1% error rate means roughly 1 in 230 requests is failing. Is that acceptable for your use case?

The summary also shows which thresholds passed (✓) and failed (✗), giving you an instant verdict.

Monitor Performance in Production

Load testing confirms your system can handle expected traffic under controlled conditions. That's valuable, but it's only part of the picture.

Production traffic is unpredictable — request patterns shift, third-party dependencies degrade, memory leaks accumulate over hours, not seconds. Once your k6 tests establish the baseline your system should meet, you need continuous visibility to know it's staying there.

HelpMeTest monitors your endpoints 24/7 — checking availability, response time, and correctness on a schedule. If an endpoint that passed your k6 thresholds at 300ms starts responding at 900ms in production, you'll know about it before users report it. Health checks run against your live environment, not a staging clone, so the data reflects reality.

Load testing is how you validate capacity. Continuous monitoring is how you maintain it. Run k6 in your pipeline to catch regressions before deploy, and let monitoring handle the production signal once you're live.


k6 is straightforward to get started with but has enough depth to model complex, realistic load scenarios. Start with a simple script against your most critical endpoint, add thresholds, and wire it into CI. Once you know what "normal" performance looks like under load, you have a baseline worth protecting.
