SaaS API Rate Limiting and Quota Testing: Per-Tenant Limits and 429 Handling

Rate limiting bugs in SaaS applications come in two flavors: limits that are too strict (blocking legitimate users) and limits that don't work at all (letting abuse through). Both are costly. This guide covers how to test per-tenant rate limiting, burst handling, quota enforcement, and 429 response behavior.

The Rate Limiting Test Surface

For a SaaS API, you need to test:

  1. Per-tenant limits — tenant A's traffic doesn't affect tenant B's quota
  2. Rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
  3. 429 responses — correct status code, Retry-After header, response body
  4. Burst handling — requests within a burst window vs. sustained rate
  5. Quota enforcement — monthly/daily usage caps per plan
  6. Limit recovery — quota resets correctly after the window expires
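
Items 2 and 3 are easier to assert against when the headers are derived from the limiter's decision in one place. A minimal sketch follows, assuming a hypothetical helper named rateLimitHeaders, that X-RateLimit-Reset is reported as Unix epoch seconds, and that Retry-After is sent only on blocked requests. The list above specifies none of these, so adjust them to your API's contract.

```typescript
// Shape of a limiter decision (same fields as the limiter result used later in this guide).
interface RateLimitResult {
  allowed: boolean
  limit: number
  remaining: number
  resetAt: Date
}

// Hypothetical helper: maps a limiter decision to response headers.
function rateLimitHeaders(
  result: RateLimitResult,
  now: Date = new Date()
): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(result.limit),
    'X-RateLimit-Remaining': String(result.remaining),
    // Assumption: reset time is expressed as Unix epoch seconds.
    'X-RateLimit-Reset': String(Math.ceil(result.resetAt.getTime() / 1000)),
  }
  if (!result.allowed) {
    // Retry-After as whole seconds until the window resets, never less than 1.
    const seconds = Math.max(
      1,
      Math.ceil((result.resetAt.getTime() - now.getTime()) / 1000)
    )
    headers['Retry-After'] = String(seconds)
  }
  return headers
}
```

Keeping this mapping in a pure function means the header values can be unit tested without starting the HTTP server.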

Unit Testing: Rate Limiter Logic

// middleware/rateLimiter.ts
import { redis } from '../config/redis'

export interface RateLimitResult {
  allowed: boolean
  limit: number
  remaining: number
  resetAt: Date
}

export async function checkRateLimit(
  tenantId: string,
  plan: 'free' | 'pro' | 'enterprise'
): Promise<RateLimitResult> {
  const limits = {
    free: { requests: 100, windowSeconds: 60 },
    pro: { requests: 1000, windowSeconds: 60 },
    enterprise: { requests: 10000, windowSeconds: 60 },
  }

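  const config = limits[plan]
  // Fixed-window key: every request in the same window increments the same counter
  const windowMs = config.windowSeconds * 1000
  const key = `rate:${tenantId}:${Math.floor(Date.now() / windowMs)}`

  const current = await redis.incr(key)

  // First hit in the window: set a TTL so the counter cleans itself up.
  // INCR and EXPIRE are separate commands, so this pair is not atomic.
  if (current === 1) {
    await redis.expire(key, config.windowSeconds)
  }

  const ttl = await redis.ttl(key)
  const resetAt = new Date(Date.now() + ttl * 1000)

  return {
    allowed: current <= config.requests,
    limit: config.requests,
    remaining: Math.max(0, config.requests - current),
    resetAt,
  }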
}

// middleware/rateLimiter.test.ts
import { checkRateLimit } from './rateLimiter'
import { redis } from '../config/redis'

jest.mock('../config/redis')

const mockRedis = redis as jest.Mocked<typeof redis>

describe('checkRateLimit', () => {
  beforeEach(() => {
    jest.clearAllMocks()
  })

  it('allows requests within the free plan limit', async () => {
    mockRedis.incr.mockResolvedValue(1)
    mockRedis.expire.mockResolvedValue(1)
    mockRedis.ttl.mockResolvedValue(60)

    const result = await checkRateLimit('tenant-1', 'free')

    expect(result.allowed).toBe(true)
    expect(result.limit).toBe(100)
    expect(result.remaining).toBe(99)
  })

  it('blocks requests that exceed the free plan limit', async () => {
    mockRedis.incr.mockResolvedValue(101) // Over limit
    mockRedis.expire.mockResolvedValue(1)
    mockRedis.ttl.mockResolvedValue(45)

    const result = await checkRateLimit('tenant-1', 'free')

    expect(result.allowed).toBe(false)
    expect(result.remaining).toBe(0)
  })

  it('pro plan has higher limits than free', async () => {
    mockRedis.incr.mockResolvedValue(500)
    mockRedis.expire.mockResolvedValue(1)
    mockRedis.ttl.mockResolvedValue(30)

    const result = await checkRateLimit('tenant-1', 'pro')

    expect(result.allowed).toBe(true) // 500 < 1000
    expect(result.limit).toBe(1000)
  })

  it('uses separate Redis keys per tenant', async () => {
    mockRedis.incr.mockResolvedValue(1)
    mockRedis.expire.mockResolvedValue(1)
    mockRedis.ttl.mockResolvedValue(60)

    await checkRateLimit('tenant-a', 'free')
    await checkRateLimit('tenant-b', 'free')

    const keys = mockRedis.incr.mock.calls.map((call) => call[0])
    expect(keys[0]).toContain('tenant-a')
    expect(keys[1]).toContain('tenant-b')
    expect(keys[0]).not.toBe(keys[1])
  })
})
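
The mocked tests above jump from a count of 1 to a count of 101, so the boundary itself (the 100th request allowed, the 101st blocked) is never exercised. One way to cover it without a Redis instance is an in-memory stand-in that mirrors the same fixed-window key scheme. The sketch below is an illustration under assumptions: InMemoryFixedWindow and LIMITS are hypothetical names, and the class reimplements the counting logic rather than calling checkRateLimit.

```typescript
type Plan = 'free' | 'pro' | 'enterprise'

// Same per-plan numbers as the limiter above.
const LIMITS: Record<Plan, { requests: number; windowSeconds: number }> = {
  free: { requests: 100, windowSeconds: 60 },
  pro: { requests: 1000, windowSeconds: 60 },
  enterprise: { requests: 10000, windowSeconds: 60 },
}

// Hypothetical in-memory stand-in for the Redis-backed counter.
class InMemoryFixedWindow {
  private counts = new Map<string, number>()

  check(
    tenantId: string,
    plan: Plan,
    nowMs: number
  ): { allowed: boolean; remaining: number } {
    const { requests, windowSeconds } = LIMITS[plan]
    const bucket = Math.floor(nowMs / (windowSeconds * 1000))
    const key = `rate:${tenantId}:${bucket}`
    const current = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, current)
    return {
      allowed: current <= requests,
      remaining: Math.max(0, requests - current),
    }
  }
}

// Usage: drive one tenant to the boundary inside a single window.
const limiter = new InMemoryFixedWindow()
const windowStartMs = 0
let hundredth = limiter.check('tenant-a', 'free', windowStartMs)
for (let i = 1; i < 100; i++) {
  hundredth = limiter.check('tenant-a', 'free', windowStartMs)
}
// hundredth is the 100th request: allowed, with 0 remaining
const overLimit = limiter.check('tenant-a', 'free', windowStartMs) // 101st: blocked
const otherTenant = limiter.check('tenant-b', 'free', windowStartMs) // separate counter
```

Because the clock is passed in as nowMs, the same stand-in can also check that a later window starts from a fresh count.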

API-Level Rate Limit Tests

Test the middleware integration with your API:

// tests/api/rateLimiting.test.ts
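import supertest from 'supertest'
import app from '../../src/app'
// getAuthToken is assumed to be a project-specific test helper (not shown here)
// that returns a bearer token for a tenant on the given plan.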

describe('Rate limiting headers', () => {
  let token: string

  beforeAll(async () => {
    token = await getAuthToken({ plan: 'free' })
  })

  it('includes rate limit headers on every response', async () => {
    const response = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${token}`)

    expect(response.headers['x-ratelimit-limit']).toBeDefined()
    expect(response.headers['x-ratelimit-remaining']).toBeDefined()
    expect(response.headers['x-ratelimit-reset']).toBeDefined()
    expect(Number(response.headers['x-ratelimit-limit'])).toBe(100)
  })

  it('decrements remaining on each request', async () => {
    const first = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${token}`)

    const second = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${token}`)

    const firstRemaining = Number(first.headers['x-ratelimit-remaining'])
    const secondRemaining = Number(second.headers['x-ratelimit-remaining'])

    expect(secondRemaining).toBe(firstRemaining - 1)
  })
})

describe('429 Too Many Requests', () => {
  it('returns 429 with correct headers when limit is exceeded', async () => {
    const limitedToken = await getAuthToken({ plan: 'free', exhaustedLimit: true })

    const response = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${limitedToken}`)

    expect(response.status).toBe(429)
    expect(response.headers['retry-after']).toBeDefined()
    expect(Number(response.headers['retry-after'])).toBeGreaterThan(0)
    expect(response.body.error).toMatch(/rate limit/i)
  })

  it('includes Retry-After in seconds', async () => {
    const limitedToken = await getAuthToken({ plan: 'free', exhaustedLimit: true })

    const response = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${limitedToken}`)

    const retryAfter = Number(response.headers['retry-after'])
    expect(retryAfter).toBeGreaterThan(0)
    expect(retryAfter).toBeLessThanOrEqual(60) // within the window
  })
})
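
The tests above check that the server sends Retry-After. A client also has to interpret it, and HTTP allows the header to carry either a delay in seconds or an HTTP-date. A minimal sketch of a parser that handles both forms, with parseRetryAfter as a hypothetical helper name:

```typescript
// Returns the wait time in milliseconds, or null when the header is
// missing or cannot be interpreted.
function parseRetryAfter(
  value: string | undefined,
  nowMs: number = Date.now()
): number | null {
  if (value === undefined) return null
  const trimmed = value.trim()

  // Form 1: delay in seconds, for example "45"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000

  // Form 2: HTTP-date, for example "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed)
  if (Number.isNaN(dateMs)) return null

  // A date in the past means the caller may retry immediately.
  return Math.max(0, dateMs - nowMs)
}

// Usage
const fromSeconds = parseRetryAfter('45') // 45000
const fromDate = parseRetryAfter(
  'Wed, 21 Oct 2015 07:28:00 GMT',
  Date.UTC(2015, 9, 21, 7, 27, 30)
) // 30000
```

Returning null rather than 0 for an unreadable header lets the caller fall back to its own backoff instead of retrying at once.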

Testing Per-Tenant Isolation

Rate limits must be isolated per tenant:

describe('Per-tenant rate limit isolation', () => {
  it("tenant A's traffic does not affect tenant B's quota", async () => {
    const tokenA = await getAuthToken({ tenantId: 'tenant-a', plan: 'free' })
    const tokenB = await getAuthToken({ tenantId: 'tenant-b', plan: 'free' })

    // Exhaust tenant A's limit
    for (let i = 0; i < 100; i++) {
      await supertest(app)
        .get('/api/projects')
        .set('Authorization', `Bearer ${tokenA}`)
    }

    // Tenant A should be rate limited
    const responseA = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${tokenA}`)
    expect(responseA.status).toBe(429)

    // Tenant B should still have full quota
    const responseB = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${tokenB}`)
    expect(responseB.status).toBe(200)
    expect(Number(responseB.headers['x-ratelimit-remaining'])).toBeGreaterThan(90)
  })
})

Load Testing Rate Limits with k6

For burst and sustained load testing, use k6:

// tests/load/rateLimiting.js
import http from 'k6/http'
import { check } from 'k6'

export const options = {
  scenarios: {
    burst: {
      executor: 'constant-arrival-rate',
      rate: 150,          // 150 requests per timeUnit (1 minute), above the 100/min free limit
      timeUnit: '1m',
      duration: '2m',
      preAllocatedVUs: 20,
    },
  },
}

export default function () {
  const response = http.get('https://api.example.com/api/projects', {
    headers: { Authorization: `Bearer ${__ENV.TEST_TOKEN}` },
  })

  check(response, {
    'has rate limit headers': (r) =>
      r.headers['X-Ratelimit-Limit'] !== undefined,
    'returns 200 or 429': (r) => [200, 429].includes(r.status),
    '429 has Retry-After header': (r) =>
      r.status !== 429 || r.headers['Retry-After'] !== undefined,
  })
}
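
It helps to know roughly how many 429s to expect before reading the k6 output. With a fixed 60-second window, the count depends on where the test starts relative to a window boundary. The sketch below simulates evenly spaced arrivals against a fixed-window counter. simulateFixedWindow is a hypothetical helper, and real k6 arrivals and network latency will not be perfectly even, so treat the results as a range rather than an exact target.

```typescript
interface SimulationOptions {
  ratePerMinute: number
  limitPerWindow: number
  windowSeconds: number
  durationSeconds: number
  // How far into a window the test starts
  startOffsetSeconds: number
}

function simulateFixedWindow(opts: SimulationOptions): { total: number; rejected: number } {
  const intervalMs = 60000 / opts.ratePerMinute
  const windowMs = opts.windowSeconds * 1000
  const total = Math.floor((opts.durationSeconds * 1000) / intervalMs)
  const counts = new Map<number, number>()
  let rejected = 0

  for (let i = 0; i < total; i++) {
    const t = opts.startOffsetSeconds * 1000 + i * intervalMs
    const bucket = Math.floor(t / windowMs)
    const n = (counts.get(bucket) ?? 0) + 1
    counts.set(bucket, n)
    if (n > opts.limitPerWindow) rejected++
  }
  return { total, rejected }
}

// The scenario above: 150 requests/min for 2 minutes against a 100/min limit.
const base = { ratePerMinute: 150, limitPerWindow: 100, windowSeconds: 60, durationSeconds: 120 }
const aligned = simulateFixedWindow({ ...base, startOffsetSeconds: 0 })
// { total: 300, rejected: 100 }
const midWindow = simulateFixedWindow({ ...base, startOffsetSeconds: 30 })
// { total: 300, rejected: 50 }
```

Starting mid-window spreads the same traffic over three windows instead of two, which halves the rejections. A k6 threshold on the 429 ratio should allow for that spread.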

Testing Quota Reset

describe('Quota reset', () => {
  it('quota resets after the window expires', async () => {
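    // Fake timers move Date.now(), which the limiter uses to pick its window key.
    // They do not advance TTLs inside a real Redis instance. nextTick and
    // setImmediate stay real so pending HTTP requests are less likely to stall
    // (the doNotFake option requires Jest 28 or later).
    jest.useFakeTimers({ doNotFake: ['nextTick', 'setImmediate'] })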
    const token = await getAuthToken({ plan: 'free' })

    // Use up the quota
    for (let i = 0; i < 100; i++) {
      await supertest(app).get('/api/projects').set('Authorization', `Bearer ${token}`)
    }

    // Advance time past the window
    jest.advanceTimersByTime(61 * 1000)

    // Should work again
    const response = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${token}`)

    expect(response.status).toBe(200)
    expect(Number(response.headers['x-ratelimit-remaining'])).toBe(99)

    jest.useRealTimers()
  })
})
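
The reset test works because the limiter's key embeds the window index, Math.floor(Date.now() / windowMs), so moving the clock forward by a full window or more always selects a fresh counter. The arithmetic, as a small sketch (windowBounds is a hypothetical helper, not part of the limiter shown earlier):

```typescript
// Computes which fixed window a timestamp falls in and when that window ends.
function windowBounds(
  nowMs: number,
  windowSeconds: number
): { bucket: number; resetAtMs: number } {
  const windowMs = windowSeconds * 1000
  const bucket = Math.floor(nowMs / windowMs)
  return { bucket, resetAtMs: (bucket + 1) * windowMs }
}

// Usage: 90 s after the epoch, with 60 s windows
const before = windowBounds(90000, 60)
// { bucket: 1, resetAtMs: 120000 }
const after = windowBounds(90000 + 61000, 60)
// { bucket: 2, resetAtMs: 180000 }
```

Deriving the reset time from the window boundary also avoids depending on the key's TTL, which Redis reports as -1 for a key with no expiry and -2 for a key that does not exist.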

Testing Monthly Quota Enforcement

For usage-based plans with monthly caps:

describe('Monthly quota', () => {
  it('blocks requests when monthly quota is exhausted', async () => {
    // Seed a tenant whose usage already equals its monthly cap.
    // db is assumed to be the project's data-access layer (not shown here).
    await db.tenants.update('tenant-1', { monthlyRequestsUsed: 10000, monthlyLimit: 10000 })

    const token = await getAuthToken({ tenantId: 'tenant-1', plan: 'pro' })
    const response = await supertest(app)
      .get('/api/projects')
      .set('Authorization', `Bearer ${token}`)

    expect(response.status).toBe(429)
    expect(response.body.error).toMatch(/monthly quota/i)
    expect(response.body.upgradeUrl).toBeDefined()
  })
})
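
The test implies a check that compares the tenant's usage with its cap and, when the cap is reached, returns a body with error and upgradeUrl fields. A minimal sketch of such a check, using the field names from the seeded tenant record. checkMonthlyQuota, QuotaDecision, and the /billing/upgrade URL are hypothetical.

```typescript
interface TenantUsage {
  monthlyRequestsUsed: number
  monthlyLimit: number
}

interface QuotaDecision {
  allowed: boolean
  remaining: number
  // status and body are present only when the request is blocked
  status?: number
  body?: { error: string; upgradeUrl: string }
}

// Hypothetical check: blocks once usage reaches the monthly cap.
function checkMonthlyQuota(
  usage: TenantUsage,
  upgradeUrl: string = '/billing/upgrade'
): QuotaDecision {
  const remaining = Math.max(0, usage.monthlyLimit - usage.monthlyRequestsUsed)
  if (remaining > 0) {
    return { allowed: true, remaining }
  }
  return {
    allowed: false,
    remaining: 0,
    status: 429,
    body: { error: 'Monthly quota exceeded', upgradeUrl },
  }
}

// Usage: the tenant seeded in the test above
const exhausted = checkMonthlyQuota({ monthlyRequestsUsed: 10000, monthlyLimit: 10000 })
// exhausted.allowed is false and exhausted.status is 429
```

Giving the monthly message distinct wording from the per-minute one, as the test's /monthly quota/i match requires, lets clients tell a short wait apart from a plan limit.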

What Automated Tests Miss

Unit and API tests cover rate limit logic but won't catch:

  • Redis cluster failover — rate limits become unenforced during Redis downtime
  • Clock synchronization — rate limit windows drift between nodes without NTP
  • Distributed counting accuracy: Redis INCR is atomic, but the separate INCR and EXPIRE calls are not, so a crash between them can leave a counter with no expiry
  • CDN bypass — rate limits applied only at the origin, not at the CDN edge
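
For the Redis failover case, what the API does when the limiter errors is a policy decision, and that decision can be unit tested even though the failover itself cannot. A sketch under the assumption that the limiter rejects its promise when Redis is unreachable. withFailurePolicy and LimitCheck are hypothetical names.

```typescript
type LimitCheck = (tenantId: string) => Promise<{ allowed: boolean }>

// Wraps a limiter so that backend errors produce an explicit decision.
function withFailurePolicy(
  check: LimitCheck,
  policy: 'fail-open' | 'fail-closed'
): LimitCheck {
  return async (tenantId: string) => {
    try {
      return await check(tenantId)
    } catch {
      // fail-open lets traffic through unmetered; fail-closed blocks it
      return { allowed: policy === 'fail-open' }
    }
  }
}

// Usage with a limiter that always throws, standing in for a Redis outage
const brokenLimiter: LimitCheck = async () => {
  throw new Error('redis unavailable')
}
const failOpenCheck = withFailurePolicy(brokenLimiter, 'fail-open')
const failClosedCheck = withFailurePolicy(brokenLimiter, 'fail-closed')
```

Which policy fits depends on whether unmetered traffic or an outage for every tenant is the worse failure for your product.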

HelpMeTest runs scheduled tests that verify rate limiting behaviour in your staging environment — hitting real endpoints, checking real headers — so you catch misconfigured limits before they affect paying customers. Pro plan at $100/month.

Summary

Testing SaaS rate limiting:

  • Unit tests — limiter logic with mocked Redis; verify allowed/blocked and header values
  • API tests — headers on every response, 429 with Retry-After, correct limit values
  • Tenant isolation — exhaust one tenant's quota, verify others are unaffected
  • Quota reset — fake timers to verify window-based reset
  • Load tests — k6 burst scenarios to verify limits hold under real traffic
