Debugging Webhook Failures in CI: A Systematic Approach

Webhook failures in CI are some of the most frustrating bugs to debug. The test passes locally but fails in CI. The webhook arrives but nothing happens. Logs show 200 OK but the feature doesn't work.

This guide gives you a systematic approach for diagnosing and fixing webhook failures in CI pipelines.

The Debugging Hierarchy

Start with the simplest explanation and work up:

  1. Is the webhook arriving? — Check request logs at your endpoint
  2. Is the signature valid? — Log verification result before any processing
  3. Is the payload correct? — Log the deserialized payload
  4. Did the handler run? — Add entry/exit logs to your handler
  5. Did side effects complete? — Log database writes, queue publishes, API calls
  6. Did the response go out? — Log the status code returned

If you can answer each question from your logs, you can pinpoint any failure.
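One way to make that hierarchy mechanical is to give each question its own log event and then look for the first event that never appeared for a given request. The sketch below assumes structured log entries shaped like `{ requestId, event }`. Apart from `webhook_received`, the stage names are hypothetical and should be replaced with whatever your logger actually emits.

```javascript
// Hypothetical stage names, one per question in the hierarchy above.
const STAGES = [
  'webhook_received',
  'signature_verified',
  'payload_parsed',
  'handler_started',
  'side_effects_completed',
  'response_sent',
];

// Returns the first stage that has no log entry for the given request,
// or null if every stage was logged.
function firstMissingStage(logEntries, requestId) {
  const seen = new Set(
    logEntries
      .filter((entry) => entry.requestId === requestId)
      .map((entry) => entry.event)
  );
  return STAGES.find((stage) => !seen.has(stage)) ?? null;
}

const logs = [
  { requestId: 'abc', event: 'webhook_received' },
  { requestId: 'abc', event: 'signature_verified' },
  { requestId: 'abc', event: 'payload_parsed' },
];
console.log(firstMissingStage(logs, 'abc')); // handler_started
```

A result of `handler_started` means the payload was parsed but the handler never logged its entry, so that is where to look next.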

Setting Up Webhook Request Logging

Add structured logging at the entry point of every webhook handler:

const crypto = require('crypto');

app.post('/webhooks/:provider', async (req, res) => {
  const requestId = crypto.randomUUID();
  const logger = createLogger({ requestId, provider: req.params.provider });

  logger.info('webhook_received', {
    headers: {
      'content-type': req.headers['content-type'],
      'x-request-id': req.headers['x-request-id'],
      // Include provider-specific sig headers
      'stripe-signature': req.headers['stripe-signature'] ? '[present]' : '[missing]',
      'x-hub-signature-256': req.headers['x-hub-signature-256'] ? '[present]' : '[missing]',
    },
    body_size: req.body?.length || 0,
    timestamp: new Date().toISOString()
  });

  try {
    const result = await handleWebhook(req.params.provider, req);
    logger.info('webhook_processed', { result });
    res.json({ received: true, requestId });
  } catch (error) {
    logger.error('webhook_failed', {
      error: error.message,
      stack: error.stack
    });
    res.status(500).json({ error: 'Processing failed', requestId });
  }
});

In CI, always print these logs even on success — they're your debug trail if a later stage fails.

Capturing Webhook Payloads in CI

The hardest part of debugging webhook CI failures is that you can't easily replay them. Solve this by persisting captured payloads:

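// Capture middleware: write each webhook request to a file in CI.
// Assumes a body parser has already populated req.body.
const fs = require('fs');
const path = require('path');

const CAPTURE_DIR = 'webhook-captures';

app.post('/webhooks/*', (req, res, next) => {
  if (process.env.CI) {
    // writeFileSync does not create missing directories
    fs.mkdirSync(CAPTURE_DIR, { recursive: true });
    // '/webhooks/stripe' becomes 'webhooks_stripe'
    const name = req.path.replace(/^\//, '').replace(/\//g, '_');
    const filename = path.join(CAPTURE_DIR, `${Date.now()}-${name}.json`);
    fs.writeFileSync(filename, JSON.stringify({
      path: req.path,
      headers: req.headers,
      body: req.body
    }, null, 2));
  }
  next();
});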

In your GitHub Actions workflow, upload captured webhooks as artifacts:

- name: Run webhook integration tests
  run: npm test -- --grep "webhook"

- name: Upload webhook captures on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: webhook-captures
    path: webhook-captures/
    retention-days: 7

When a CI run fails, download the artifacts and replay the exact payload locally.

Replaying Captured Payloads

Once you have a captured payload, replay it against your local server:

# Start your server
node server.js &

# Replay the captured webhook
cat webhook-captures/1234567890-webhooks_stripe.json | \
  node scripts/replay-webhook.js

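// scripts/replay-webhook.js
const fs = require('fs');
const http = require('http');

// Read the capture from stdin (file descriptor 0)
const capture = JSON.parse(fs.readFileSync(0, 'utf8'));

// Re-serializing may not reproduce the original bytes, so recompute
// content-length rather than reusing the captured value. For the same
// reason, a captured signature may no longer verify on replay.
const body = JSON.stringify(capture.body);

const req = http.request({
  hostname: 'localhost',
  port: 3000,
  path: capture.path,
  method: 'POST',
  headers: {
    ...capture.headers,
    host: 'localhost:3000',
    'content-length': Buffer.byteLength(body)
  }
}, (res) => {
  console.log(`Status: ${res.statusCode}`);
  res.pipe(process.stdout);
});

req.write(body);
req.end();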

Common CI-Specific Failure Patterns

1. Signature Verification Fails Due to Body Parsing

Symptom: Signature valid locally, invalid in CI.

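Cause: The bytes your handler verifies are not the bytes the provider signed. This typically happens when a JSON body parser runs before verification and the handler re-serializes the parsed object, or when a reverse proxy in the CI environment rewrites the request body.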

Fix: Log the raw body bytes in CI and compare with the signature input:

app.post('/webhooks/stripe', express.raw({ type: '*/*' }), (req, res) => {
  if (process.env.CI) {
    console.log('Raw body length:', req.body.length);
    console.log('Raw body hash:', crypto.createHash('md5').update(req.body).digest('hex'));
  }
  // ... verify signature
});
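To see why the raw bytes matter, the sketch below signs a payload the way GitHub's `x-hub-signature-256` header does (`sha256=` followed by a hex HMAC-SHA256 of the body) and then verifies both the original bytes and a re-serialized copy. The secret and payload are made up for illustration, and other providers use different header formats, so treat this as a demonstration of the failure mode rather than a drop-in verifier.

```javascript
const crypto = require('crypto');

// GitHub-style header value: "sha256=" + hex HMAC-SHA256 of the raw body.
function sign(secret, rawBody) {
  return 'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Compares signatures in constant time; a length mismatch fails early
// because timingSafeEqual throws on buffers of different lengths.
function verify(secret, rawBody, signatureHeader) {
  const expected = Buffer.from(sign(secret, rawBody));
  const received = Buffer.from(String(signatureHeader));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

const secret = 'test-secret'; // hypothetical secret for the demo
const sentBytes = Buffer.from('{"id": 1,  "type": "ping"}'); // note the spacing
const header = sign(secret, sentBytes);

// What a handler sees if it parses the JSON and serializes it again.
const reserialized = Buffer.from(JSON.stringify(JSON.parse(sentBytes.toString('utf8'))));

console.log(verify(secret, sentBytes, header));    // true
console.log(verify(secret, reserialized, header)); // false
```

The parsed object is identical in both cases, but `JSON.stringify` drops the original whitespace, so the HMAC input differs and verification fails.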

2. Environment Variables Missing

Symptom: undefined webhook secret, signature always fails.

Fix: Add an env var check at startup and in your test setup:

// At startup, before registering webhook routes
const webhookSecret = process.env.WEBHOOK_SECRET;
if (!webhookSecret) {
  throw new Error('WEBHOOK_SECRET environment variable is required');
}

# In GitHub Actions
- name: Run webhook tests
  run: npm test
  env:
    # An unset secret expands to an empty string, so the startup check above still throws
    WEBHOOK_SECRET: ${{ secrets.WEBHOOK_SECRET }}

3. Database Not Ready When Webhook Arrives

Symptom: Handler returns 500, logs show connection refused.

Fix: Add a readiness check before starting tests:

- name: Wait for database
  run: |
    until pg_isready -h localhost -p 5432; do
      echo "Waiting for database..."
      sleep 1
    done

- name: Run webhook tests
  run: npm test

4. Webhook URL Unreachable in CI

Symptom: Integration tests that wait for webhook callback never complete.

Cause: CI environment has no public URL for providers to call back.

Fix: Use smee.io as a proxy to forward webhook events to CI:

- name: Start smee proxy
  run: npx smee-client --url https://smee.io/YOUR_CHANNEL --path /webhooks/github --port 3000 &

- name: Run integration tests
  run: npm run test:integration
  env:
    WEBHOOK_PROXY_URL: https://smee.io/YOUR_CHANNEL

Or use HelpMeTest's proxy to expose your local test server:

helpmetest proxy start localhost:3000
# Use the provided public URL as your webhook endpoint

5. Race Condition: Test Asserts Before Webhook Arrives

Symptom: Test passes locally but fails intermittently in CI (flaky).

Cause: CI is slower; webhook processing takes longer than expected.

Fix: Never use fixed sleeps. Use polling with a timeout:

// Wrong
await new Promise(resolve => setTimeout(resolve, 2000));
const result = await db.getResult();
expect(result).toBeDefined();

// Right
const result = await waitForCondition(
  () => db.getResult(),
  { timeout: 10000, interval: 200 }
);
expect(result).toBeDefined();
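`waitForCondition` is not a built-in and is not defined elsewhere in this guide. A minimal sketch of such a helper, assuming the check counts as "met" once it returns anything other than `undefined`, `null`, or `false`, might look like this:

```javascript
// Polls `check` every `interval` ms until it returns a usable value,
// and rejects if `timeout` ms pass first.
async function waitForCondition(check, { timeout = 10000, interval = 200 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  while (true) {
    try {
      const value = await check();
      lastError = undefined;
      if (value !== undefined && value !== null && value !== false) {
        return value;
      }
    } catch (error) {
      // Treat a throwing check as "not ready yet" and keep polling.
      lastError = error;
    }
    if (Date.now() >= deadline) {
      const reason = lastError ? `: ${lastError.message}` : '';
      throw new Error(`Condition not met within ${timeout}ms${reason}`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

Unlike a fixed sleep, this returns as soon as the webhook has been processed, so fast runs stay fast and slow CI runners get the full timeout before the test fails.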

Structured CI Test Output for Webhook Tests

Make webhook test failures self-documenting:

afterEach(async () => {
  if (testFailed) {
    console.log('=== WEBHOOK DEBUG INFO ===');
    console.log('Received webhooks:', JSON.stringify(capturedWebhooks, null, 2));
    console.log('Database state:', JSON.stringify(await db.dump(), null, 2));
    console.log('Queue messages:', JSON.stringify(await queue.drain(), null, 2));
    console.log('=========================');
  }
});

End-to-End Monitoring with HelpMeTest

For continuous monitoring of webhook endpoints, HelpMeTest can run scheduled tests against your staging environment:

*** Test Cases ***
Webhook Endpoint Returns 400 For Invalid Signature
    ${payload}=    Set Variable    {"type": "test.event"}
    ${response}=   POST    ${WEBHOOK_URL}/stripe
    ...    headers={"stripe-signature": "v1=invalid"}
    ...    data=${payload}
    Should Be Equal As Integers    ${response.status_code}    400

Webhook Endpoint Returns 200 For Valid Payload
    ${payload}=    Get File    fixtures/stripe_payment_succeeded.json
    ${sig}=        Build Stripe Signature    ${payload}    ${STRIPE_WEBHOOK_SECRET}
    ${response}=   POST    ${WEBHOOK_URL}/stripe
    ...    headers={"stripe-signature": "${sig}"}
    ...    data=${payload}
    Should Be Equal As Integers    ${response.status_code}    200

Run these as health checks on a schedule to detect regressions before they reach production.
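The `Build Stripe Signature` keyword above is not defined in this guide, so it has to be supplied by your own keyword library. As a rough sketch of what such a keyword would need to compute, the JavaScript below follows Stripe's documented scheme: an HMAC-SHA256 over `timestamp.payload`, sent as `t=<timestamp>,v1=<hex digest>`. The function name and the example secret are placeholders.

```javascript
const crypto = require('crypto');

// Builds a Stripe-Signature header value for a raw payload string.
// `timestamp` is in Unix seconds and defaults to the current time.
function buildStripeSignature(payload, secret, timestamp = Math.floor(Date.now() / 1000)) {
  const signedPayload = `${timestamp}.${payload}`;
  const signature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload, 'utf8')
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

// Placeholder secret; use your endpoint's test-mode signing secret.
console.log(buildStripeSignature('{"type":"test.event"}', 'whsec_example'));
```

Because the timestamp is part of the signed input, generate the header immediately before sending the request. A signature that is too old can be rejected by the receiver's timestamp tolerance check.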

The Webhook CI Debugging Checklist

When a webhook test fails in CI, work through this list:

  • Is the webhook request actually being made? (Check request logs)
  • Does the signature header exist? (Log header presence)
  • Is the secret available in CI? (Check env vars)
  • Is the raw body untampered? (Log body hash)
  • Is the database ready? (Add readiness check)
  • Is the webhook URL reachable? (Use smee.io or HelpMeTest proxy)
  • Are you waiting long enough? (Use polling, not fixed sleeps)
  • Did the handler complete? (Log entry/exit)
  • Are captured payloads available? (Upload as CI artifacts)

Summary

Webhook failures in CI almost always fall into a few categories: signature verification issues from body parsing, missing environment variables, a database that is not ready, race conditions from fixed sleeps, or the webhook URL being unreachable. The way to diagnose all of them is the same: add structured logging, capture payloads as CI artifacts, and use polling instead of sleeps.

Once you can replay captured payloads locally and have structured debug output in CI, webhook failures stop being mysterious and start being straightforward to fix.
