Testing Alexa Skills: Unit Tests and End-to-End Automation

Alexa Skill testing requires a layered approach: unit tests for individual intent handlers, integration tests for full request/response cycles, and end-to-end simulation with Bespoken. This guide covers all three layers with concrete examples in Node.js and Python.

Building a reliable Alexa Skill is only half the battle. Without a proper test suite, a small refactor can silently break your slot filling logic, corrupt session state, or cause your AudioPlayer to misbehave in ways you won't catch until users start complaining. This guide walks through a complete testing strategy for Alexa Skills — from unit-testing individual handlers to running end-to-end conversation simulations in CI.

Understanding the Request/Response Model

Every Alexa interaction is a JSON request sent to your fulfillment endpoint, which returns a JSON response. This makes unit testing remarkably straightforward: you construct the request JSON, call your handler, and assert on the response.
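To make the shape concrete, here is a minimal sketch of the response envelope a fulfillment endpoint returns. The field names follow the Alexa Skills Kit response format; the speech text is illustrative:

```javascript
// Minimal Alexa response envelope: a version string, optional session
// attributes, and a response object carrying SSML output speech.
const minimalResponse = {
  version: '1.0',
  sessionAttributes: {},
  response: {
    outputSpeech: {
      type: 'SSML',
      ssml: '<speak>Hello from your skill!</speak>',
    },
    shouldEndSession: true,
  },
};

console.log(minimalResponse.response.outputSpeech.ssml);
```

Everything your tests assert on, from speech output to directives, lives somewhere in this structure.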

The ASK SDK wraps this exchange in a HandlerInput object. To test a handler directly, you construct a mock HandlerInput yourself. Here's a minimal Node.js example using Jest:

const { HelloWorldIntentHandler } = require('../src/handlers');

function buildHandlerInput({ intentName, slots = {}, sessionAttributes = {} }) {
  return {
    requestEnvelope: {
      request: {
        type: 'IntentRequest',
        intent: {
          name: intentName,
          slots,
        },
      },
      session: {
        attributes: sessionAttributes,
      },
      context: {
        System: {
          device: { deviceId: 'test-device' },
          user: { userId: 'test-user' },
        },
      },
    },
    attributesManager: {
      getSessionAttributes: () => sessionAttributes,
      setSessionAttributes: jest.fn(),
      getPersistentAttributes: async () => ({}),
      setPersistentAttributes: jest.fn(),
      savePersistentAttributes: jest.fn(),
    },
    responseBuilder: require('ask-sdk-core').ResponseFactory.init(),
  };
}

test('HelloWorldIntent returns greeting', async () => {
  const handlerInput = buildHandlerInput({ intentName: 'HelloWorldIntent' });
  expect(HelloWorldIntentHandler.canHandle(handlerInput)).toBe(true);
  const response = await HelloWorldIntentHandler.handle(handlerInput);
  expect(response.outputSpeech.ssml).toContain('Hello');
});

This pattern is the foundation of all Alexa unit testing. You construct realistic HandlerInput objects and assert on the response builder output.
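As a self-contained illustration of the pattern (no SDK required), here is a hypothetical handler tested against a stripped-down version of the factory above. The handler and slot names are invented for this sketch:

```javascript
// Stripped-down factory: only the requestEnvelope fields the handler reads.
function buildHandlerInput({ intentName, slots = {} }) {
  return {
    requestEnvelope: {
      request: { type: 'IntentRequest', intent: { name: intentName, slots } },
    },
  };
}

// Hypothetical handler: reads a slot value and echoes it back as SSML.
const ColorIntentHandler = {
  canHandle(handlerInput) {
    const { request } = handlerInput.requestEnvelope;
    return request.type === 'IntentRequest' && request.intent.name === 'ColorIntent';
  },
  handle(handlerInput) {
    const color = handlerInput.requestEnvelope.request.intent.slots.color.value;
    return { outputSpeech: { type: 'SSML', ssml: `<speak>You like ${color}.</speak>` } };
  },
};

const input = buildHandlerInput({
  intentName: 'ColorIntent',
  slots: { color: { name: 'color', value: 'teal' } },
});
console.log(ColorIntentHandler.canHandle(input)); // true
console.log(ColorIntentHandler.handle(input).outputSpeech.ssml);
```

The same construct-call-assert rhythm scales to every handler in your skill.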

Using ask-sdk-test

The ask-sdk-test library (available on npm) provides a higher-level testing API that eliminates a lot of boilerplate. It models conversations as sequences of turns and handles the request construction for you.

const { AlexaTest, IntentRequestBuilder } = require('ask-sdk-test');
const { handler } = require('../src/skill');

const skillSettings = {
  appId: 'amzn1.ask.skill.00000000-0000-0000-0000-000000000000',
  userId: 'amzn1.ask.account.VOID',
  deviceId: 'amzn1.ask.device.VOID',
  locale: 'en-US',
};

const alexaTest = new AlexaTest(handler, skillSettings);

describe('BookingIntent', () => {
  alexaTest.test([
    {
      request: new IntentRequestBuilder(skillSettings, 'BookingIntent')
        .withSlot('date', '2026-06-01')
        .withSlot('time', '14:00')
        .build(),
      saysLike: 'Your booking is confirmed for June 1',
      shouldEndSession: true,
    },
  ], 'handles a fully specified booking');

  alexaTest.test([
    {
      request: new IntentRequestBuilder(skillSettings, 'BookingIntent')
        .withSlot('date', '2026-06-01')
        .build(),
      elicitsSlot: 'time',
      saysLike: 'What time would you like',
    },
  ], 'elicits missing slot');
});

The says matcher checks the speech output for an exact match; use saysLike to check for a substring. The elicitsSlot assertion verifies that your handler triggers slot elicitation for the named slot, which is critical for multi-slot intents where the order of prompts matters.
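Under the hood, slot elicitation is just a Dialog.ElicitSlot directive in the response. If you are not using a testing library, you can assert on the directive directly. This sketch shows the wire shape; the speech and intent names are illustrative:

```javascript
// What a slot-elicitation response looks like on the wire: the handler keeps
// the session open and attaches a Dialog.ElicitSlot directive naming the
// slot Alexa should re-prompt the user for.
const elicitResponse = {
  outputSpeech: { type: 'SSML', ssml: '<speak>What time would you like?</speak>' },
  directives: [
    {
      type: 'Dialog.ElicitSlot',
      slotToElicit: 'time',
      updatedIntent: { name: 'BookingIntent', confirmationStatus: 'NONE' },
    },
  ],
  shouldEndSession: false,
};

const elicit = elicitResponse.directives.find((d) => d.type === 'Dialog.ElicitSlot');
console.log(elicit.slotToElicit); // time
```

Asserting on slotToElicit rather than on prompt wording makes these tests robust to copy changes.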

Slot Filling and Confirmation Testing

Slot filling is where most Alexa Skills have bugs. Testing the happy path is not enough — you need to cover:

  • Missing required slots (elicitation dialogs)
  • Invalid slot values (slots resolved to ER_SUCCESS_NO_MATCH)
  • Slot confirmation flows
  • Dynamic entities injected at runtime
In Python, you can build slots with entity-resolution metadata using the ask-sdk-model classes:

# Python with the ASK SDK model classes
from ask_sdk_model import Slot
from ask_sdk_model.slu.entityresolution import (
    Resolution, Resolutions, Status, StatusCode, Value, ValueWrapper,
)

def make_slot(name, value, status_code=StatusCode.ER_SUCCESS_MATCH):
    return Slot(
        name=name,
        value=value,
        resolutions=Resolutions(
            resolutions_per_authority=[
                Resolution(
                    authority="amzn1.er-authority.echo-sdk.dynamic",
                    status=Status(code=status_code),
                    values=[ValueWrapper(value=Value(name=value, id=value.lower()))],
                )
            ]
        ),
    )

# Test that unresolved slots trigger clarification
unresolved_slot = make_slot('city', 'Springfield', StatusCode.ER_SUCCESS_NO_MATCH)
handler_input = build_handler_input_with_slot('TravelIntent', {'city': unresolved_slot})
response = handler.handle(handler_input)
assert 'Which Springfield' in response.output_speech.ssml

Session Management Testing

Session attributes persist across turns within a conversation. Bugs in session management cause context loss — the skill forgets what the user said two turns ago. Test this explicitly:

test('session carries booking state across turns', async () => {
  const sessionAttributes = { bookingStep: 'awaiting_confirmation', date: '2026-06-01' };
  const handlerInput = buildHandlerInput({
    intentName: 'AMAZON.YesIntent',
    sessionAttributes,
  });

  const response = await ConfirmationHandler.handle(handlerInput);
  expect(response.outputSpeech.ssml).toContain('confirmed');
  // Verify session is cleared after confirmation
  expect(handlerInput.attributesManager.setSessionAttributes)
    .toHaveBeenCalledWith({});
});
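For reference, a ConfirmationHandler that would satisfy the test above might look like the following sketch. The article's actual handler is not shown, so the implementation details here are assumptions:

```javascript
// Sketch of a confirmation handler: reads booking state from session
// attributes, confirms the booking, then clears the session attributes so
// the next conversation starts fresh.
const ConfirmationHandler = {
  handle(handlerInput) {
    const attrs = handlerInput.attributesManager.getSessionAttributes();
    const speech =
      attrs.bookingStep === 'awaiting_confirmation'
        ? `<speak>Your booking for ${attrs.date} is confirmed.</speak>`
        : '<speak>Sorry, there is nothing to confirm.</speak>';
    handlerInput.attributesManager.setSessionAttributes({}); // clear state
    return { outputSpeech: { type: 'SSML', ssml: speech }, shouldEndSession: true };
  },
};

// Minimal stand-in for the attributesManager mock (no-op setter):
const attrsStore = { bookingStep: 'awaiting_confirmation', date: '2026-06-01' };
const input = {
  attributesManager: {
    getSessionAttributes: () => attrsStore,
    setSessionAttributes: () => {},
  },
};
console.log(ConfirmationHandler.handle(input).outputSpeech.ssml);
```

Note that the handler clears session state through the attributesManager, which is exactly what the jest.fn() spy in the test verifies.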

AudioPlayer Testing

AudioPlayer directives are notoriously hard to test because playback events arrive asynchronously, outside the normal session. In unit tests, the practical approach is to assert directly on the directives your handler returns:

test('PlayIntent returns AudioPlayer.Play directive', async () => {
  const handlerInput = buildHandlerInput({ intentName: 'PlayMusicIntent' });
  const response = await PlayIntentHandler.handle(handlerInput);

  const directive = response.directives[0];
  expect(directive.type).toBe('AudioPlayer.Play');
  expect(directive.playBehavior).toBe('REPLACE_ALL');
  expect(directive.audioItem.stream.url).toMatch(/^https:\/\//);
  expect(directive.audioItem.stream.token).toBeDefined();
});

test('PlaybackFinished triggers next track', async () => {
  const handlerInput = {
    requestEnvelope: {
      request: { type: 'AudioPlayer.PlaybackFinished' },
      context: { AudioPlayer: { token: 'track-123', offsetInMilliseconds: 180000 } },
    },
    // ... rest of mock
  };
  const response = await PlaybackFinishedHandler.handle(handlerInput);
  expect(response.directives[0].type).toBe('AudioPlayer.Play');
  expect(response.directives[0].audioItem.stream.token).toBe('track-124');
});
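The PlaybackFinished test above assumes some scheme that maps the token 'track-123' to the next track. One minimal approach (hypothetical, for illustration) derives the next token from the playlist order:

```javascript
// Hypothetical token scheme: each enqueued stream carries a token from the
// playlist; the next token is the following entry, wrapping at the end.
const playlist = ['track-123', 'track-124', 'track-125'];

function nextToken(currentToken) {
  const i = playlist.indexOf(currentToken);
  if (i === -1) return playlist[0];           // unknown token: restart playlist
  return playlist[(i + 1) % playlist.length]; // advance, wrap at the end
}

console.log(nextToken('track-123')); // track-124
console.log(nextToken('track-125')); // track-123
```

Keeping the token-to-track mapping in a pure function like this makes the playback handlers trivial to unit-test.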

End-to-End Testing with Bespoken

Unit tests verify individual handlers in isolation. Bespoken adds a conversation-level layer: it resolves your test utterances to intents and slots using your interaction model (emulating Alexa's NLU locally, without actual speech recognition), runs them through your skill, and asserts on the responses.

Install the Bespoken CLI (npm install -g bespoken-tools) and create a testing.json configuration:

{
  "handler": "src/index.handler",
  "locale": "en-US",
  "type": "unit",
  "interactionModel": "models/en-US.json"
}

The test cases themselves live in YAML files alongside it. Each line pairs an utterance with an expected response fragment:

---
- test: Full booking flow
- open my booking app: Welcome to
- book a table for two: What date
- tomorrow at seven: What time
- "yes": confirmed

Run with bst test. Bespoken resolves each utterance against your interaction model locally, so you don't need an actual Alexa device or developer account in CI.

CI/CD Integration

Add Alexa testing to your GitHub Actions pipeline:

name: Alexa Skill Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test -- --coverage
      - run: npx bst test testing.json
      - name: Upload coverage
        uses: codecov/codecov-action@v4

For Python-based skills using Flask-Ask or the Python SDK, use pytest with the --tb=short flag and structure your fixtures in conftest.py to build handler inputs cleanly.

Testing LaunchRequest and Error Handling

Don't forget to test your LaunchRequest handler and the global error handler — these are the two most common places skills fail silently:

test('LaunchRequest speaks welcome message', async () => {
  const handlerInput = {
    requestEnvelope: { request: { type: 'LaunchRequest' } },
    // ... mock rest
  };
  const response = await LaunchHandler.handle(handlerInput);
  expect(response.outputSpeech.ssml).toMatch(/Welcome/);
  expect(response.reprompt).toBeDefined();
  expect(response.shouldEndSession).toBe(false);
});

test('ErrorHandler logs and recovers', async () => {
  const error = new Error('DynamoDB timeout');
  const handlerInput = buildHandlerInput({ intentName: 'FallbackIntent' });
  const response = await ErrorHandler.handle(handlerInput, error);
  expect(response.outputSpeech.ssml).toContain('Sorry');
  expect(response.shouldEndSession).toBe(false);
});

Monitoring in Production

Unit and E2E tests catch regressions before deployment, but production Alexa Skills need ongoing monitoring too. HelpMeTest can run your Alexa conversation flows on a schedule and alert you when NLU accuracy drops or response latency spikes — before your users notice.

The combination of a solid unit test suite with Bespoken E2E tests in CI, plus production monitoring, gives you full confidence from development through live operation.
