Screenshot Comparison Tools Compared: Percy vs Applitools vs Chromatic vs BackstopJS

HelpMeTest

15 May 2026 — 5 min read

Visual regression testing tools have proliferated over the last five years, and the differences between them matter. Choosing the wrong tool for your team's context means either paying for features you don't need or missing capabilities that would save hours of manual QA.

This guide compares the four most commonly used screenshot comparison tools: Percy (BrowserStack), Applitools Eyes, Chromatic, and BackstopJS. For each, we cover how it works, what it costs, what it catches, and which team type it fits.

Quick Comparison

Tool	Pricing model	Best for	Diff algorithm	Cloud/Self-hosted
Percy	Snapshot-based	Multi-page web apps	Pixel diff	Cloud only
Applitools Eyes	Checkpoint-based	Enterprise, cross-browser	Visual AI	Cloud only
Chromatic	Snapshot-based	Storybook component libraries	Pixel diff	Cloud only
BackstopJS	Open source	Self-hosted, any team	Pixel diff	Self-hosted

Percy (BrowserStack)

How it works

Percy integrates with Cypress, Playwright, Selenium, and Storybook. In Cypress, for example, you call cy.percySnapshot('name') in your tests; Percy captures a DOM snapshot, renders it in its own cloud, and diffs the result against the approved baseline.

Because Percy re-renders the captured DOM on its own infrastructure, fonts and rendering stay consistent regardless of the machine that ran the test. For cross-browser testing, the same snapshot is rendered in multiple browsers in parallel.

Pricing

Free: 5,000 screenshots/month
Pro: starts at $599/month (35,000 screenshots)
Each snapshot at each width = one screenshot

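For a mid-size project running visual tests on every PR with 200 snapshots across 3 widths, that's 600 screenshots per run. At 20 PRs/week, with one run per PR, you'd use ~12,000 screenshots/week, or roughly 48,000/month. The free tier would be gone within the first few days, and that volume also exceeds the 35,000 screenshots included in the entry Pro plan.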
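A usage estimate like this is easy to script before committing to a plan. The sketch below is only illustrative: the function names are made up for this example, and it assumes one Percy build per PR and four weeks per month. Extra pushes to the same PR would raise the numbers.

```javascript
// Rough screenshot usage estimate. All inputs are assumptions to adjust.
function estimateScreenshots({ snapshots, widths, buildsPerWeek, weeksPerMonth = 4 }) {
  const perBuild = snapshots * widths; // each snapshot at each width counts once
  const perWeek = perBuild * buildsPerWeek;
  return { perBuild, perWeek, perMonth: perWeek * weeksPerMonth };
}

// Days until a monthly allowance runs out at a steady weekly rate.
function daysUntilExhausted(allowance, perWeek) {
  return (allowance / perWeek) * 7;
}

const usage = estimateScreenshots({ snapshots: 200, widths: 3, buildsPerWeek: 20 });
console.log(usage); // { perBuild: 600, perWeek: 12000, perMonth: 48000 }
console.log(daysUntilExhausted(5000, usage.perWeek).toFixed(1)); // 2.9
```

Plugging in your own snapshot count and PR volume shows quickly whether the free allowance is realistic or whether fewer widths or fewer snapshots per build are needed.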

Strengths

Easy integration — one npm package, one line of code per snapshot
Clean GitHub PR workflow — Percy posts a status check, team reviews in the Percy dashboard
DOM capture means consistent cross-browser rendering
Good documentation, large community

Weaknesses

No AI-powered noise reduction — pixel changes (even tiny font rendering differences) trigger reviews
Can generate significant review noise when testing on multiple browsers
Limited configurability for what counts as a "meaningful" visual change
No self-hosting option

Best for

Teams running Cypress or Playwright for E2E tests who want to layer visual regression on top with minimal setup. Good fit for web apps with stable designs and low browser variance.

Applitools Eyes

How it works

Applitools uses Visual AI — a model trained on millions of UI screenshots — to distinguish real visual regressions from irrelevant rendering noise (anti-aliasing, sub-pixel font differences, shadow opacity). When comparing screenshots, the AI classifies each difference as meaningful or noise.

The Ultrafast Grid renders captured DOM snapshots across many browsers and devices in parallel, so you can test 10 browser configurations with one test run.

Pricing

Free: 100 checkpoints/month
Paid: per-checkpoint pricing, negotiated enterprise contracts
The "checkpoint" model means each eyes.check() call is one checkpoint, multiplied by the number of browsers/viewports configured
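The multiplication in the checkpoint model can be made concrete with a small calculation. This is a sketch only: the function names and the example figures (5 check calls, 10 browser configurations) are hypothetical, and actual metering is defined by your Applitools contract.

```javascript
// Checkpoints consumed by one test run: each eyes.check() call counts once
// per configured browser/viewport combination.
function checkpointsPerRun(checkCalls, browserConfigs) {
  return checkCalls * browserConfigs;
}

// Whole runs that fit in a monthly checkpoint allowance.
function runsCovered(allowance, checkCalls, browserConfigs) {
  return Math.floor(allowance / checkpointsPerRun(checkCalls, browserConfigs));
}

console.log(checkpointsPerRun(5, 10)); // 50
console.log(runsCovered(100, 5, 10));  // 2
```

Under these assumed numbers, a 100-checkpoint allowance covers only two full runs, which is why the number of configured browsers matters as much as the number of check calls.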

Enterprise pricing is not public — contact sales. Mid-market teams typically report $20,000-$80,000/year for meaningful usage.

Strengths

Visual AI dramatically reduces false positives
Ultrafast Grid for true cross-browser visual testing at scale
Match level configuration (Strict, Content, Layout) for different types of content
SDKs for every major testing framework
Floating regions and ignore regions for dynamic content
Strong enterprise features: team management, SLAs, audit trails

Weaknesses

Significantly more expensive than alternatives
Setup and configuration are more involved than Percy's
Cloud-only, no self-hosting
Overkill for smaller teams or simple UIs

Best for

Enterprise teams testing complex UIs across many browsers, design system teams that need to catch subtle visual regressions in component libraries, or teams where false positives from simpler tools are creating significant review burden.

Chromatic

How it works

Chromatic is built specifically for Storybook. It captures every story in your Storybook, renders them in its cloud, and diffs against the approved baseline. Changes are reviewed in the Chromatic dashboard before the PR can merge.

TurboSnap (Chromatic's smart snapshot system) uses your git history and the project's dependency graph to work out which stories a change could affect, then re-snapshots only those stories, which can reduce snapshot count dramatically.
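One way to picture "stories that could be affected" is a walk over an import graph. The sketch below is not Chromatic's implementation; the file names, the graph, and both function names are hypothetical and exist only to illustrate the idea of tracing changed files back to stories.

```javascript
// Hypothetical dependency graph: each file maps to the files it imports.
const importGraph = {
  'Button.stories.js': ['Button.js'],
  'Card.stories.js': ['Card.js'],
  'Footer.stories.js': ['Footer.js'],
  'Card.js': ['Button.js', 'theme.js'],
  'Button.js': ['theme.js'],
  'Footer.js': [],
  'theme.js': [],
};

// True if `file` is changed or transitively imports a changed file.
function dependsOnChange(file, changed, graph, seen = new Set()) {
  if (changed.has(file)) return true;
  if (seen.has(file)) return false; // already explored, or part of a cycle
  seen.add(file);
  return (graph[file] || []).some((dep) => dependsOnChange(dep, changed, graph, seen));
}

// Stories that need a new snapshot for a given set of changed files.
function affectedStories(changedFiles, graph) {
  const changed = new Set(changedFiles);
  return Object.keys(graph)
    .filter((file) => file.endsWith('.stories.js'))
    .filter((story) => dependsOnChange(story, changed, graph));
}

console.log(affectedStories(['Button.js'], importGraph));
// [ 'Button.stories.js', 'Card.stories.js' ]
console.log(affectedStories(['Footer.js'], importGraph));
// [ 'Footer.stories.js' ]
```

The example also shows the limit of this kind of saving: a change to a widely imported file such as the hypothetical theme.js touches most stories, so targeted changes benefit the most.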

Pricing

Free: 5,000 snapshots/month
Pro: $149/month (35,000 snapshots)
Team: $349/month (100,000 snapshots)

TurboSnap makes the free tier go significantly further for teams making targeted changes.

Strengths

Native Storybook integration — zero configuration beyond adding the npm package
TurboSnap reduces snapshot usage by 70-90% on typical feature branches
Interaction tests via Storybook's play function
Clean PR review workflow
Component-level granularity — changes are scoped to individual components
Living documentation: Chromatic publishes your Storybook as a reference

Weaknesses

Only works with Storybook (strong requirement)
Tests components in isolation — doesn't catch composition bugs at the page level
No AI noise reduction — pixel-level diffs can be noisy

Best for

React, Vue, Angular, or Svelte teams using Storybook for component documentation. If you maintain a design system or component library, Chromatic is the clear choice. If you don't use Storybook, use a different tool.

BackstopJS

How it works

BackstopJS is open source and self-hosted. You configure URL/selector pairs in backstop.json, and BackstopJS uses headless Chrome (via Puppeteer) to capture screenshots and compare them against reference images stored in your repository.

{
  "viewports": [
    { "label": "mobile", "width": 375, "height": 812 },
    { "label": "tablet", "width": 768, "height": 1024 },
    { "label": "desktop", "width": 1280, "height": 800 }
  ],
  "scenarios": [
    {
      "label": "Homepage",
      "url": "https://your-app.com/",
      "selectors": ["document"],
      "misMatchThreshold": 0.1
    },
    {
      "label": "Checkout Form",
      "url": "https://your-app.com/checkout",
      "selectors": [".checkout-form"],
      "delay": 1000,
      "misMatchThreshold": 0.05
    }
  ]
}

Reference images are committed to the repository. The diff engine is Resemble.js, with a configurable tolerance: misMatchThreshold is the percentage of differing pixels a scenario may show and still pass.
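To make the tolerance concrete, here is a minimal pixel-diff sketch. It is not the algorithm the diff engine actually uses; mismatchPercent and its tolerance parameter are hypothetical and only illustrate how a percentage of differing pixels is compared against a threshold such as misMatchThreshold.

```javascript
// Percentage of pixels that differ between two same-sized RGBA buffers.
// `tolerance` is a hypothetical per-channel allowance (0-255) for rendering noise.
function mismatchPercent(reference, candidate, tolerance = 0) {
  if (reference.length !== candidate.length) {
    throw new Error('Images must have identical dimensions');
  }
  const pixelCount = reference.length / 4;
  let differing = 0;
  for (let i = 0; i < reference.length; i += 4) {
    // Compare the R, G, B, A channels of one pixel; count the pixel once.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(reference[i + c] - candidate[i + c]) > tolerance) {
        differing++;
        break;
      }
    }
  }
  return (differing / pixelCount) * 100;
}

// A 10x10 white image, and a copy with its first pixel turned red.
const baseline = new Uint8ClampedArray(10 * 10 * 4).fill(255);
const current = Uint8ClampedArray.from(baseline);
current[1] = 0; // green channel of pixel 0
current[2] = 0; // blue channel of pixel 0

const mismatch = mismatchPercent(baseline, current);
console.log(mismatch);        // 1
console.log(mismatch <= 0.1); // false: would fail a threshold of 0.1
console.log(mismatch <= 1);   // true: would pass a threshold of 1
```

A single changed pixel in a 100-pixel image is a 1% mismatch, which shows why thresholds for small element selectors and full-page captures usually need to be tuned separately.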

Pricing

Free and open source. You pay for the infrastructure to run it (a CI runner). For most teams, this is near-zero marginal cost.

Strengths

Completely free and open source
Self-hosted — data never leaves your infrastructure
Configurable tolerance threshold per scenario
Docker container available for CI
No per-screenshot limits
Good fit for compliance environments that can't use cloud tools

Weaknesses

Reference images committed to the repo → merge conflicts when multiple branches update references simultaneously
Setup and maintenance overhead vs. cloud tools
No built-in review workflow — you look at the diff report locally or in CI artifacts
No AI noise reduction
No cross-browser rendering by default (uses Puppeteer/Chrome)
Requires careful configuration to avoid flaky screenshots

Self-hosted alternative: Playwright's built-in screenshot comparison

Playwright has built-in visual comparison without any third-party service:

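// playwright.config.js
module.exports = {
  expect: {
    toHaveScreenshot: {
      // Per-pixel color difference tolerance, from 0 (strict) to 1 (lax)
      threshold: 0.2,
      // Maximum number of differing pixels before the assertion fails
      maxDiffPixels: 100
    }
  }
};

// In tests
const { test, expect } = require('@playwright/test');

test('homepage visual', async ({ page }) => {
  // A relative URL like '/' requires use.baseURL to be set in the config
  await page.goto('/');
  // Compares against the stored reference; the first run creates it
  await expect(page).toHaveScreenshot('homepage.png');
});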

Playwright stores reference screenshots in the repository (like BackstopJS) and supports updating them with --update-snapshots. It's effectively a simpler, built-in BackstopJS — suitable for teams that are already using Playwright and want visual testing without a separate tool.

How to Choose

Choose Percy if:

You use Cypress or Playwright for E2E tests
You want the simplest possible integration
Your team makes 5-20 PRs/week with <200 snapshots each
Budget: free tier to $599+/month

Choose Applitools if:

You need true cross-browser visual testing at scale
False positives from pixel-level diffing are costing hours of review time
You test complex, data-heavy UIs where subtle visual changes matter
Budget: enterprise pricing ($20k+/year)

Choose Chromatic if:

Your team uses Storybook
You maintain a component library or design system
You want component-level visual testing, not page-level
Budget: free to $349+/month

Choose BackstopJS or Playwright screenshots if:

You cannot use cloud services (compliance, data residency)
You have tight budget constraints
You're willing to invest setup and maintenance time
You want maximum control over the tooling
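The checklists above can be condensed into a small decision function. This is a sketch under stated assumptions: the profile fields and the order in which the rules are applied are choices made for this example, not a definitive ranking, and real decisions will weigh more factors.

```javascript
// Hypothetical team profile fields; rules are checked from hardest constraint to softest.
function chooseTool({ cloudAllowed, budget, usesStorybook, needsCrossBrowserAtScale }) {
  // Compliance or a near-zero budget rules out the hosted services.
  if (!cloudAllowed || budget === 'none') {
    return 'BackstopJS or Playwright screenshots';
  }
  // Cross-browser testing at scale with an enterprise budget.
  if (needsCrossBrowserAtScale && budget === 'enterprise') {
    return 'Applitools';
  }
  // Component-level testing for Storybook users.
  if (usesStorybook) {
    return 'Chromatic';
  }
  // Page-level testing layered on existing E2E tests.
  return 'Percy';
}

console.log(chooseTool({
  cloudAllowed: true,
  budget: 'monthly',
  usesStorybook: true,
  needsCrossBrowserAtScale: false,
})); // Chromatic
```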

Running Multiple Tools

It's not unusual for teams to use both Chromatic (component level) and Percy (page level). This gives you:

Chromatic catches regressions in individual components early (fast feedback, targeted)
Percy catches regressions in how components compose at the page level (broader coverage)

The cost is additive, but the coverage is more comprehensive than either tool alone.

Wrapping Up

Visual regression testing catches a class of bugs that functional tests miss: layout shifts, color changes, font regressions, missing images. Picking the right tool depends on where in the stack you need coverage (component vs page), your budget, and whether cloud services are acceptable.

Start with what integrates least disruptively into your current workflow. For most teams:

Storybook users → Chromatic
Playwright users → Percy or Playwright's built-in screenshots
Cypress users → Percy
Enterprise with complex cross-browser needs → Applitools
