Flutter Golden File Tests: Screenshot Comparison Testing
Flutter golden tests capture a screenshot of a widget and compare it pixel-by-pixel against a reference image (the "golden file"). If the screenshot changes, the test fails. This catches visual regressions — layout shifts, color changes, font differences — that behavioral tests miss entirely. This guide covers writing golden tests, generating reference images, and handling the CI challenges that come with pixel-perfect comparison.
Key Takeaways
expectLater(find.byType(MyWidget), matchesGoldenFile('my_widget.png')) is the core assertion. The path is relative to the test file. Running with --update-goldens creates or overwrites the file; a normal run compares against it and fails if the file does not exist yet.
Golden files must be committed to your repository. They're the reference images. Keep them in a goldens/ subfolder next to your test files.
Platform rendering differs. A golden generated on macOS can fail on Linux CI because fonts and anti-aliasing render differently. Generate goldens on the same OS as your CI, for example in a CI job or a Docker container.
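golden_toolkit makes multi-device and multi-theme testing easy. Test the same widget across several screen sizes in one call, using built-in presets such as Device.phone, Device.iphone11, and Device.tabletLandscape, or your own Device definitions.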
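Consider a tolerance for CI flakiness. flutter_test has no built-in tolerance setting; if pixel-perfect comparison is too strict for your workflow, assign goldenFileComparator a custom LocalFileComparator subclass that accepts a small difference (for example up to 1%).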
What Golden Tests Catch
Golden tests catch visual regressions that no behavioral test can:
- A padding or alignment change shifted all text 2px left
- A refactor changed a container's background color
- An icon was replaced with a similar but different one
- Responsive layout broke at a specific width
- Dark mode colors are slightly off
Behavioral tests only verify that "the button exists" or "clicking it calls the right function." Golden tests verify "the button looks like this."
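The contrast can be sketched with two tests against the same widget. This is a minimal illustration, assuming a plain ElevatedButton labelled 'Save' and a hypothetical golden path goldens/save_button.png; neither comes from a real app.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Builds the widget under test; the 'Save' button is a placeholder.
Widget buildApp(VoidCallback onPressed) {
  return MaterialApp(
    home: Scaffold(
      body: Center(
        child: ElevatedButton(
          onPressed: onPressed,
          child: const Text('Save'),
        ),
      ),
    ),
  );
}

void main() {
  testWidgets('behavioral: button exists and calls its handler',
      (tester) async {
    var taps = 0;
    await tester.pumpWidget(buildApp(() => taps++));

    expect(find.text('Save'), findsOneWidget);
    await tester.tap(find.text('Save'));
    expect(taps, 1);
    // Nothing above fails if the button changes color or moves by 2px.
  });

  testWidgets('golden: button looks like the reference image',
      (tester) async {
    await tester.pumpWidget(buildApp(() {}));

    // Hypothetical golden path; create it once with --update-goldens.
    await expectLater(
      find.byType(ElevatedButton),
      matchesGoldenFile('goldens/save_button.png'),
    );
  });
}
```

The first test keeps passing through any purely visual change, which is the gap the second test is meant to close.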
Basic Golden Test
// test/widgets/user_card_golden_test.dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:my_app/widgets/user_card.dart';
import 'package:my_app/models/user.dart';

void main() {
  testWidgets('UserCard matches golden file', (tester) async {
    await tester.pumpWidget(
      MaterialApp(
        theme: ThemeData.light(),
        home: Scaffold(
          body: UserCard(
            user: User(
              id: '1',
              name: 'Alice Johnson',
              email: 'alice@example.com',
              role: UserRole.admin,
            ),
          ),
        ),
      ),
    );
    await expectLater(
      find.byType(UserCard),
      matchesGoldenFile('goldens/user_card_light.png'),
    );
  });
}
Generate the golden file once with --update-goldens; a plain run fails while the file is missing. On subsequent runs, Flutter compares the rendered widget against it. If the widget looks different, the test fails and writes diff images.
Generating Golden Files
# Generate (or update) golden files
flutter test --update-goldens test/widgets/user_card_golden_test.dart
# Run golden tests (compare against existing goldens)
flutter test test/widgets/user_card_golden_test.dart
Commit the generated .png files to your repository. They are the reference images for future comparison.
Testing Multiple Variants
// aliceUser and adminUser are User fixtures assumed to be defined
// elsewhere in the test file.
testWidgets('UserCard dark theme golden', (tester) async {
  await tester.pumpWidget(
    MaterialApp(
      theme: ThemeData.dark(),
      home: Scaffold(
        body: UserCard(user: aliceUser),
      ),
    ),
  );
  await expectLater(
    find.byType(UserCard),
    matchesGoldenFile('goldens/user_card_dark.png'),
  );
});
testWidgets('UserCard admin badge golden', (tester) async {
  await tester.pumpWidget(
    MaterialApp(
      home: Scaffold(
        body: UserCard(user: adminUser), // has admin badge
      ),
    ),
  );
  await expectLater(
    find.byType(UserCard),
    matchesGoldenFile('goldens/user_card_admin.png'),
  );
});
golden_toolkit for Multi-Device Testing
golden_toolkit makes it easy to test across multiple screen sizes:
# pubspec.yaml
dev_dependencies:
  golden_toolkit: ^0.15.0

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';
import 'package:my_app/widgets/user_card.dart';

void main() {
  testGoldens('UserCard on multiple devices', (tester) async {
    // multiScreenGolden captures whatever is already pumped, so pump first.
    // pumpWidgetBuilder wraps the widget in a MaterialApp by default.
    await tester.pumpWidgetBuilder(
      Scaffold(
        body: UserCard(user: aliceUser),
      ),
    );
    await multiScreenGolden(
      tester,
      'user_card_devices',
      devices: [
        Device.phone,
        Device.iphone11,
        Device.tabletLandscape,
      ],
    );
  });
}
This creates one golden file per device in a goldens/ folder, named user_card_devices.phone.png, user_card_devices.iphone11.png, etc.
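The built-in presets cover only a few sizes, so other screens need their own Device definitions. The sketch below assumes two hypothetical profiles; the names, logical sizes, and pixel ratio are illustrative values, not official device specifications, and the Card is a placeholder widget.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

// Hypothetical device profiles: names and sizes are assumptions
// chosen for illustration only.
const Device smallPhone = Device(
  name: 'small_phone',
  size: Size(360, 780),
  devicePixelRatio: 3.0,
);

const Device darkTablet = Device(
  name: 'dark_tablet',
  size: Size(1024, 768),
  brightness: Brightness.dark,
);

void main() {
  testGoldens('placeholder card on custom devices', (tester) async {
    // Pump the widget first; multiScreenGolden then resizes the surface
    // for each device and captures one image per device.
    await tester.pumpWidgetBuilder(
      const Card(child: ListTile(title: Text('Alice Johnson'))),
    );
    await multiScreenGolden(
      tester,
      'card_custom_devices',
      devices: const [smallPhone, darkTablet],
    );
  });
}
```

The device name becomes part of the golden file name, so short, stable names keep the goldens/ folder readable.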
Pump Options
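golden_toolkit also provides loadAppFonts, which loads your app's fonts so text is not rendered in the placeholder test font, and pumpWidgetBuilder, which pumps a widget at a given surface size:
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';
import 'package:my_app/widgets/user_card.dart';

void main() {
  // Load fonts for golden tests
  setUpAll(() async {
    await loadAppFonts();
  });
  testGoldens('UserCard with custom fonts', (tester) async {
    await tester.pumpWidgetBuilder(
      UserCard(user: aliceUser),
      surfaceSize: const Size(400, 200),
    );
    await screenMatchesGolden(tester, 'user_card_with_fonts');
  });
}
Handling Platform Differences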
Golden files are platform-specific. Fonts render differently on macOS vs Linux vs Windows. A golden generated on macOS will fail on Ubuntu CI.
Solution 1: Generate goldens on CI
Use a manually triggered step (workflow_dispatch) to regenerate goldens on the CI platform (Linux). The regenerated files still have to be committed to take effect:
# .github/workflows/golden-tests.yml
- name: Update golden files (Linux only)
  if: github.event_name == 'workflow_dispatch'
  run: flutter test --update-goldens test/
- name: Run golden tests
  run: flutter test test/
Generate goldens locally only on Linux, or use Docker to match your CI environment.
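If some developers work on macOS or Windows, one option is to skip golden assertions on hosts that do not match the OS the goldens were generated on. The sketch below assumes the committed goldens come from Linux; the widget and the golden path are placeholders.

```dart
import 'dart:io' show Platform;

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Assumption: the committed goldens were generated on Linux, like CI.
final bool goldensMatchHost = Platform.isLinux;

void main() {
  testWidgets(
    'placeholder label matches golden (Linux only)',
    (tester) async {
      await tester.pumpWidget(
        const MaterialApp(
          home: Scaffold(body: Center(child: Text('Hello'))),
        ),
      );
      await expectLater(
        find.byType(Text),
        matchesGoldenFile('goldens/hello_label.png'),
      );
    },
    // On other hosts the test is reported as skipped instead of failing.
    skip: !goldensMatchHost,
  );
}
```

The trade-off is that visual regressions are only caught on Linux, so CI remains the source of truth.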
Solution 2: Use a custom comparator with tolerance
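// test/flutter_test_config.dart (applies to all tests in the directory)
import 'dart:async';
import 'package:flutter_test/flutter_test.dart';
// TolerantGoldenFileComparator is a custom class you write yourself;
// flutter_test does not ship one. Adjust this import to wherever you define it.
import 'tolerant_golden_file_comparator.dart';

Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  // Reuse the default comparator's base directory so golden paths still
  // resolve relative to each test file.
  final Uri basedir = (goldenFileComparator as LocalFileComparator).basedir;
  // Allow up to 0.5% pixel difference (reduces platform-specific failures)
  goldenFileComparator = TolerantGoldenFileComparator(
    basedir.resolve('placeholder_test.dart'),
    failurePercent: 0.005,
  );
  await testMain();
}
A custom comparator that accepts small pixel differences handles minor font rendering variations between platforms.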
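The configuration above refers to TolerantGoldenFileComparator, which is not part of flutter_test. A minimal sketch of such a class follows, assuming it lives in a hypothetical file test/tolerant_golden_file_comparator.dart and that failurePercent is a fraction between 0.0 and 1.0. It extends LocalFileComparator and reuses its comparison and failure-output helpers.

```dart
// test/tolerant_golden_file_comparator.dart (hypothetical file name)
import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';

/// Illustrative helper, not a flutter_test class: passes a golden
/// comparison when the fraction of differing pixels is at or below
/// [failurePercent].
class TolerantGoldenFileComparator extends LocalFileComparator {
  TolerantGoldenFileComparator(super.testFile, {required this.failurePercent})
      : assert(failurePercent >= 0 && failurePercent <= 1);

  /// Allowed mismatch as a fraction (0.005 means 0.5% of pixels).
  final double failurePercent;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );

    // diffPercent is a fraction, so it can be compared directly.
    if (result.passed || result.diffPercent <= failurePercent) {
      return true;
    }

    // Over the threshold: write the usual diff images and fail with the
    // standard message.
    final String error = await generateFailureOutput(result, golden, basedir);
    throw FlutterError(error);
  }
}
```

Images of different sizes are reported as a full mismatch, so they still fail under any tolerance below 1.0. Keep the threshold small: a generous tolerance can also hide real regressions such as a one-pixel border change.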
Golden Diffs on Failure
When a golden test fails, Flutter writes diff images to a failures/ directory next to the test file. An illustrative summary (not verbatim tool output) for the UserCard test:
Golden test failure: user_card_light.png
Pixel count: 12,345
Mismatch: 234 pixels (1.9%)
Output files:
- test/widgets/failures/user_card_light_masterImage.png
- test/widgets/failures/user_card_light_testImage.png
- test/widgets/failures/user_card_light_isolatedDiff.png
- test/widgets/failures/user_card_light_maskedDiff.png
The masterImage.png is the reference and the testImage.png is the new render. The isolatedDiff.png shows only changed pixels. The maskedDiff.png overlays changes on the original. Use these to understand what changed.
CI Setup
# .github/workflows/flutter-tests.yml
- name: Run flutter tests with goldens
  run: flutter test test/
- name: Upload golden diffs on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: golden-failures
    # failures/ directories are created next to each failing test file
    path: test/**/failures/
Upload the failure artifacts so you can inspect what changed without pulling the branch locally.
When NOT to Use Golden Tests
Golden tests are high-maintenance:
- Every intentional UI change requires updating goldens
- CI platform differences cause false failures
- They're slower than behavioral tests
Use golden tests for:
- Components with complex visual states (charts, custom painters)
- Brand-critical UI (login screens, onboarding)
- Components where small visual changes matter (color palettes, icon sets)
Avoid golden tests for:
- Every single widget
- Components that change frequently
- Functional behavior that behavioral tests can verify
Combining Test Types
A good Flutter test strategy:
- Unit tests — business logic (fast, no flakiness)
- Widget tests — behavioral widget testing (fast)
- Golden tests — critical visual components (medium, needs maintenance)
- Integration tests — user flows on device (slow, run in CI only)
- Production monitoring — HelpMeTest for live API/backend testing
Golden tests fill the visual gap between widget tests and manual review. Use them selectively for the components where appearance matters most.
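One way to keep golden tests selective in practice is to tag them, so the fast suites can run without them. This is a small sketch, assuming a tag named 'golden' (an arbitrary choice) and a placeholder widget; declaring the tag in dart_test.yaml avoids an undeclared-tag warning.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Assumed tag name; any string works as long as it is used consistently.
const String goldenTag = 'golden';

void main() {
  testWidgets(
    'placeholder chip matches golden',
    (tester) async {
      await tester.pumpWidget(
        const MaterialApp(
          home: Scaffold(body: Center(child: Chip(label: Text('Admin')))),
        ),
      );
      await expectLater(
        find.byType(Chip),
        matchesGoldenFile('goldens/admin_chip.png'),
      );
    },
    tags: [goldenTag],
  );
}
```

With this in place, flutter test --exclude-tags golden runs everything except the golden tests, and flutter test --tags golden runs only them.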