Testing PDF Generation with Puppeteer: Visual Diffs, Text Extraction, and Layout Assertions
Puppeteer's page.pdf() generates PDFs from HTML pages — but verifying the output requires more than file existence checks. This guide covers testing PDF content via text extraction, visual regression diffs, layout assertions, and metadata checks using Node.js, Puppeteer, and the pdf-parse library.
Key Takeaways
Use pdf-parse to extract text for content assertions. Puppeteer produces binary PDF files — pdf-parse extracts the text layer so you can assert on content without visual comparison.
Visual snapshot tests catch layout regressions. Render the PDF as a PNG using pdf2pic or pdftoppm, then compare against a stored baseline. These catch font changes, spacing regressions, and element overflow.
Assert on PDF metadata separately from content. Page count, author, title, and creator are metadata — verify them independently of content assertions.
Use a headless Chromium in CI with a fixed version. Puppeteer's bundled Chromium version affects PDF rendering. Pin the Puppeteer version and use a fixed Chromium executable path in CI to get deterministic output.
Test the HTML source, not just the PDF. Most PDF rendering bugs originate in CSS or HTML. Print-specific CSS (@media print, page-break-*) is the most common source of failures — test it directly before asserting on PDF output.
Why Puppeteer PDF Tests Are Hard
Puppeteer generates PDFs by printing a webpage with headless Chromium. The output is a binary blob. Asserting that "the invoice shows the correct total" requires either:
- Extracting text from the PDF and matching against expected strings
- Rendering the PDF as images and comparing pixels against a baseline
- Parsing PDF structure using a library like
pdf-liborpdfjs-dist
Each approach has different trade-offs. This guide covers all three.
Setup
npm install puppeteer pdf-parse
npm install --save-dev jest pdf2picA minimal PDF generation function:
// src/pdf/generator.js
import puppeteer from 'puppeteer';
export async function generateInvoicePdf(invoiceData) {
const browser = await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
const html = renderInvoiceHtml(invoiceData);
await page.setContent(html, { waitUntil: 'networkidle0' });
const pdfBuffer = await page.pdf({
format: 'A4',
printBackground: true,
margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
});
await browser.close();
return pdfBuffer;
}Text Extraction Tests
Use pdf-parse to extract the text layer and assert on content:
// src/__tests__/invoice-pdf.test.js
import pdfParse from 'pdf-parse';
import { generateInvoicePdf } from '../pdf/generator.js';
const sampleInvoice = {
invoiceNumber: 'INV-2026-001',
date: '2026-05-17',
client: { name: 'Acme Corp', address: '123 Main St, New York, NY 10001' },
lineItems: [
{ description: 'Web Development Services', quantity: 40, rate: 150, total: 6000 },
{ description: 'Design Consultation', quantity: 5, rate: 200, total: 1000 },
],
subtotal: 7000,
tax: 700,
total: 7700,
};
describe('Invoice PDF generation', () => {
let pdfBuffer;
let pdfData;
beforeAll(async () => {
pdfBuffer = await generateInvoicePdf(sampleInvoice);
pdfData = await pdfParse(pdfBuffer);
}, 30000); // PDF generation is slow
it('generates a valid PDF buffer', () => {
expect(pdfBuffer).toBeInstanceOf(Buffer);
expect(pdfBuffer.length).toBeGreaterThan(1000);
// PDF files start with '%PDF'
expect(pdfBuffer.toString('ascii', 0, 4)).toBe('%PDF');
});
it('contains invoice number', () => {
expect(pdfData.text).toContain('INV-2026-001');
});
it('contains client name and address', () => {
expect(pdfData.text).toContain('Acme Corp');
expect(pdfData.text).toContain('123 Main St');
});
it('contains all line items', () => {
expect(pdfData.text).toContain('Web Development Services');
expect(pdfData.text).toContain('Design Consultation');
});
it('shows correct totals', () => {
expect(pdfData.text).toContain('7,000');
expect(pdfData.text).toContain('700');
expect(pdfData.text).toContain('7,700');
});
it('has correct page count', () => {
expect(pdfData.numpages).toBe(1);
});
});Page Count Assertions
For multi-page documents, page count is a critical assertion:
describe('Multi-page reports', () => {
it('long item list generates multiple pages', async () => {
const report = {
...sampleInvoice,
lineItems: Array.from({ length: 50 }, (_, i) => ({
description: `Line item ${i + 1}`,
quantity: 1,
rate: 100,
total: 100,
})),
};
const pdf = await generateInvoicePdf(report);
const data = await pdfParse(pdf);
expect(data.numpages).toBeGreaterThan(1);
});
it('single-item invoice fits on one page', async () => {
const minimal = {
...sampleInvoice,
lineItems: [{ description: 'Single Service', quantity: 1, rate: 500, total: 500 }],
};
const pdf = await generateInvoicePdf(minimal);
const data = await pdfParse(pdf);
expect(data.numpages).toBe(1);
});
});PDF Metadata Tests
Puppeteer's page.pdf() accepts metadata options. Test that metadata is set correctly:
export async function generateInvoicePdf(invoiceData) {
// ... browser/page setup ...
const pdfBuffer = await page.pdf({
format: 'A4',
printBackground: true,
displayHeaderFooter: false,
// Chromium doesn't expose PDF metadata via page.pdf() directly,
// but pdf-lib can be used to set it post-generation
});
// Post-process with pdf-lib to set metadata
const { PDFDocument } = await import('pdf-lib');
const doc = await PDFDocument.load(pdfBuffer);
doc.setTitle(`Invoice ${invoiceData.invoiceNumber}`);
doc.setAuthor('HelpMeTest Billing System');
doc.setCreator('HelpMeTest');
doc.setCreationDate(new Date(invoiceData.date));
return Buffer.from(await doc.save());
}Testing metadata with pdf-parse:
it('has correct PDF metadata', async () => {
const data = await pdfParse(pdfBuffer);
expect(data.info.Title).toBe('Invoice INV-2026-001');
expect(data.info.Author).toBe('HelpMeTest Billing System');
expect(data.info.Creator).toBe('HelpMeTest');
});Visual Snapshot Tests
Text extraction doesn't catch layout problems — elements overlapping, wrong fonts, or broken table borders. Visual snapshots compare pixel-by-pixel:
// src/__tests__/invoice-pdf-visual.test.js
import { fromBuffer } from 'pdf2pic';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'fs';
import path from 'path';
const SNAPSHOTS_DIR = path.join(process.cwd(), 'tests/snapshots/pdf');
async function renderPdfPage(pdfBuffer, pageNumber = 1) {
const convert = fromBuffer(pdfBuffer, {
density: 150, // 150 DPI for reasonable file size
format: 'png',
width: 794, // A4 at 96 DPI
height: 1123,
preserveAspectRatio: true,
});
const result = await convert(pageNumber, { responseType: 'buffer' });
return result.buffer;
}
describe('Invoice PDF visual regression', () => {
const SNAPSHOT_PATH = path.join(SNAPSHOTS_DIR, 'invoice-page1.png');
beforeAll(() => {
mkdirSync(SNAPSHOTS_DIR, { recursive: true });
});
it('matches stored visual snapshot', async () => {
const pdf = await generateInvoicePdf(sampleInvoice);
const rendered = await renderPdfPage(pdf, 1);
if (!existsSync(SNAPSHOT_PATH)) {
// First run: save baseline
writeFileSync(SNAPSHOT_PATH, rendered);
console.log('Baseline snapshot created:', SNAPSHOT_PATH);
return;
}
const baseline = readFileSync(SNAPSHOT_PATH);
// Simple pixel comparison — in practice use pixelmatch or jest-image-snapshot
const { default: pixelmatch } = await import('pixelmatch');
const { PNG } = await import('pngjs');
const img1 = PNG.sync.read(baseline);
const img2 = PNG.sync.read(rendered);
expect(img1.width).toBe(img2.width);
expect(img1.height).toBe(img2.height);
const numDiffPixels = pixelmatch(
img1.data, img2.data,
null,
img1.width, img1.height,
{ threshold: 0.1 }
);
// Allow up to 100 pixels of difference (font rendering variation)
expect(numDiffPixels).toBeLessThan(100);
}, 30000);
});Using jest-image-snapshot simplifies this considerably:
import { toMatchImageSnapshot } from 'jest-image-snapshot';
expect.extend({ toMatchImageSnapshot });
it('matches stored visual snapshot', async () => {
const pdf = await generateInvoicePdf(sampleInvoice);
const rendered = await renderPdfPage(pdf, 1);
expect(rendered).toMatchImageSnapshot({
failureThreshold: 0.01,
failureThresholdType: 'percent',
customSnapshotsDir: 'tests/snapshots/pdf',
});
}, 30000);Testing Print CSS Before PDF Generation
Most PDF layout bugs originate in print CSS. Test the HTML rendering in print mode directly in Puppeteer before asserting on PDF output:
describe('Invoice print layout', () => {
let browser;
let page;
beforeAll(async () => {
browser = await puppeteer.launch({ headless: 'new' });
page = await browser.newPage();
await page.setContent(renderInvoiceHtml(sampleInvoice), { waitUntil: 'networkidle0' });
await page.emulateMediaType('print');
});
afterAll(() => browser.close());
it('line items table is visible in print media', async () => {
const tableVisible = await page.$eval('table.line-items', el => {
const style = window.getComputedStyle(el);
return style.display !== 'none';
});
expect(tableVisible).toBe(true);
});
it('total section does not overflow page', async () => {
const totalEl = await page.$('.invoice-total');
const box = await totalEl.boundingBox();
// A4 width at 96 DPI is ~794px
expect(box.x + box.width).toBeLessThanOrEqual(794);
});
it('page breaks don\'t split table rows', async () => {
// Check that no table row has page-break-inside: avoid violated
const badRows = await page.$$eval('table.line-items tr', rows =>
rows.filter(row => {
const style = window.getComputedStyle(row);
return style.pageBreakInside === 'auto';
}).length
);
expect(badRows).toBe(0);
});
});Error Cases
Test what happens when PDF generation fails:
describe('PDF generation error handling', () => {
it('throws on invalid HTML template', async () => {
const invalidData = { ...sampleInvoice, client: null };
await expect(generateInvoicePdf(invalidData))
.rejects.toThrow();
});
it('handles very long client names without overflow', async () => {
const longNameData = {
...sampleInvoice,
client: {
...sampleInvoice.client,
name: 'A'.repeat(200),
},
};
// Should not throw
const pdf = await generateInvoicePdf(longNameData);
expect(pdf).toBeInstanceOf(Buffer);
const data = await pdfParse(pdf);
expect(data.text).toContain('A'.repeat(100));
});
});CI Configuration
PDF generation requires Chromium. In CI, either use Puppeteer's bundled Chromium or install it explicitly:
name: PDF Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: '20' }
# Install Chromium dependencies
- name: Install Chromium dependencies
run: |
sudo apt-get update
sudo apt-get install -y \
libgbm-dev libnss3-dev libatk-bridge2.0-dev \
libdrm-dev libxkbcommon-dev libxcomposite-dev \
libxdamage-dev libxrandr-dev libgtk-3-dev libasound-dev
- run: npm ci
# Install PDF rendering dependencies for visual tests
- run: sudo apt-get install -y poppler-utils
- run: npm test
env:
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: falseFor deterministic rendering, pin Puppeteer to a specific version and set PUPPETEER_EXECUTABLE_PATH if using a system Chromium.
What to Test
| Test type | What it catches |
|---|---|
File header (%PDF) |
Generator produces a valid PDF binary |
| Text extraction | Required content is present in the text layer |
| Page count | Layout doesn't overflow/underflow pages unexpectedly |
| Visual snapshot | Layout regressions — spacing, font, table borders |
| Print CSS assertions | CSS bugs before they reach PDF output |
| Metadata | Title, author, creator fields set correctly |
| Error cases | Generator doesn't crash on edge-case data |
For end-to-end testing of PDF download flows — clicking a "Download Invoice" button and verifying the file downloads correctly — HelpMeTest covers the browser interaction layer that Puppeteer unit tests can't easily replicate.