Puppeteer JavaScript: Complete Browser Automation Guide
Puppeteer is Google's official Node.js library for automating headless Chrome. It's widely used for screenshots, PDFs, web scraping, and end-to-end testing. This guide covers installation, core APIs, common patterns, and where Puppeteer fits in the modern browser automation landscape.
Key Takeaways
Puppeteer drives headless Chrome natively. It speaks the Chrome DevTools Protocol (CDP) directly, giving you lower-level browser control than Selenium. The tradeoff: it only supports Chrome and Chromium out of the box (recent versions add Firefox support via WebDriver BiDi; Safari is not supported).
waitForSelector is your primary waiting tool. After navigation, always wait for a specific element before interacting. Don't rely on navigation events alone — JavaScript-heavy SPAs keep loading after load fires.
page.evaluate() runs JavaScript in the browser context. This is the bridge between your Node.js code and the page. Use it to extract data, manipulate DOM elements, or call page-level functions.
Intercept requests to speed up scraping. Blocking images, fonts, and CSS requests can cut page load time in half when you only need content.
Playwright has superseded Puppeteer for new projects. Playwright is maintained by Microsoft (built by ex-Puppeteer engineers), supports multiple browsers, and has a cleaner API. But Puppeteer remains widely used and is still actively maintained by the Chrome team at Google.
What Is Puppeteer?
Puppeteer is an official Google Node.js library that provides a high-level API to control Chrome or Chromium browsers over the DevTools Protocol. It can run in headless mode (no visible window) or headed mode.
What you can do with Puppeteer:
- Generate screenshots and PDFs of any web page
- Crawl and scrape JavaScript-rendered content
- Automate form filling, clicking, and navigation
- Run end-to-end tests for web applications
- Capture performance metrics and network traffic
- Test Chrome extensions
Puppeteer was created by the Chrome DevTools team at Google in 2017. Several of its original engineers later moved to Microsoft and built Playwright, which is now the more feature-complete option. Puppeteer itself remains officially maintained and widely used.
Installation
npm install puppeteer
This installs Puppeteer and automatically downloads a compatible version of Chromium (~200MB).
If you already have Chrome installed and want to use it:
npm install puppeteer-core # No browser download
Then specify the executable path:
const browser = await puppeteer.launch({
executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
});
Quick Start
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://example.com");
const title = await page.title();
console.log("Title:", title);
await page.screenshot({ path: "example.png" });
await browser.close();
})();
Run it:
node index.js
Core Browser API
Launching a Browser
const puppeteer = require("puppeteer");
// Headless mode (default)
const browser = await puppeteer.launch();
// Headed mode (visible window, useful for debugging)
const browser = await puppeteer.launch({ headless: false });
// Custom launch options
const browser = await puppeteer.launch({
headless: "new", // Chrome's new headless mode (the default in recent Puppeteer versions)
slowMo: 100, // Slow down operations by 100ms for debugging
devtools: true, // Open DevTools automatically
args: [
"--no-sandbox",
"--disable-dev-shm-usage", // Required in Docker
"--window-size=1920,1080",
],
});
Working with Pages
// Open a new tab
const page = await browser.newPage();
// Get all open pages
const pages = await browser.pages();
// Close a specific page
await page.close();
// Close the browser (and all pages)
await browser.close();
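One habit worth adopting early: wrap launch and close in try/finally so a failed step never leaks a headless Chrome process. A minimal sketch; withBrowser and its launch parameter are illustrative helpers, not Puppeteer APIs:

```javascript
// Always close the browser, even when a step throws; otherwise headless
// Chrome processes pile up. `launch` is any function returning an object
// with close() (in real code, () => puppeteer.launch()).
async function withBrowser(launch, fn) {
  const browser = await launch();
  try {
    return await fn(browser);
  } finally {
    await browser.close();
  }
}

// Usage (assuming puppeteer is installed):
// const title = await withBrowser(
//   () => puppeteer.launch(),
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto("https://example.com");
//     return page.title();
//   }
// );
```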
Navigation
// Navigate to URL
await page.goto("https://example.com");
// Navigate with options
await page.goto("https://example.com", {
waitUntil: "networkidle0", // Wait until no network requests for 500ms
timeout: 30000, // Timeout in ms (default 30000)
});
// waitUntil options:
// - "load" — wait for the load event (default)
// - "domcontentloaded" — wait for DOMContentLoaded
// - "networkidle0" — wait until no requests for 500ms
// - "networkidle2" — wait until max 2 requests for 500ms
// Go back/forward
await page.goBack();
await page.goForward();
// Reload the page
await page.reload();
// Get current URL
const url = page.url();
Taking Screenshots
// Full page screenshot
await page.screenshot({
path: "full-page.png",
fullPage: true,
});
// Viewport screenshot (visible area only)
await page.screenshot({ path: "viewport.png" });
// Clip to specific coordinates
await page.screenshot({
path: "clipped.png",
clip: { x: 0, y: 0, width: 800, height: 600 },
});
// JPEG format (smaller file size)
await page.screenshot({
path: "image.jpg",
type: "jpeg",
quality: 90,
});
// Get screenshot as Buffer (no file save)
const buffer = await page.screenshot({ encoding: "binary" });
Screenshot a Specific Element
const element = await page.$(".product-card");
await element.screenshot({ path: "product.png" });
Generating PDFs
// Generate PDF with A4 format
await page.pdf({
path: "document.pdf",
format: "A4",
printBackground: true,
margin: {
top: "20mm",
right: "15mm",
bottom: "20mm",
left: "15mm",
},
});
// Custom page size (width x height in inches or mm)
await page.pdf({
path: "custom.pdf",
width: "8.5in",
height: "11in",
printBackground: true,
});
Finding Elements
Puppeteer uses CSS selectors for element lookup (newer versions also accept text and XPath queries via the ::-p-text() and ::-p-xpath() selector prefixes):
// Single element (returns null if not found)
const element = await page.$(".product-name");
// All matching elements
const elements = await page.$$(".product-card");
// Wait for an element to appear, then return it
const element = await page.waitForSelector(".loading-done");
// Element exists check
const exists = (await page.$(".optional-element")) !== null;
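Two related helpers, page.$eval and page.$$eval, combine the lookup and an in-page callback in one call (both are real Puppeteer APIs; the .product-card markup below is hypothetical):

```javascript
// Pure mapper to pass into $eval/$$eval. It runs inside the browser, so it
// may only use DOM APIs, never Node variables from the enclosing scope.
function cardToProduct(card) {
  return { name: (card.textContent || "").trim() };
}

// Usage (assumes a live `page` with .product-card elements):
// const first = await page.$eval(".product-card", cardToProduct);
// const names = await page.$$eval(".product-card", (cards) =>
//   cards.map((c) => c.textContent.trim())
// );
```

Keeping the mapper a standalone pure function means it can be unit-tested without launching a browser.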
Waiting for Content
Waiting correctly is the key to reliable Puppeteer automation:
// Wait for a selector to appear
await page.waitForSelector(".product-list");
// Wait for an element to disappear
await page.waitForSelector(".loading-spinner", { hidden: true });
// Wait for a function to return truthy
await page.waitForFunction(() => document.title !== "Loading...");
// Wait for navigation (after a click that triggers navigation)
await Promise.all([
page.waitForNavigation({ waitUntil: "networkidle0" }),
page.click("a.next-page"),
]);
// Wait for a specific request to complete
await page.waitForResponse(
(response) => response.url().includes("/api/products") && response.status() === 200
);
// Wait with custom timeout
await page.waitForSelector(".slow-element", { timeout: 60000 });
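When a wait is flaky (slow backends, lazy-loaded widgets), retrying with backoff is often more robust than one very long timeout. A generic sketch; withRetry is an illustrative helper, not a Puppeteer API:

```javascript
// Retry an async operation with exponential backoff.
// attempt: any async function; retries: max attempts; baseMs: first delay.
async function withRetry(attempt, retries = 3, baseMs = 500) {
  let lastError;
  for (let i = 0; i < retries; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      // Backoff doubles each round: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
  throw lastError;
}

// Usage with Puppeteer (assumes a `page` object):
// await withRetry(() => page.waitForSelector(".flaky-widget", { timeout: 5000 }));
```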
Interacting with Elements
Clicking
// Click a CSS selector
await page.click(".submit-btn");
// Click with options
await page.click(".submit-btn", {
button: "right", // "left", "right", "middle"
clickCount: 2, // Double-click
delay: 100, // Delay between mousedown and mouseup
});
// Click using an element handle
const btn = await page.$(".submit-btn");
await btn.click();
Typing
// Type into a field (appends to existing content)
await page.type('input[name="email"]', "user@example.com");
// Type with delay (simulates human typing)
await page.type('input[name="email"]', "user@example.com", { delay: 50 });
// Clear and retype
await page.click('input[name="email"]', { clickCount: 3 }); // Select all
await page.type('input[name="email"]', "new@example.com");
// Focus and use keyboard
await page.focus('input[name="search"]');
await page.keyboard.type("search query");
await page.keyboard.press("Enter");
Keyboard Events
await page.keyboard.press("Enter");
await page.keyboard.press("Escape");
await page.keyboard.press("ArrowDown");
// Key combination
await page.keyboard.down("Control");
await page.keyboard.press("A"); // Ctrl+A (select all)
await page.keyboard.up("Control");
// Type special characters
await page.keyboard.type("Hello\tWorld"); // Tab between words
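One portability wrinkle: the select-all shortcut above is Cmd+A on macOS, not Ctrl+A. A small helper (illustrative, not a Puppeteer API) keeps scripts portable:

```javascript
// Chrome interprets shortcuts according to the OS it runs on, so pick the
// modifier from the platform hosting the browser (process.platform when
// Chrome is launched locally).
function selectAllModifier(platform = process.platform) {
  return platform === "darwin" ? "Meta" : "Control";
}

// Usage (assumes a `page`):
// const mod = selectAllModifier();
// await page.keyboard.down(mod);
// await page.keyboard.press("A");
// await page.keyboard.up(mod);
```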
Select Dropdowns
// Select by value
await page.select('select[name="country"]', "US");
// Select multiple options (for multi-select)
await page.select('select[name="languages"]', "en", "fr", "de");
// Select by visible text (using evaluate)
await page.evaluate(() => {
const select = document.querySelector('select[name="country"]');
const option = Array.from(select.options).find(opt => opt.text === "United States");
if (option) {
select.value = option.value;
select.dispatchEvent(new Event("change"));
}
});
File Upload
const fileInput = await page.$('input[type="file"]');
await fileInput.uploadFile("/path/to/file.pdf");
Extracting Data with page.evaluate()
page.evaluate() runs JavaScript in the browser context. This is how you extract data from the page:
// Extract text content
const title = await page.evaluate(() => document.title);
const heading = await page.evaluate(() =>
document.querySelector("h1")?.innerText
);
// Extract multiple values
const products = await page.evaluate(() => {
return Array.from(document.querySelectorAll(".product-card")).map((card) => ({
name: card.querySelector(".name")?.innerText || "",
price: card.querySelector(".price")?.innerText || "",
rating: card.querySelector(".rating")?.dataset.score || "",
}));
});
// Pass arguments into evaluate
const selector = ".product-name";
const names = await page.evaluate((sel) => {
return Array.from(document.querySelectorAll(sel)).map((el) => el.innerText);
}, selector);
// Return DOM elements (as JSHandle, not plain JS value)
const inputHandle = await page.evaluateHandle(() =>
document.querySelector("input")
);
Network Request Interception
Block unnecessary resources to speed up scraping:
await page.setRequestInterception(true);
page.on("request", (request) => {
const resourceType = request.resourceType();
// Block images, fonts, and stylesheets
if (["image", "font", "stylesheet"].includes(resourceType)) {
request.abort();
} else {
request.continue();
}
});
await page.goto("https://example.com");
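Factoring the blocking rule into a pure predicate makes it reusable and unit-testable. The strings match what request.resourceType() returns; adding "media" to the blocklist is an extra assumption beyond the example above:

```javascript
// Resource types to drop when only the page's content matters.
const BLOCKED_TYPES = new Set(["image", "font", "stylesheet", "media"]);

function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Usage (with setRequestInterception(true) already enabled):
// page.on("request", (req) =>
//   shouldBlock(req.resourceType()) ? req.abort() : req.continue()
// );
```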
Intercepting and Modifying Requests
await page.setRequestInterception(true);
page.on("request", (request) => {
// Add custom headers
const headers = {
...request.headers(),
Authorization: "Bearer mytoken",
};
request.continue({ headers });
});
Capturing Responses
// Log all JSON API responses
page.on("response", async (response) => {
  if (response.url().includes("/api/")) {
    // response.json() throws on non-JSON bodies, so guard it
    const data = await response.json().catch(() => null);
    if (data) console.log("API response:", data);
  }
});
Working with Cookies and Sessions
// Get all cookies
const cookies = await page.cookies();
// Set a cookie
await page.setCookie({
name: "session",
value: "abc123",
domain: "example.com",
path: "/",
httpOnly: true,
});
// Delete a cookie
await page.deleteCookie({ name: "session" });
// Save cookies to file (reuse across runs)
const fs = require("fs");
const cookies = await page.cookies();
fs.writeFileSync("cookies.json", JSON.stringify(cookies, null, 2));
// Load saved cookies
const savedCookies = JSON.parse(fs.readFileSync("cookies.json", "utf8"));
await page.setCookie(...savedCookies);
Complete Scraping Example
const puppeteer = require("puppeteer");
const fs = require("fs");
async function scrapeProducts(url) {
const browser = await puppeteer.launch({
headless: "new",
args: ["--no-sandbox", "--disable-dev-shm-usage"],
});
const page = await browser.newPage();
// Block unnecessary resources
await page.setRequestInterception(true);
page.on("request", (req) => {
if (["image", "font", "stylesheet"].includes(req.resourceType())) {
req.abort();
} else {
req.continue();
}
});
await page.setViewport({ width: 1280, height: 720 });
const allProducts = [];
let pageNum = 1;
while (true) {
const pageUrl = `${url}?page=${pageNum}`;
console.log(`Scraping page ${pageNum}: ${pageUrl}`);
try {
await page.goto(pageUrl, { waitUntil: "networkidle0", timeout: 30000 });
// Wait for products to load
const hasProducts = await page
.waitForSelector(".product-card", { timeout: 5000 })
.then(() => true)
.catch(() => false);
if (!hasProducts) {
console.log("No more products found");
break;
}
const products = await page.evaluate(() => {
return Array.from(document.querySelectorAll(".product-card")).map(
(card) => ({
name: card.querySelector(".product-name")?.innerText.trim() || "",
price: card.querySelector(".product-price")?.innerText.trim() || "",
url: card.querySelector("a")?.href || "",
})
);
});
if (products.length === 0) break;
allProducts.push(...products);
pageNum++;
// Polite delay
await new Promise((r) => setTimeout(r, 1000 + Math.random() * 2000));
} catch (error) {
console.error(`Error on page ${pageNum}:`, error.message);
break;
}
}
await browser.close();
fs.writeFileSync("products.json", JSON.stringify(allProducts, null, 2));
console.log(`Saved ${allProducts.length} products`);
return allProducts;
}
scrapeProducts("https://example-shop.com/products").catch(console.error);
Using Puppeteer for Testing
Puppeteer can drive end-to-end tests; most teams pair it with Jest or a dedicated testing framework:
const puppeteer = require("puppeteer");
describe("Login Flow", () => {
let browser;
let page;
beforeAll(async () => {
browser = await puppeteer.launch({ headless: "new" });
});
afterAll(async () => {
await browser.close();
});
beforeEach(async () => {
page = await browser.newPage();
});
afterEach(async () => {
await page.close();
});
test("user can log in with valid credentials", async () => {
await page.goto("https://app.example.com/login");
await page.type('input[name="email"]', "user@example.com");
await page.type('input[name="password"]', "password123");
await Promise.all([
page.waitForNavigation(),
page.click('button[type="submit"]'),
]);
expect(page.url()).toContain("/dashboard");
const welcomeText = await page.$eval(".welcome-message", (el) => el.innerText);
expect(welcomeText).toContain("Welcome");
});
});
Jest Puppeteer preset simplifies setup:
npm install --save-dev jest jest-puppeteer puppeteer
// jest.config.js
module.exports = {
  preset: "jest-puppeteer",
};
Puppeteer vs Playwright
Both libraries automate headless browsers. Here's how they compare:
| Feature | Puppeteer | Playwright |
|---|---|---|
| Maintainer | Google Chrome team | Microsoft |
| Browser support | Chrome / Chromium (newer versions add Firefox) | Chrome, Firefox, WebKit |
| Auto-waiting | Manual waits required | Built-in auto-waiting |
| API design | Callback/promise-heavy | Clean async/await |
| Network interception | Yes (CDP-based) | Yes (cleaner API) |
| Parallelism | Manual setup | Built-in browser contexts |
| Test framework | Use with Jest | Built-in @playwright/test |
| Screenshots | Yes | Yes |
| Tracing | Basic | Full trace viewer |
| Python/Java/C# support | ❌ No | ✅ Yes |
| Documentation | Good | Excellent |
| NPM downloads | ~4M/week | ~8M/week |
When to use Puppeteer:
- Chrome-only automation is fine for your use case
- You're building a tool on top of Chrome DevTools Protocol
- You're maintaining existing Puppeteer code
- You need specific Chromium APIs not in Playwright
When to use Playwright:
- New projects (cleaner API, better maintained)
- You need Firefox or WebKit support
- You want built-in test runner with parallel execution
- Your team uses multiple programming languages
Running in CI/CD
GitHub Actions
# .github/workflows/scrape.yml
name: Scrape Products
on:
  schedule:
    - cron: "0 9 * * *" # Daily at 9 AM UTC
  workflow_dispatch:
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
      - name: Install dependencies
        run: npm ci
        env:
          # Skip the bundled Chromium download; the runner has Chrome preinstalled
          PUPPETEER_SKIP_DOWNLOAD: "true"
      - name: Run scraper
        run: node scrape.js
        env:
          PUPPETEER_EXECUTABLE_PATH: /usr/bin/google-chrome-stable
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: scrape-results
          path: products.json
Docker
FROM node:20-slim
# Install Google Chrome from Google's apt repository (it is not in Debian's default repos)
RUN apt-get update && apt-get install -y wget gnupg --no-install-recommends \
    && wget -q -O - https://dl.google.com/linux/linux_signing_key.pub \
       | gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg \
    && echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" \
       > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update && apt-get install -y google-chrome-stable --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Skip Puppeteer's Chromium download; we point at the system Chrome below
ENV PUPPETEER_SKIP_DOWNLOAD=true
COPY package*.json ./
RUN npm ci
COPY . .
# Point Puppeteer at the system Chrome; launch with --no-sandbox (required in Docker)
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
CMD ["node", "scrape.js"]
Alternatives to Writing Puppeteer Code
For functional testing — verifying that your web application works correctly — tools like HelpMeTest let you skip the Puppeteer boilerplate and write tests in plain English:
*** Test Cases ***
Product Listing Shows Correct Data
    Go To    https://example-shop.com/products
    Should See at least 10 product cards
    Each product should have a name and price
    The first product price should be visible
HelpMeTest generates Robot Framework + Playwright tests, runs them in a cloud browser, and self-heals selectors when your UI changes. No Puppeteer setup, no selector maintenance.
Puppeteer is the right choice when:
- You need detailed browser control (CDP access, custom protocols)
- You're building infrastructure (screenshot services, PDF generators)
- You're scraping at scale with custom logic
AI testing tools are better when:
- You're testing your own application's functionality
- Your UI changes regularly (self-healing beats manual selector updates)
- Non-developers need to write or understand tests
Conclusion
Puppeteer remains a strong, well-maintained tool for Chrome automation. It gives you low-level browser control, a clean JavaScript API, and direct CDP access when you need it.
For new browser automation projects, evaluate Playwright as well — it has superseded Puppeteer in many teams because of its multi-browser support, cleaner API, and built-in test runner.
Try HelpMeTest free — for web application testing without writing selector code.