Puppeteer Python: How to Use Pyppeteer for Browser Automation

Puppeteer is a Node.js library, but Python developers can use Pyppeteer — an unofficial Python port. This guide covers installation, common automation tasks, and the gotchas you'll hit with Pyppeteer. If you need production-grade browser automation in Python, Playwright is the better-maintained choice.

Key Takeaways

Pyppeteer is Puppeteer ported to Python. It's not officially maintained by the Puppeteer team — it's a community port. Expect occasional version drift and API quirks compared to the Node.js original.

Playwright is the better Python choice in 2026. Microsoft actively maintains Playwright's Python SDK. It has the same API as its Node.js version, better async support, and first-class Python documentation.

Both Pyppeteer and Playwright use Chromium. The browser automation capabilities are similar — screenshots, scraping, form filling, JavaScript execution, intercepting requests.

Async is mandatory in Pyppeteer. Its API is built on asyncio, so you need async/await throughout; there is no blocking API to fall back on. Playwright's Python SDK is also async-first, but it ships a sync API as well.

For test automation, use a dedicated tool. Browser automation code written for scraping tends to be fragile. For testing web applications, tools like HelpMeTest generate and maintain tests automatically — no selector code to maintain.

What Is Puppeteer Python (Pyppeteer)?

Puppeteer is Google's Node.js library for controlling headless Chrome. Pyppeteer is a Python port that mirrors the Puppeteer API, letting Python developers use Chrome automation without switching to Node.js.

Pyppeteer gives you:

  • Headless or headed Chrome/Chromium control
  • Screenshot and PDF generation
  • Web scraping with JavaScript support
  • Form submission and click automation
  • Network request interception
  • JavaScript execution in the page context

Why Python developers look for "Puppeteer Python": Most Puppeteer tutorials are in Node.js. Python developers often search for a Python equivalent, and Pyppeteer is the direct answer — though Playwright has become the more popular choice.

Installing Pyppeteer

pip install pyppeteer

Pyppeteer downloads Chromium automatically on first run:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto("https://example.com")
    print(await page.title())
    await browser.close()

asyncio.run(main())

On first run, this downloads a compatible Chromium binary (~150MB). Subsequent runs use the cached version.

Launch with headless=False for debugging:

browser = await launch(headless=False)  # Opens a visible browser window

Common Automation Tasks

Taking Screenshots

import asyncio
from pyppeteer import launch

async def screenshot():
    browser = await launch(headless=True)
    page = await browser.newPage()

    # Set viewport size
    await page.setViewport({"width": 1280, "height": 720})

    await page.goto("https://example.com")

    # Full page screenshot
    await page.screenshot({"path": "full-page.png", "fullPage": True})

    # Clip to specific area
    await page.screenshot({
        "path": "clipped.png",
        "clip": {"x": 0, "y": 0, "width": 800, "height": 400}
    })

    await browser.close()

asyncio.run(screenshot())

Scraping Dynamic Content

Pyppeteer drives a real browser that executes the page's JavaScript, so it can scrape content that static HTML parsers like BeautifulSoup never see:

import asyncio
from pyppeteer import launch

async def scrape_spa():
    browser = await launch(headless=True)
    page = await browser.newPage()

    await page.goto("https://example-spa.com/products", {
        "waitUntil": "networkidle0"  # Wait for no network activity
    })

    # Wait for React/Vue/Angular to render
    await page.waitForSelector(".product-card")

    # Extract data via JavaScript
    products = await page.evaluate("""
        () => {
            const cards = document.querySelectorAll('.product-card');
            return Array.from(cards).map(card => ({
                name: card.querySelector('.name')?.innerText || '',
                price: card.querySelector('.price')?.innerText || '',
            }));
        }
    """)

    print(products)
    await browser.close()

asyncio.run(scrape_spa())
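
The innerText strings that come back from evaluate are raw; prices like "$1,299.99" still need parsing before you can sort or compare them. A small helper for that, shown as a sketch (the regex is an assumption; adjust it to your site's price format):

```python
import re

def parse_price(text):
    """Turn a scraped price string like '$1,299.99' into a float.

    Returns None when no number is found (e.g. 'Free' or an empty
    string), so callers can filter out unparseable entries.
    """
    match = re.search(r"[\d,]+(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group().replace(",", ""))
```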

Filling Forms and Submitting

import asyncio
from pyppeteer import launch

async def fill_form():
    browser = await launch(headless=True)
    page = await browser.newPage()

    await page.goto("https://example.com/login")

    # Type into form fields
    await page.type('input[name="email"]', "user@example.com")
    await page.type('input[name="password"]', "secretpassword")

    # Click submit button
    await page.click('button[type="submit"]')

    # Wait for navigation after login
    await page.waitForNavigation()

    print(f"After login, URL is: {page.url}")
    await browser.close()

asyncio.run(fill_form())
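
One subtlety with the click-then-wait sequence above: if the navigation completes in the gap between the click() and waitForNavigation() calls, the wait can hang until it times out. The usual fix (the same pattern Puppeteer recommends with Promise.all) is to start both concurrently with asyncio.gather. A sketch, assuming a Pyppeteer Page object:

```python
import asyncio

async def click_and_wait(page, selector):
    """Click and wait for the resulting navigation without a race.

    Starting the navigation wait before the click means a fast
    navigation can't slip through between the two awaits.
    """
    await asyncio.gather(
        page.waitForNavigation(),
        page.click(selector),
    )
```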

Clicking and Navigating

import asyncio
from pyppeteer import launch

async def navigate():
    browser = await launch(headless=True)
    page = await browser.newPage()

    await page.goto("https://example.com")

    # Click a link
    await page.click("a.read-more")
    await page.waitForNavigation()

    # Click by text content (using evaluate)
    await page.evaluate("""
        () => {
            const links = Array.from(document.querySelectorAll('a'));
            const target = links.find(a => a.textContent.includes('Contact'));
            if (target) target.click();
        }
    """)

    await page.waitForNavigation()
    print(page.url)

    await browser.close()

asyncio.run(navigate())

Waiting for Elements

Pyppeteer's waiting API is similar to Puppeteer's:

import asyncio
from pyppeteer import launch

async def wait_examples():
    browser = await launch(headless=True)
    page = await browser.newPage()

    await page.goto("https://example.com")

    # Wait for a CSS selector to appear
    await page.waitForSelector(".loading-spinner", {"hidden": True})  # Wait until hidden

    # Wait for element to be visible
    element = await page.waitForSelector(".content", {"visible": True})

    # Wait for navigation
    await page.click("a.next-page")
    await page.waitForNavigation({"waitUntil": "networkidle0"})

    # Wait for a function to return true
    await page.waitForFunction("() => document.title !== 'Loading...'")

    # Custom timeout (default is 30 seconds)
    await page.waitForSelector(".slow-element", {"timeout": 60000})

    await browser.close()

asyncio.run(wait_examples())
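
When a wait does time out, you often want a fallback instead of a crash. Pyppeteer's waits raise pyppeteer.errors.TimeoutError, which you can catch directly; the generic asyncio version of the pattern looks like this (a sketch using asyncio.wait_for, which works on any awaitable):

```python
import asyncio

async def wait_or_none(awaitable, timeout=5.0):
    """Return the awaited value, or None if it takes too long.

    The same try/except shape works around page.waitForSelector with
    pyppeteer.errors.TimeoutError in place of asyncio.TimeoutError.
    """
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout)
    except asyncio.TimeoutError:
        return None
```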

Intercepting Network Requests

import asyncio
from pyppeteer import launch

async def intercept_requests():
    browser = await launch(headless=True)
    page = await browser.newPage()

    # Enable request interception
    await page.setRequestInterception(True)

    async def handle_request(request):
        # Block images, fonts, and stylesheets to speed up scraping
        if request.resourceType in ["image", "font", "stylesheet"]:
            await request.abort()
        else:
            await request.continue_()

    page.on("request", lambda req: asyncio.ensure_future(handle_request(req)))

    await page.goto("https://example.com")
    content = await page.content()

    print(f"Page loaded without images, HTML length: {len(content)}")
    await browser.close()

asyncio.run(intercept_requests())
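
As the blocking policy grows, it helps to pull the decision out of the handler into a plain function you can tweak and unit-test without launching a browser. A small refactor of the snippet above (the "media" type is an added assumption):

```python
# Resource types to abort; adjust the set for your target site.
BLOCKED_TYPES = {"image", "font", "stylesheet", "media"}

def should_block(resource_type):
    """Blocking policy for the request interception handler.

    The handler then reduces to:
        if should_block(request.resourceType): await request.abort()
        else: await request.continue_()
    """
    return resource_type in BLOCKED_TYPES
```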

Generating PDFs

import asyncio
from pyppeteer import launch

async def generate_pdf():
    browser = await launch(headless=True)
    page = await browser.newPage()

    await page.goto("https://example.com/report", {"waitUntil": "networkidle0"})

    await page.pdf({
        "path": "report.pdf",
        "format": "A4",
        "printBackground": True,
        "margin": {
            "top": "20mm",
            "right": "15mm",
            "bottom": "20mm",
            "left": "15mm"
        }
    })

    await browser.close()
    print("PDF saved")

asyncio.run(generate_pdf())

Handling Multiple Pages / Tabs

import asyncio
from pyppeteer import launch

async def multiple_pages():
    browser = await launch(headless=True)

    # Open multiple pages in parallel
    pages = await asyncio.gather(
        browser.newPage(),
        browser.newPage(),
        browser.newPage()
    )

    # Navigate all three at once
    await asyncio.gather(
        pages[0].goto("https://example.com/page1"),
        pages[1].goto("https://example.com/page2"),
        pages[2].goto("https://example.com/page3"),
    )

    # Extract titles from all pages
    titles = await asyncio.gather(*[p.title() for p in pages])
    print(titles)

    await browser.close()

asyncio.run(multiple_pages())
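
Three tabs is fine, but opening dozens at once will exhaust browser memory on larger jobs, so cap the concurrency. This is plain asyncio rather than a Pyppeteer API; a sketch using a Semaphore:

```python
import asyncio

async def bounded_gather(coro_factories, limit=3):
    """Run zero-argument coroutine factories with at most `limit`
    in flight at once. Results come back in input order, like
    asyncio.gather.
    """
    sem = asyncio.Semaphore(limit)

    async def run(factory):
        async with sem:
            return await factory()

    return await asyncio.gather(*(run(f) for f in coro_factories))
```

You would call it with factories like `lambda u=url: scrape_one(browser, u)`, where `scrape_one` (a hypothetical helper) opens a page, scrapes it, and closes it.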

Pyppeteer vs Playwright Python: Which to Use?

In 2026, Playwright is the better choice for most new Python browser automation projects. Here's why:

                             Pyppeteer                    Playwright (Python)
Maintainer                   Community port               Microsoft (official)
Python support               Primary use case             First-class SDK
Documentation                Sparse; follow Node.js docs  Full Python docs
Async API                    Async only                   Both sync and async
Browser support              Chromium only                Chromium, Firefox, WebKit
Version alignment            Lags behind Puppeteer        Always current
Test framework integration   Manual                       Built-in pytest-playwright
Element waiting              Manual waits                 Auto-waiting
Community                    Smaller                      Larger, growing

When Pyppeteer still makes sense:

  • You're maintaining existing Pyppeteer code
  • You're following a Puppeteer Node.js tutorial and want Python parity
  • Your team is more familiar with the Pyppeteer API

When to use Playwright instead:

  • Starting a new project
  • You need Firefox or WebKit support
  • You want official documentation and support
  • You're building test automation (use pytest-playwright)

The Same Example in Playwright Python

Here's the scraping example from above, rewritten in Playwright:

from playwright.sync_api import sync_playwright

def scrape_products():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        page.goto("https://example-spa.com/products")
        page.wait_for_selector(".product-card")

        products = page.evaluate("""
            () => {
                const cards = document.querySelectorAll('.product-card');
                return Array.from(cards).map(card => ({
                    name: card.querySelector('.name')?.innerText || '',
                    price: card.querySelector('.price')?.innerText || '',
                }));
            }
        """)

        print(products)
        browser.close()

scrape_products()

Playwright's sync API (no asyncio needed) is often cleaner for simple scripts.

Common Pyppeteer Issues and Fixes

Issue: Chromium Download Fails

pyppeteer.errors.BrowserError: Browser closed unexpectedly

Fix — manually specify a Chromium path or set the revision:

import os
os.environ["PYPPETEER_CHROMIUM_REVISION"] = "1263111"

# Or use an existing Chrome installation
browser = await launch(executablePath="/usr/bin/google-chrome")

Issue: Timeout Errors

pyppeteer.errors.TimeoutError: Navigation timeout of 30000 ms exceeded

Fix — increase the default timeout:

# Increase navigation timeout
await page.goto(url, {"timeout": 60000})

# Increase default timeout for all operations
page.setDefaultNavigationTimeout(60000)
page.setDefaultTimeout(30000)
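
Timeouts on flaky sites are often transient, so wrapping goto in a retry loop with backoff is a common pattern. A sketch, assuming a Pyppeteer Page; any exception from goto() (TimeoutError, NetworkError, and so on) triggers a retry:

```python
import asyncio

async def goto_with_retry(page, url, attempts=3, timeout=60000, backoff=1.0):
    """Retry a flaky navigation with exponential backoff.

    Re-raises the last exception once the attempts are exhausted.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return await page.goto(url, {"timeout": timeout})
        except Exception as exc:
            last_exc = exc
            await asyncio.sleep(backoff * (2 ** attempt))
    raise last_exc
```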

Issue: Running in Docker / CI

Headless Chrome in Docker needs sandbox flags disabled:

browser = await launch(
    headless=True,
    args=[
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
    ]
)

Issue: Page Crashes on Complex Sites

pyppeteer.errors.NetworkError: Protocol error (Target.activateTarget)

This often happens on memory-intensive pages. Increase the shared memory size in Docker:

docker run --shm-size=2gb your-image

Or disable the crash-prone /dev/shm usage:

args=["--disable-dev-shm-usage"]

Using Pyppeteer for Testing

Browser automation code can be used for testing — loading pages, filling forms, and asserting content. But scraping-style code tends to be fragile for tests:

The problem with raw Pyppeteer tests:

  • Selectors break when UI changes
  • Timing issues cause flaky failures
  • No test reporting or history
  • No CI integration out of the box
  • No self-healing when the app updates

Example fragile test:

async def test_login():
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto("https://app.example.com/login")
    await page.type("#email", "test@example.com")  # Breaks if ID changes
    await page.type("#password", "password123")
    await page.click(".login-btn")               # Breaks if class changes
    await page.waitForNavigation()
    assert "/dashboard" in page.url
    await browser.close()

The AI alternative: HelpMeTest lets you write the same test in plain English:

*** Test Cases ***
User Can Log In
    Go To       https://app.example.com/login
    Fill In     Email     test@example.com
    Fill In     Password  password123
    Click       Log In
    Should Be On    /dashboard

HelpMeTest:

  • Runs in a cloud browser (no local Chromium setup)
  • Self-heals selectors when the UI changes
  • Provides test history and failure recordings
  • Integrates with CI/CD automatically
  • No asyncio, no selector management

Conclusion

Pyppeteer is a workable Python port of Puppeteer, but it's a community project that lags behind the official Node.js library. For new Python browser automation projects in 2026, Playwright is the better choice — it's officially maintained, has full Python documentation, and offers both sync and async APIs.

Quick reference:

  • Pyppeteer — use if you're maintaining existing code or following Puppeteer tutorials
  • Playwright (Python) — use for new projects, better maintained, same capabilities
  • HelpMeTest — use for web application testing where you want self-healing tests without writing selector code

Try HelpMeTest free — write tests in plain English, run them in a real cloud browser.
