Selenium Tutorial for Beginners: Web Automation from Scratch

Selenium is the original web automation framework, powering browser tests since 2004. This tutorial covers everything from installation to running your first test — in Python and JavaScript. You'll also learn Selenium's biggest pain points (explicit waits, element staleness, cross-browser setup) and when modern tools like Playwright make more sense.

Key Takeaways

Selenium uses the WebDriver protocol. Your test code sends HTTP commands to a browser driver (chromedriver, geckodriver), which controls the browser. This extra hop is one reason Selenium tends to be slower and flakier than newer tools.

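Always use explicit waits, never sleeps or implicit waits. time.sleep() wastes time when the page is fast and still fails when it is slow, and implicitly_wait() applies one blanket timeout to every element lookup. Use WebDriverWait with expected_conditions to wait for exactly what you need.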

Never use absolute XPath. /html/body/div[2]/form/input[1] breaks on every page change. Use relative XPath or CSS selectors instead.

Page Object Model (POM) is essential. As your test suite grows, raw Selenium code becomes unmaintainable. POM centralizes selectors and page interactions so changes propagate everywhere.

Selenium is still relevant for enterprise. If you need Java, C#, Python, or Ruby bindings, testing across many browsers and browser versions, or you work in an organization standardized on Selenium Grid, it's still the right choice. For new JavaScript projects, Playwright is better.

What Is Selenium?

Selenium is an open-source browser automation framework. It was originally built at ThoughtWorks in 2004 to automate web application testing. Today it remains one of the most widely used testing tools in enterprise environments.

Selenium automates browsers through the WebDriver protocol — a W3C standard that all major browsers implement. You write code, it talks to a browser driver, the driver controls the browser.

Your Test Code → Selenium WebDriver → Browser Driver (chromedriver) → Chrome
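
To make that chain concrete, here is a rough sketch of the HTTP requests a client library sends to the driver during a short session. The endpoint paths and payload shapes follow the W3C WebDriver specification. The helper name webdriver_commands and the session id are made up for illustration, and nothing is actually sent over the network.

```python
import json


def webdriver_commands(session_id, url, css_selector):
    """Return (method, path, body) tuples for a tiny WebDriver session.

    Illustrative only: a real client also parses each response, and the
    session id comes from the driver's reply to the first request.
    """
    return [
        # Start a browser session
        ("POST", "/session",
         {"capabilities": {"alwaysMatch": {"browserName": "chrome"}}}),
        # driver.get(url)
        ("POST", f"/session/{session_id}/url", {"url": url}),
        # driver.find_element(By.CSS_SELECTOR, css_selector)
        ("POST", f"/session/{session_id}/element",
         {"using": "css selector", "value": css_selector}),
        # driver.title
        ("GET", f"/session/{session_id}/title", None),
        # driver.quit()
        ("DELETE", f"/session/{session_id}", None),
    ]


if __name__ == "__main__":
    for method, path, body in webdriver_commands("abc123", "https://example.com", "h1"):
        payload = "" if body is None else json.dumps(body)
        print(f"{method:6} {path} {payload}")
```

Every Selenium call in your test turns into at least one request like these, which is why chatty tests with many small lookups feel slow.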

Selenium Components

  • Selenium WebDriver: The core library for browser automation (what you'll use)
  • Selenium IDE: Chrome/Firefox extension for recording tests (good for prototyping)
  • Selenium Grid: Run tests in parallel across multiple machines/browsers

Installation

Python

pip install selenium

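Install Chrome. Since Selenium 4.6, the bundled Selenium Manager downloads a matching ChromeDriver automatically, so a separate driver install is optional. The first test below uses webdriver-manager, which does the same job as an explicit dependency: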

pip install webdriver-manager

JavaScript (Node.js)

npm install selenium-webdriver

Install ChromeDriver (Chrome itself must already be installed; the chromedriver package only downloads the driver binary):

npm install chromedriver

Your First Test

Python

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Launch Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

try:
    # Navigate to page
    driver.get("https://example.com")

    # Verify the title
    assert "Example Domain" in driver.title, f"Unexpected title: {driver.title}"

    # Find element and verify text
    heading = driver.find_element(By.TAG_NAME, "h1")
    assert heading.text == "Example Domain"

    print("Test passed!")

finally:
    driver.quit()  # Always close the browser

JavaScript

const assert = require('node:assert')
const { Builder, By } = require('selenium-webdriver')
const chrome = require('selenium-webdriver/chrome')

async function runTest() {
  // Run Chrome headless by passing the flag as a browser argument
  const options = new chrome.Options().addArguments('--headless=new')
  const driver = await new Builder()
    .forBrowser('chrome')
    .setChromeOptions(options)
    .build()

  try {
    await driver.get('https://example.com')

    const title = await driver.getTitle()
    // assert.ok throws on failure; console.assert would only log a message
    assert.ok(title.includes('Example Domain'), `Unexpected title: ${title}`)

    const heading = await driver.findElement(By.tagName('h1'))
    const text = await heading.getText()
    assert.strictEqual(text, 'Example Domain', `Unexpected heading: ${text}`)

    console.log('Test passed!')
  } finally {
    await driver.quit()
  }
}

// Exit with a non-zero code so CI notices a failed run
runTest().catch((err) => {
  console.error(err)
  process.exit(1)
})

Finding Elements

This is the core skill in Selenium: locating elements reliably.

Locator Types

from selenium.webdriver.common.by import By

# By ID — fastest, most reliable (when IDs are stable)
driver.find_element(By.ID, "submit-button")

# By name attribute
driver.find_element(By.NAME, "email")

# By CSS selector — versatile, readable
driver.find_element(By.CSS_SELECTOR, "button[type='submit']")
driver.find_element(By.CSS_SELECTOR, "#login-form input.email-field")
driver.find_element(By.CSS_SELECTOR, ".product-card:first-child .price")

# By XPath — use when CSS isn't enough
driver.find_element(By.XPATH, "//button[text()='Submit']")
driver.find_element(By.XPATH, "//label[text()='Email']/following-sibling::input")

# By link text (exact match)
driver.find_element(By.LINK_TEXT, "Sign In")

# By partial link text
driver.find_element(By.PARTIAL_LINK_TEXT, "Sign")

# By class name (avoid — fragile)
driver.find_element(By.CLASS_NAME, "submit-btn")

# By tag name (usually returns many elements)
driver.find_elements(By.TAG_NAME, "input")  # Note: find_elements (plural)

Selector Priority

Use selectors in this order, from most stable to least:

  1. By.ID — stable if IDs are meaningful (not auto-generated)
  2. By.NAME — good for form inputs
  3. By.CSS_SELECTOR — versatile and readable
  4. By.XPATH — use for text-based or complex relationships
  5. By.CLASS_NAME — fragile (classes change with styling refactors)

Finding Multiple Elements

# Returns a list — empty list if none found (no exception)
items = driver.find_elements(By.CSS_SELECTOR, ".product-card")
print(f"Found {len(items)} products")

for item in items:
    name = item.find_element(By.CSS_SELECTOR, ".product-name").text
    price = item.find_element(By.CSS_SELECTOR, ".price").text
    print(f"{name}: {price}")

Interacting with Elements

# Click
button = driver.find_element(By.ID, "submit")
button.click()

# Type text
email_field = driver.find_element(By.NAME, "email")
email_field.clear()
email_field.send_keys("user@example.com")

# Submit a form
form = driver.find_element(By.TAG_NAME, "form")
form.submit()

# Press keyboard keys
from selenium.webdriver.common.keys import Keys

search = driver.find_element(By.NAME, "q")
search.send_keys("selenium tutorial")
search.send_keys(Keys.RETURN)

# Select from dropdown
from selenium.webdriver.support.ui import Select

country_select = Select(driver.find_element(By.ID, "country"))
country_select.select_by_visible_text("United States")
country_select.select_by_value("US")
country_select.select_by_index(1)

# Checkbox
checkbox = driver.find_element(By.ID, "agree-terms")
if not checkbox.is_selected():
    checkbox.click()

# Get text/attributes
heading = driver.find_element(By.TAG_NAME, "h1")
print(heading.text)           # Visible text
print(heading.get_attribute("id"))     # Attribute value
print(heading.get_attribute("class"))
print(heading.is_displayed())  # Is element visible?
print(heading.is_enabled())    # Is element interactive?

Waits — The Most Important Topic

Timing is Selenium's biggest challenge. Modern web apps load content asynchronously. Without proper waits, your tests will fail intermittently.

Never Use time.sleep()

import time

# BAD: arbitrary sleep
time.sleep(3)  # Wastes time when fast, still fails when slow
click_button()  # Placeholder for whatever action comes next

Explicit Waits — The Right Way

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

wait = WebDriverWait(driver, timeout=10)  # Wait up to 10 seconds

# Wait until element is visible
element = wait.until(
    EC.visibility_of_element_located((By.ID, "success-message"))
)

# Wait until element is clickable
button = wait.until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))
)
button.click()

# Wait for URL to change
wait.until(EC.url_contains("/dashboard"))

# Wait for text to appear
wait.until(EC.text_to_be_present_in_element((By.ID, "status"), "Complete"))

# Wait for element to disappear
wait.until(EC.invisibility_of_element_located((By.ID, "loading-spinner")))

# Wait for a new window or tab to open (two window handles in total)
wait.until(EC.number_of_windows_to_be(2))
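
Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or the timeout expires. The function below is a simplified illustration of that idea, not Selenium's actual implementation. wait_until and its parameters are hypothetical names, and in real tests you would keep using WebDriverWait.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # Finish as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulate an element that "appears" 0.3 seconds from now.
    ready_at = time.monotonic() + 0.3
    started = time.monotonic()
    wait_until(lambda: time.monotonic() >= ready_at, timeout=5, poll_interval=0.1)
    print(f"waited {time.monotonic() - started:.1f}s instead of a fixed sleep")
```

This is why an explicit wait beats time.sleep(): it returns as soon as the page is ready and only spends the full timeout when something is actually wrong. WebDriverWait additionally ignores NoSuchElementException between polls by default, which this sketch leaves out.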

Common Expected Conditions

# Presence in DOM (not necessarily visible)
EC.presence_of_element_located((By.ID, "my-id"))

# Visible on screen
EC.visibility_of_element_located((By.ID, "my-id"))

# Clickable (visible AND enabled)
EC.element_to_be_clickable((By.ID, "button"))

# Selected (checkbox/radio)
EC.element_to_be_selected(element)

# URL checks
EC.url_to_be("https://example.com/dashboard")
EC.url_contains("/dashboard")
EC.url_matches(r".*/dashboard/\d+")

# Title checks
EC.title_is("Dashboard | MyApp")
EC.title_contains("Dashboard")

# Alert present
EC.alert_is_present()

Custom Wait Conditions

# Wait for custom condition using lambda
wait.until(lambda driver: len(driver.find_elements(By.CSS_SELECTOR, ".product")) > 5)

# Wait for AJAX to finish (assumes the page loads jQuery; raises a JavaScript error otherwise)
wait.until(lambda driver: driver.execute_script("return jQuery.active == 0"))
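
Lambdas get hard to read once a condition takes parameters or is reused across tests. Because wait.until() accepts any callable that takes the driver, a small class works too. The class name element_count_at_least is hypothetical, and the locator is an ordinary (strategy, value) tuple such as (By.CSS_SELECTOR, ".product").

```python
class element_count_at_least:
    """Wait condition: at least `minimum` elements match the locator.

    `minimum` should be 1 or more, since an empty list counts as "not yet".
    """

    def __init__(self, locator, minimum):
        self.locator = locator
        self.minimum = minimum

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        # Return the elements (truthy) on success so wait.until() hands them back
        return elements if len(elements) >= self.minimum else False
```

Usage would look like products = wait.until(element_count_at_least((By.CSS_SELECTOR, ".product"), 6)), which reads closer to the built-in expected conditions than an inline lambda.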

Page Object Model

For any real test suite, Page Object Model (POM) is essential. It moves page selectors and interactions into classes, so tests are readable and changes are isolated.

# pages/login_page.py
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class LoginPage:
    # driver.get() needs an absolute URL; replace the host with your app's address
    URL = "https://example.com/login"

    EMAIL_INPUT = (By.NAME, "email")
    PASSWORD_INPUT = (By.NAME, "password")
    SUBMIT_BUTTON = (By.CSS_SELECTOR, "button[type='submit']")
    ERROR_MESSAGE = (By.CSS_SELECTOR, ".error-message")

    def __init__(self, driver):
        self.driver = driver
        self.wait = WebDriverWait(driver, 10)

    def goto(self):
        self.driver.get(self.URL)

    def login(self, email, password):
        self.wait.until(EC.visibility_of_element_located(self.EMAIL_INPUT))
        self.driver.find_element(*self.EMAIL_INPUT).send_keys(email)
        self.driver.find_element(*self.PASSWORD_INPUT).send_keys(password)
        self.driver.find_element(*self.SUBMIT_BUTTON).click()

    def get_error_message(self):
        return self.wait.until(
            EC.visibility_of_element_located(self.ERROR_MESSAGE)
        ).text


# tests/test_login.py
import pytest
from selenium import webdriver
from pages.login_page import LoginPage
from pages.dashboard_page import DashboardPage


@pytest.fixture
def driver():
    driver = webdriver.Chrome()
    yield driver
    driver.quit()


def test_valid_login(driver):
    login_page = LoginPage(driver)
    login_page.goto()
    login_page.login("user@example.com", "password123")

    dashboard = DashboardPage(driver)
    assert dashboard.is_loaded()
    assert dashboard.get_welcome_message() == "Welcome, User"


def test_invalid_password(driver):
    login_page = LoginPage(driver)
    login_page.goto()
    login_page.login("user@example.com", "wrongpassword")

    error = login_page.get_error_message()
    assert "Invalid credentials" in error
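
Page objects usually should not hard-code the host, since the same tests tend to run against local, staging, and CI environments. One option, sketched here under the assumption of a BASE_URL environment variable and a localhost default (both hypothetical names and values), is to keep only the path in the page object and build the absolute URL that driver.get() requires at call time.

```python
import os
from urllib.parse import urljoin


def page_url(path, base_url=None):
    """Join a page path such as "/login" onto the configured base URL."""
    # BASE_URL and the localhost default are illustrative assumptions
    base = base_url or os.environ.get("BASE_URL", "http://localhost:8000")
    # Normalize slashes so "/login" and "login" both land under the base path
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))
```

A page object's goto() could then call self.driver.get(page_url("/login")), and switching environments becomes a matter of setting one variable.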

Handling Common Scenarios

Alerts and Popups

from selenium.webdriver.support import expected_conditions as EC

# Wait for alert, then accept it
wait.until(EC.alert_is_present())
alert = driver.switch_to.alert
print(alert.text)   # Read alert text
alert.accept()      # Click OK
# alert.dismiss()   # Click Cancel

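# Send text to a prompt dialog (a separate dialog, so wait for it again;
# the alert accepted above no longer exists)
prompt = wait.until(EC.alert_is_present())
prompt.send_keys("My input")
prompt.accept()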

Iframes

# Switch to iframe
iframe = driver.find_element(By.CSS_SELECTOR, "iframe#payment-frame")
driver.switch_to.frame(iframe)

# Now interact with elements inside the iframe
driver.find_element(By.NAME, "card-number").send_keys("4111111111111111")

# Switch back to main page
driver.switch_to.default_content()
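
Forgetting to switch back is a common source of confusing "element not found" errors later in a test. A context manager can make the switch-back automatic. This is a minimal sketch; inside_frame is a hypothetical helper name, and it relies only on the switch_to.frame() and switch_to.default_content() calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame` for the duration of a with-block, then switch back."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the block raises, so later steps start from the main page
        driver.switch_to.default_content()
```

With it, the payment example becomes a with inside_frame(driver, iframe): block containing the card-number lookup. For nested iframes you would likely want switch_to.parent_frame() instead of default_content(), which always returns to the top-level page.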

Multiple Windows/Tabs

# Open link in new tab
main_window = driver.current_window_handle
driver.find_element(By.LINK_TEXT, "Open report").click()

# Switch to new tab
for handle in driver.window_handles:
    if handle != main_window:
        driver.switch_to.window(handle)
        break

# Work in new tab
assert "Report" in driver.title

# Close tab and switch back
driver.close()
driver.switch_to.window(main_window)
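
The loop above assumes exactly one other window is open. If a test opens several tabs, it helps to record the handles before the click and switch to whichever one is new. The helper below is a sketch with a hypothetical name, switch_to_new_window; it uses only driver.window_handles and driver.switch_to.window().

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window handle not in known_handles and return it."""
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    raise LookupError("no new window or tab was opened")
```

A typical call sequence would be: store before = set(driver.window_handles), click the link, wait with EC.number_of_windows_to_be(len(before) + 1), then call switch_to_new_window(driver, before).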

JavaScript Execution

# Execute JavaScript for things Selenium can't do directly
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

# Click hidden elements
element = driver.find_element(By.ID, "hidden-button")
driver.execute_script("arguments[0].click();", element)

# Set values in JavaScript
input_field = driver.find_element(By.NAME, "email")
driver.execute_script("arguments[0].value = 'test';", input_field)

# Return values
page_title = driver.execute_script("return document.title")

Screenshots

# Screenshot of the current viewport (not the full scrollable page)
driver.save_screenshot("screenshot.png")

# Element screenshot
element = driver.find_element(By.ID, "chart")
element.screenshot("chart.png")

Running with pytest (Python)

pip install pytest pytest-selenium pytest-rerunfailures

# conftest.py
import pytest
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


@pytest.fixture(scope="session")
def driver():
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")

    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(0)  # Disable implicit waits — use explicit only
    yield driver
    driver.quit()


@pytest.fixture
def fresh_driver():
    # Per-test driver (slower but more isolated)
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    yield driver
    driver.quit()

Run tests:

pytest tests/ -v
pytest tests/ -k "login"         # Run tests matching "login"
pytest tests/ --reruns 2          # Retry flaky tests twice (needs the pytest-rerunfailures plugin)

CI/CD Integration

# .github/workflows/selenium.yml
name: Selenium Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - run: pip install -r requirements.txt

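      - name: Install Chrome
        run: |
          # GitHub-hosted Ubuntu runners usually ship with Chrome already;
          # this step matters mainly for self-hosted or minimal images.
          wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
          sudo apt-get update
          sudo apt-get install -y ./google-chrome-stable_current_amd64.deb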

      - name: Run Selenium tests
        run: pytest tests/ -v --tb=short

      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: screenshots
          path: screenshots/
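
The upload step only helps if failing tests actually write images into screenshots/, which none of the code so far does. Here is a minimal sketch of a helper that could do it. The function name save_failure_screenshot and the file-naming scheme are assumptions; the only Selenium call it relies on is driver.save_screenshot().

```python
import re
from pathlib import Path


def save_failure_screenshot(driver, test_name, directory="screenshots"):
    """Save a PNG named after the test into `directory` and return its path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    # Replace characters that are awkward in file names (e.g. "::" in pytest ids)
    safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_name).strip("_")
    path = folder / f"{safe_name}.png"
    driver.save_screenshot(str(path))
    return path
```

A pytest_runtest_makereport hook in conftest.py is one common place to call this when a test fails, passing the test's node id as test_name.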

Selenium vs Modern Alternatives

Feature             | Selenium                   | Playwright             | Cypress
Languages           | Python, Java, JS, C#, Ruby | JS, Python, Java, C#   | JS only
Auto-waiting        | No (manual waits)          | Yes (built-in)         | Yes (built-in)
Speed               | Slow                       | Fast                   | Fast
Cross-browser       | Yes (best support)         | Yes (incl. WebKit)     | Limited
Learning curve      | High                       | Medium                 | Low
Enterprise adoption | Very high                  | Growing                | High
Selenium Grid       | Yes                        | No (built-in parallel) | Cypress Cloud

Use Selenium when:

  • Enterprise environment standardized on Selenium/Grid
  • Need Java or C# (Playwright also supports these now)
  • Testing older browsers or specific browser versions
  • Large existing Selenium test suite

Use Playwright instead when:

  • Starting a new project in 2026
  • Building a JavaScript/TypeScript app
  • Want auto-waiting and better reliability out of the box

Getting Started Checklist

  • Install Selenium: pip install selenium webdriver-manager
  • Write first test using explicit waits (not time.sleep)
  • Create a Page Object for your login page
  • Add pytest fixtures for browser lifecycle
  • Run headless in CI: --headless Chrome option
  • Set up GitHub Actions to run on every push
