Selenium WebDriver Tutorial: Getting Started

Selenium WebDriver is the go-to automation framework for web application testing, letting you programmatically control browsers like they’re remote-controlled cars. Whether you’re setting up CI/CD pipelines, scraping data, or just tired of manually clicking through the same workflows, WebDriver gives you the power to automate pretty much anything a human can do in a browser. This tutorial will walk you through the fundamentals of getting WebDriver up and running, show you real examples that actually work, and help you avoid the gotchas that trip up newcomers.

How Selenium WebDriver Works

WebDriver operates through a client-server architecture where your test code communicates with browser drivers via HTTP requests using the WebDriver protocol (formerly JSON Wire Protocol, now W3C WebDriver standard). When you write something like driver.find_element(By.ID, "username"), that call gets translated into an HTTP POST request sent to the browser driver, which then manipulates the actual browser instance.

The architecture looks like this: Your Test Code → Language Bindings → Browser Driver → Browser. Each browser needs its own driver – ChromeDriver for Chrome, GeckoDriver for Firefox, EdgeDriver for Edge. These drivers act as translators between WebDriver commands and browser-specific APIs.

Here’s what happens under the hood when you run a simple test:

WebDriver spawns a new browser instance in a controlled environment
Your commands get serialized into HTTP requests
The browser driver receives these requests and executes native browser actions
Results get passed back through the same chain in reverse
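
To make the serialization step concrete, here is a minimal sketch of what a "find element" command roughly looks like on the wire. The helper name and the session id are made up for illustration; real language bindings build this request internally. The W3C protocol has no dedicated ID strategy, so this sketch assumes the ID lookup is expressed as a CSS selector.

```python
import json


def build_find_element_request(session_id, using, value):
    """Return (method, path, body) for a W3C WebDriver 'Find Element' command.

    Hypothetical helper for illustration only; the bindings do this for you.
    """
    path = f"/session/{session_id}/element"
    body = json.dumps({"using": using, "value": value})
    return "POST", path, body


method, path, body = build_find_element_request(
    "abc123",           # made-up session id
    "css selector",     # one of the W3C locator strategies
    '[id="username"]',  # an ID lookup written as a CSS selector
)
print(method, path)
print(body)
```

The browser driver answers with a JSON response containing an element reference, which the bindings wrap in an element object for your test code.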

Step-by-Step Setup Guide

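Let’s get you set up with Python since it’s probably the most straightforward for beginners. You’ll need a recent Python 3 and pip installed. Selenium 4 needs at least Python 3.7, and newer releases require newer interpreters, so check the selenium page on PyPI for the exact minimum.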

First, install the Selenium package:

pip install selenium

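Next, you need browser drivers. Selenium 4.6 and later ship with Selenium Manager, which fetches a matching driver on its own when you don’t supply one. If you want explicit control, or you’re on an older Selenium, the webdriver-manager package handles driver downloads and version matching automatically: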

pip install webdriver-manager

Here’s your first working WebDriver script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

# Setup Chrome driver with automatic driver management
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

try:
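    # Navigate to a site that has a search box named "q"
    # (example.com has no such field, so the lookup below would fail there)
    driver.get("https://www.python.org")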
    
    # Find an element and interact with it
    search_box = driver.find_element(By.NAME, "q")
    search_box.send_keys("selenium webdriver")
    search_box.submit()
    
    # Pause so you can see the results (fine for a demo; use explicit waits in real tests)
    time.sleep(3)
    
    # Get page title
    print(f"Page title: {driver.title}")
    
finally:
    # Always close the browser
    driver.quit()

For headless execution (no GUI), add these Chrome options:

from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(service=service, options=chrome_options)

If you prefer Firefox, swap in GeckoDriver:

from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.service import Service

service = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(service=service)

Real-World Examples and Use Cases

Here’s a practical example that automates form submission on a typical login page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from webdriver_manager.chrome import ChromeDriverManager

def login_automation(username, password):
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    
    try:
        driver.get("https://your-app.com/login")
        
        # Wait for elements to load instead of using sleep()
        wait = WebDriverWait(driver, 10)
        
        username_field = wait.until(
            EC.presence_of_element_located((By.ID, "username"))
        )
        password_field = driver.find_element(By.ID, "password")
        login_button = driver.find_element(By.XPATH, "//button[@type='submit']")
        
        # Fill in credentials
        username_field.send_keys(username)
        password_field.send_keys(password)
        login_button.click()
        
        # Verify successful login: wait.until() raises TimeoutException
        # if the dashboard element never appears
        wait.until(
            EC.presence_of_element_located((By.CLASS_NAME, "dashboard"))
        )
        
        return True
        
    except TimeoutException:
        print("Login failed or page didn't load properly")
        return False
        
    finally:
        driver.quit()

For data extraction, here’s how you’d scrape a table:

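from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def scrape_table_data(url):
    # Build headless options locally so the function doesn't rely on globals
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)

    try:
        driver.get(url)

        # Find table rows
        rows = driver.find_elements(By.XPATH, "//table[@id='data-table']//tr")

        data = []
        for row in rows[1:]:  # Skip header row
            cells = row.find_elements(By.TAG_NAME, "td")
            row_data = [cell.text.strip() for cell in cells]
            data.append(row_data)

        return data
    finally:
        # Close the browser even if scraping fails partway through
        driver.quit()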
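
The scraper returns plain lists of cell text, which gets awkward once you need to look up values by column. As a rough sketch, you could pair each row with column names. The rows_to_records helper and the sample data below are invented for illustration; in practice the header would come from the table's th cells.

```python
def rows_to_records(header, rows):
    """Pair each row's cells with the column names from the header."""
    records = []
    for row in rows:
        if len(row) != len(header):
            continue  # skip rows whose cell count doesn't match the header
        records.append(dict(zip(header, row)))
    return records


# Made-up sample data standing in for scraped output
header = ["name", "status"]
rows = [["api", "up"], ["db", "down"], ["broken-row"]]

records = rows_to_records(header, rows)
print(records)
```

Skipping mismatched rows is one possible policy. Depending on the table, raising an error might be the safer choice.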

Common real-world applications include:

E2E testing in CI/CD pipelines
Automated regression testing for web applications
Data scraping from dynamic JavaScript-heavy sites
Monitoring website functionality and uptime
Automating repetitive admin tasks

Comparison with Alternatives

Tool	Language Support	Performance	Learning Curve	Best For
Selenium WebDriver	Python, Java, C#, Ruby, JS	Moderate	Medium	Cross-browser testing, mature ecosystem
Playwright	Python, JS, C#, Java	Fast	Low-Medium	Modern apps, better debugging tools
Cypress	JavaScript only	Fast	Low	Frontend developers, great DX
Puppeteer	JavaScript/Node.js	Very Fast	Medium	Chrome-only automation, PDF generation

WebDriver still dominates in enterprise environments because of its maturity and extensive browser support. However, newer tools like Playwright are gaining traction for their speed and developer experience improvements.

Common Pitfalls and Troubleshooting

The biggest newbie mistake is not handling waits properly. Never use time.sleep() in production code:

# Bad - brittle and slow
time.sleep(5)
element = driver.find_element(By.ID, "dynamic-content")

# Good - responsive and reliable
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.ID, "dynamic-content")))
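
If you're wondering why the explicit wait is faster, it helps to picture it as a polling loop: check the condition, return as soon as it holds, and give up only after the timeout. The sketch below is a simplified stand-in written in plain Python, not Selenium's actual implementation. The wait_until name and the fake condition are assumptions for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Simplified illustration of the idea behind an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # done as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Fake condition that succeeds on the third check, standing in for
# "the element has appeared in the DOM"
attempts = {"count": 0}


def content_ready():
    attempts["count"] += 1
    return "loaded" if attempts["count"] >= 3 else None


result = wait_until(content_ready, timeout=2.0, poll_interval=0.05)
print(result, attempts["count"])
```

A fixed sleep always costs the full duration, while the polling version costs only as long as the page actually takes, up to the timeout.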

Element selection issues are another common problem. Absolute XPath expressions are fragile, so prefer CSS selectors or IDs when possible:

# Fragile - breaks when DOM structure changes
driver.find_element(By.XPATH, "/html/body/div[3]/div[2]/form/input[1]")

# Better - more resilient
driver.find_element(By.CSS_SELECTOR, "input[name='username']")

# Best - if available
driver.find_element(By.ID, "username-input")

Browser driver version mismatches cause headaches. Symptoms include “SessionNotCreatedException” or browsers not launching. The webdriver-manager package mostly solves this, but for manual setups, always match your browser version with the driver version (for Chrome and ChromeDriver, the major version numbers need to agree).
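
For manual setups, a quick sanity check before launching can save some head-scratching. This sketch assumes the Chrome convention that browser and driver must share a major version. The function names and version strings are made up for illustration, and how you obtain the real version strings depends on your platform.

```python
def major_version(version):
    """Return the leading integer of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_compatible(browser_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Illustrative version strings, not taken from a real installation
print(versions_compatible("120.0.6099.109", "120.0.6099.71"))
print(versions_compatible("121.0.6167.85", "120.0.6099.71"))
```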

Leaked browser and driver processes pile up and eat memory when you forget to call driver.quit(). Use try-finally blocks or context managers:

from contextlib import contextmanager

@contextmanager
def get_driver():
    driver = webdriver.Chrome()
    try:
        yield driver
    finally:
        driver.quit()

# Usage
with get_driver() as driver:
    driver.get("https://example.com")
    # Automatic cleanup happens here

Best Practices and Performance Tips

For faster execution, disable images and notification prompts when you don’t need them:

chrome_options = Options()
prefs = {
    "profile.managed_default_content_settings.images": 2,
    "profile.default_content_setting_values.notifications": 2
}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")

Use explicit waits strategically. WebDriverWait with expected conditions is your friend:

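from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Reuse one wait object; assumes `driver` is an existing WebDriver session
wait = WebDriverWait(driver, 10)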

# Wait for element to be clickable
clickable_element = wait.until(EC.element_to_be_clickable((By.ID, "submit-btn")))

# Wait for text to appear
wait.until(EC.text_to_be_present_in_element((By.ID, "status"), "Complete"))

# Wait for element to disappear (loading spinners)
wait.until(EC.invisibility_of_element((By.CLASS_NAME, "loading-spinner")))

For CI/CD environments, always run in headless mode and consider using Docker containers with pre-installed browsers to avoid environment inconsistencies.
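
One way to keep local runs visible while CI runs stay headless is to pick the Chrome flags from the environment. The sketch below assumes your CI system sets a CI environment variable (many do, but check yours). The chrome_arguments helper is hypothetical, and the flags are the same ones used in the headless setup earlier.

```python
import os


def chrome_arguments(environ=None):
    """Return Chrome flags for the current environment.

    Assumes the CI system exports a non-empty CI variable.
    """
    env = os.environ if environ is None else environ
    if env.get("CI"):
        return ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]
    return []  # local run: keep the browser window visible


print(chrome_arguments({"CI": "true"}))
print(chrome_arguments({}))

# With Selenium installed, the flags would be applied along these lines:
#   for arg in chrome_arguments():
#       chrome_options.add_argument(arg)
```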

Page Object Model (POM) keeps your tests maintainable as they grow:

class LoginPage:
    def __init__(self, driver):
        self.driver = driver
        self.username_field = (By.ID, "username")
        self.password_field = (By.ID, "password")
        self.login_button = (By.XPATH, "//button[@type='submit']")
    
    def login(self, username, password):
        self.driver.find_element(*self.username_field).send_keys(username)
        self.driver.find_element(*self.password_field).send_keys(password)
        self.driver.find_element(*self.login_button).click()
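
A side benefit of page objects is that tests only talk to the page class, so you can exercise the flow without a browser. The sketch below repeats the LoginPage class and drives it with a fake driver. FakeDriver and FakeElement are hypothetical test doubles, and the locator strategies are written as plain strings (the same values Selenium's By.ID and By.XPATH hold) so it runs without Selenium installed.

```python
# Selenium's By.ID and By.XPATH constants are these plain strings
BY_ID = "id"
BY_XPATH = "xpath"


class LoginPage:
    """Same page object as above, with locators kept in one place."""

    def __init__(self, driver):
        self.driver = driver
        self.username_field = (BY_ID, "username")
        self.password_field = (BY_ID, "password")
        self.login_button = (BY_XPATH, "//button[@type='submit']")

    def login(self, username, password):
        self.driver.find_element(*self.username_field).send_keys(username)
        self.driver.find_element(*self.password_field).send_keys(password)
        self.driver.find_element(*self.login_button).click()


class FakeElement:
    """Hypothetical test double that records interactions."""

    def __init__(self, log, locator):
        self.log = log
        self.locator = locator

    def send_keys(self, text):
        self.log.append(("send_keys", self.locator, text))

    def click(self):
        self.log.append(("click", self.locator))


class FakeDriver:
    """Hypothetical stand-in exposing only find_element()."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return FakeElement(self.log, (by, value))


driver = FakeDriver()
LoginPage(driver).login("alice", "not-a-real-password")
for entry in driver.log:
    print(entry)
```

In a real test you'd pass an actual WebDriver instance instead. The point is that when a locator changes, only LoginPage needs updating.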

Performance-wise, as a rough rule of thumb, WebDriver handles somewhere around 10-50 operations per second, depending heavily on page complexity and network conditions. For high-volume scenarios, consider parallel execution with tools like pytest-xdist or Selenium Grid.
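
If you just want to see the shape of parallel execution before reaching for those tools, the standard library's thread pool can fan out independent checks. This is a swapped-in illustration, not how pytest-xdist or Selenium Grid work. The fake_page_check function is a stand-in for a task that would open its own driver, load the URL, and quit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(task, items, max_workers=4):
    """Run task(item) for every item on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))


def fake_page_check(url):
    # Stand-in for real work: a real task would create its own driver,
    # call driver.get(url), inspect the page, and quit the driver
    return url, len(url)


urls = ["https://example.com/a", "https://example.com/bb"]
results = run_in_parallel(fake_page_check, urls, max_workers=2)
print(results)
```

Each worker should own its own browser session. Sharing one driver across threads is asking for trouble.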

Security note: never hardcode credentials in your scripts. Use environment variables or secure vaults:

import os
username = os.getenv('TEST_USERNAME')
password = os.getenv('TEST_PASSWORD')
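
Note that os.getenv() quietly returns None when a variable is missing, which tends to surface later as a confusing error when typing into a field. A small guard that fails fast can help. The load_credentials helper below is a hypothetical sketch built around the same TEST_USERNAME and TEST_PASSWORD variables.

```python
import os


def load_credentials(environ=None):
    """Return (username, password), or raise if either variable is unset."""
    env = os.environ if environ is None else environ
    required = ("TEST_USERNAME", "TEST_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TEST_USERNAME"], env["TEST_PASSWORD"]


# Demonstration with an in-memory mapping instead of the real environment
sample_env = {"TEST_USERNAME": "alice", "TEST_PASSWORD": "not-a-real-password"}
print(load_credentials(sample_env)[0])
```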

The official Selenium documentation at https://www.selenium.dev/documentation/ and the community-maintained Python guide at https://selenium-python.readthedocs.io/ cover advanced topics like handling multiple windows, working with frames, and mobile testing. For troubleshooting browser-specific issues, check the ChromeDriver docs at https://chromedriver.chromium.org/ and GeckoDriver at https://github.com/mozilla/geckodriver.
