
Watir Ruby: How To Automate Browsers and Scrape Web Data Step by Step


Watir is an open-source Ruby library for automating web browsers through code. Built on top of Selenium WebDriver, it wraps browser communication in a clean, Ruby-idiomatic API so you can focus on clicking buttons, filling forms, navigating pages, and extracting data without managing the underlying complexity. It's particularly useful for scraping JavaScript-heavy sites, automating form submissions, and collecting content that only appears after user interaction. This guide walks you through the full process, from setup to a working Watir scraper with proxy support.


TL;DR

  • Watir is a Ruby-native browser automation library built on top of Selenium WebDriver
  • Install Watir and webdrivers gems to start browser automation quickly
  • Use Watir Ruby to automate Chrome, Firefox, Edge, and Safari with simple Ruby code.
  • Use a Watir Ruby scraper to locate elements using IDs, classes, CSS selectors, XPath expressions, text content, or any combination of attributes
  • Scale Watir scraping with Decodo residential proxies as Watir natively supports proxy configuration via Chrome options
  • Combine Watir's automatic waiting and browser control features to scrape JavaScript-powered websites reliably
  • Always close browser sessions in an ensure block to prevent zombie processes from consuming your server's memory.

What is browser automation?

Browser automation is the process of writing code that controls a web browser the same way a person would, such as visiting URLs, clicking elements, typing text, scrolling down, waiting for content to appear, and reading what the page shows. 

Instead of opening a browser and manually navigating through pages, a script performs those actions automatically.

It helps automate repetitive tasks, collect web data, test applications, and monitor websites.

The key difference from static HTTP scrapers is that the browser actually executes JavaScript, handles cookies and sessions, and renders the page exactly as a human visitor would see it.

This matters because a large portion of the modern web doesn't exist in the HTML that the server first sends. It gets loaded dynamically after the initial page load, triggered by JavaScript that runs in the browser. 

A scraper that only reads the raw HTML response misses most of this content. A browser automation tool captures it all.

The most common use cases of browser automation include:

  • Web scraping and data extraction. Many websites load content through JavaScript after the page initially loads. Traditional HTTP requests often miss this content.

Browser automation solves this problem by rendering pages just like a real user would. Once the page finishes loading, your script can extract text, links, product information, pricing data, and other structured content.

  • Automated testing. Quality assurance teams frequently use browser automation to verify that web applications work correctly. Instead of manually testing forms, login pages, shopping carts, and checkout flows, automated tests perform those actions repeatedly and consistently.
  • Form submission at scale. Filling in and submitting forms that require a real browser session, such as registration flows, lead capture forms, or booking systems
  • Monitoring website changes. Organizations often monitor websites for changes. A browser automation script can check page availability, detect layout changes, monitor product availability, track competitor updates, and verify website functionality, then trigger downstream alerts or actions.
  • Price tracking. Browser automation helps collect pricing information from multiple websites automatically and on a schedule, even on sites that block simple HTTP scrapers.

Browser automation also enables things that HTTP-based scraping simply can't do, such as:

  • Handling multi-step authentication flows
  • Interacting with dynamic dropdowns and date pickers
  • Waiting for specific user interface states before reading data
  • Dealing with sites that fingerprint and block non-browser HTTP clients

Here’s how browser automation beats manual work:

Manual process | Browser automation
Slow execution | Fast execution
Human errors | Consistent results
Limited scale | Handles large workloads
Difficult repetition | Repeatable workflows
Time-consuming | Runs automatically

With browser automation, all you need is to create a working automation script, and you can run it hundreds or thousands of times without changing the workflow.

What’s the role of headless browsers?

A headless browser runs without displaying a graphical user interface.

A headless browser behaves like a normal browser, just without opening a visible window. It loads pages, executes JavaScript, handles cookies, and renders content entirely in the background without painting anything to a screen.

This has several benefits: it usually runs faster and uses less memory because nothing is drawn to a display, and it's easier to deploy on servers and CI/CD environments that have no display.

Understanding Watir's components: Classic, webdriver, and watirspec

Watir has evolved significantly since its first release in 2001, making it one of the oldest browser automation libraries still in active use.

Earlier versions focused on Internet Explorer, while modern Watir supports today's major browsers through Selenium WebDriver.

Here are its main components:

Watir-Classic (deprecated)

Watir originally launched as Watir-Classic.

This version communicated directly with Internet Explorer using Microsoft's OLE/COM technology.

At the time, Internet Explorer dominated the browser market, making this approach practical for browser automation.

Watir-Classic offered:

  • Direct Internet Explorer control
  • Ruby-friendly syntax
  • Browser testing capabilities
  • Automation for internal business applications

However, Internet Explorer eventually reached end-of-life status.

As browser standards evolved and Chrome became dominant, Watir required a new architecture.

Today, Watir-Classic is considered deprecated and is no longer recommended for new projects. While it’s still mentioned in old documentation and Stack Overflow answers, you shouldn't use it in any new project.

Watir-Webdriver (the modern Watir)

To support modern browsers, the Watir project introduced Watir-WebDriver.

Modern Watir (version 6.x and above) wraps Selenium WebDriver, the most widely used library for programmatic browser control. Underneath it sits the WebDriver protocol, a W3C standard that defines how automation clients communicate with browsers.

ChromeDriver, GeckoDriver (for Firefox), and MSEdgeDriver all implement this protocol. Watir uses Selenium's Ruby bindings to talk to these drivers and adds its own higher-level API layer on top.

The practical effect is that you get cross-browser support essentially for free. The same Watir code that drives Chrome also works with Firefox and Edge, because they all speak the same WebDriver protocol underneath.

WatirSpec

WatirSpec is the executable specification of the Watir API, a complete test suite that defines exactly how Watir should behave. You can think of it as a set of standards and automated tests that ensure consistent behavior across Watir implementations. If a Watir version passes WatirSpec, it behaves correctly. 

This is similar to how RubySpec defines Ruby's expected behavior across different implementations. You won't interact with WatirSpec directly when writing scrapers, but it guarantees behavioral consistency across Watir releases.

Its purpose includes:

  • Defining expected API behavior
  • Maintaining consistency
  • Supporting future development
  • Preventing regressions

Watir vs. Selenium

As a developer evaluating browser automation tools in Ruby, you'll probably wonder why you'd use Watir instead of Selenium directly. The answer is almost entirely about developer experience.

Selenium in Ruby is verbose. Every interaction requires you to explicitly wait for elements to be ready, wrap operations in WebDriverWait blocks, and write significantly more boilerplate code than the actual scraping logic. Watir handles waiting automatically. When you call:

browser.button(text: 'Submit').click

Watir waits for that button to exist and be clickable before clicking it. With Selenium, you'd need to write an explicit wait condition first. Over the course of a full scraping project, this difference adds up to a lot less code and far fewer timing-related bugs.

The relationship looks like this:

Your Ruby Script -> Watir -> Selenium WebDriver -> Browser Driver -> Browser

Many Ruby developers choose Watir because it reduces complexity while retaining Selenium's browser support.

Installing Watir and setting up your Ruby environment

Let's build a working environment and install everything required before we start web scraping with Ruby.

This section covers Ruby installation requirements, gem setup, browser driver management, and verifying that everything is working before you write a single line of scraping code.

Verify your Ruby installation

Watir currently supports modern Ruby versions.

You’ll need Ruby 2.6 or a newer version. We recommend Ruby 3.x for new projects because it brings meaningful performance improvements and is what most teams use today. 

You should always verify the latest requirements from the gem documentation.

Check your installed Ruby version from a terminal or command prompt by running:

ruby --version

Or the shorter equivalent:

ruby -v

If Ruby is installed on your device, you’ll see something like this:

ruby 3.3.0 (2023-12-25 revision ...) [x86_64-linux]

If Ruby isn't installed, download it from the official website, or use a version manager like rbenv or RVM on macOS and Linux (these let you install and switch between multiple Ruby versions) or RubyInstaller on Windows.

Install the Watir gem globally

With Ruby ready, install Watir through RubyGems.

gem install watir

For projects that use Bundler, which is recommended for anything beyond a quick script, add both gems to your Gemfile:

# Gemfile
gem "watir"
gem "webdrivers"

Then install the dependencies:

bundle install

Verify installation:

gem list watir

You should see output similar to:

watir (x.y.z)

You can run a quick test to see if Watir is installed correctly.

Create a file named test.rb and add this script:

require "watir"
browser = Watir::Browser.new :chrome
browser.goto "https://example.com"
puts browser.title
browser.close

Run it:

ruby test.rb

If the browser opens, navigates to the page, prints the title, and closes, Watir is installed correctly.

Install the webdrivers gem

Browser automation requires a driver executable that acts as a bridge between Watir and the browser itself. 

For instance, Chrome needs ChromeDriver, Firefox needs GeckoDriver, and Edge needs MSEdgeDriver. Without the correct driver version installed, Watir can't control the browser.

The webdrivers gem handles all of this automatically. It detects the version of each browser you have installed, downloads the matching driver if it's not already present, and ensures the driver is on your PATH. 

This eliminates the most common source of setup frustration with Selenium-based automation. Install it once and forget about driver management:

gem install webdrivers

Verify the installation

The most reliable way to confirm everything works is a minimal script that opens a real browser session, visits a URL, reads the page title, and closes cleanly. 

Create a file named verify_install.rb and add this script:

require 'watir'
require 'webdrivers'
browser = Watir::Browser.new :chrome
browser.goto 'https://example.com'
puts browser.title
browser.close

Run it:

ruby verify_install.rb

Expected output:

Example Domain

If a Chrome window opens, navigates to example.com, and closes, and Example Domain prints to the terminal without errors, your Watir and WebDriver setup is working correctly.

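One note: with recent versions of Selenium (4.11 and later, which include Selenium Manager) and Chrome, the webdrivers gem is usually no longer necessary because Selenium downloads and manages drivers itself. In that setup, you can drop the require 'webdrivers' line from the examples in this guide. If you encounter version conflicts, check the versions of Watir, Selenium, and Chrome you're using.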

Suggested project file structure

Small projects often work with a single Ruby file.

For larger scraping projects, separate concerns clearly from the start and make the code easier to maintain as it grows. This avoids the common problem of one massive script file that's hard to test, maintain, or hand off:

my_scraper/
  scraper.rb        # entry point and main scraping loop
  models/
    product.rb      # data structures
  config/
    settings.rb     # timeouts, base URLs, proxy settings
  output/
    results.csv     # scraped data
  logs/
    scraper.log     # execution log

Keep credentials out of your source files by loading them from environment variables or a config file that's excluded from version control. 
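As a minimal sketch of that advice, a settings file could read its values from environment variables. The method name load_settings and the variable names PROXY_USERNAME, PROXY_PASSWORD, and SCRAPER_TIMEOUT are assumptions made for illustration; Watir doesn't require any of them.

```ruby
# config/settings.rb (hypothetical file; all names here are illustrative)

# Builds a settings hash from an environment-like object (ENV by default).
def load_settings(env = ENV)
  {
    base_url: 'https://books.toscrape.com',
    # fetch without a default raises KeyError when the variable is missing,
    # so a misconfigured run fails early instead of starting without credentials.
    proxy_username: env.fetch('PROXY_USERNAME'),
    proxy_password: env.fetch('PROXY_PASSWORD'),
    # Optional value with a fallback of 30 seconds.
    timeout: Integer(env.fetch('SCRAPER_TIMEOUT', '30'))
  }
end
```

Passing the environment in as an argument keeps the method easy to exercise with a plain hash, while real runs simply call load_settings with no arguments.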

Starting browser sessions and basic navigation

With Watir installed, you can control browsers directly from Ruby code. Once the browser launches, you can navigate pages, inspect content, move between pages, refresh content, and close sessions cleanly.

Create a browser instance

Open a Chrome browser with a single line:

require 'watir'
require 'webdrivers'
browser = Watir::Browser.new :chrome

The Watir::Browser.new call launches the browser, sets up the WebDriver connection, and returns a browser object you'll use for all subsequent operations. To use a different browser, pass its name:

browser = Watir::Browser.new :firefox
browser = Watir::Browser.new :edge
browser = Watir::Browser.new :safari

Open a webpage

After creating a browser session, navigate to a website using goto:

browser.goto 'https://books.toscrape.com'

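By default, goto returns once the browser reports that the page has finished loading. This behavior helps reduce synchronization issues, so you don't need to add an explicit sleep or wait after navigation for most pages. Content that JavaScript adds after the initial load may still need an explicit wait, which is covered later in this guide.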

Reading page information

Once a page loads, you can inspect important details.

puts browser.title # => "Books to Scrape"
puts browser.url # => https://books.toscrape.com/
puts browser.text

These are useful for confirming you've landed on the right page, especially after redirects.

The browser.title method returns the page title shown in the browser tab, while browser.url returns the current URL after any redirects. Meanwhile, browser.text returns the visible text content from the page.

Example:

require 'watir'
browser = Watir::Browser.new(:chrome)
browser.goto 'https://books.toscrape.com'
puts "Title: #{browser.title}"
puts "URL: #{browser.url}"
puts browser.text
browser.close

Checking the title and URL after navigation confirms that the browser reached the expected page, especially when sites perform redirects or authentication flows.

Watir also provides methods that mirror the browser's navigation buttons (back, forward, and refresh).

browser.back
browser.forward
browser.refresh

Example:

require 'watir'
browser = Watir::Browser.new(:chrome)
# Navigate to first page
browser.goto 'https://books.toscrape.com'
puts browser.title
# Navigate to another page
browser.goto 'https://books.toscrape.com/catalogue/category/books/travel_2/index.html'
puts browser.title
# Go back to previous page
browser.back
puts "After back: #{browser.title}"
# Go forward again
browser.forward
puts "After forward: #{browser.title}"
# Refresh the current page
browser.refresh
browser.close

You call these methods after the browser has already navigated to one or more pages.

Run in headless mode

Many production scraping systems run entirely in headless mode. It uses less memory and doesn't require a display, which matters on servers and in CI environments. 

Pass headless through Chrome options:

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument '--headless=new'
options.add_argument '--no-sandbox'
options.add_argument '--disable-dev-shm-usage'
browser = Watir::Browser.new :chrome, options: options

Here, --headless=new selects Chrome's newer headless implementation, which runs the browser without a visible UI.

The --no-sandbox and --disable-dev-shm-usage flags help avoid common Chrome startup issues in Docker containers and some CI environments. Add them when needed for your deployment environment, particularly when running Chrome inside Linux containers.

Close the browser

Always close browser sessions after completing your work to free the process and release memory.

Failing to close sessions can leave browser processes running in memory. Over time, those unused processes consume system resources and create stability issues.

Use:

browser.close

For reliable cleanup in scripts that might raise exceptions, use ensure:

begin
  browser = Watir::Browser.new :chrome, options: options
  browser.goto 'https://books.toscrape.com'
  # scraping logic
ensure
  # browser is nil if the launch itself failed, so use safe navigation
  browser&.close
end
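If several scripts repeat this pattern, it can be wrapped in a small helper. The sketch below is one possible shape, not part of Watir's API: with_browser is a hypothetical name, and the factory argument is any callable that returns an object responding to close.

```ruby
# Yields a browser built by the given factory and always closes it afterwards.
# `factory` is any callable that returns an object responding to #close.
def with_browser(factory)
  browser = factory.call
  yield browser
ensure
  # browser is nil if the factory itself raised, so guard the call.
  browser&.close
end
```

With Watir, the factory could be a lambda such as -> { Watir::Browser.new(:chrome, options: options) }, and the scraping logic goes in the block. The method returns whatever the block returns, and the browser is closed whether the block finishes normally or raises.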

Now that you can control browser sessions, the next step is interacting with actual page elements.

Finding elements and interacting with pages

Locating elements on a page and then interacting with them is the foundation of browser automation. Before a script can click buttons, submit forms, or extract information, it must identify the correct page elements. 

Watir Ruby gives you a rich set of locator strategies and provides straightforward methods for interacting with text fields, links, dropdowns, checkboxes, and tables.

Locating single elements

Watir provides multiple ways to find an element. The choice affects how reliable and maintainable your scraper is. 

Some locators are stable across site changes, others break easily. You need to know how to choose the right selector depending on your scraper.

Locating by ID is the most reliable locator when available. IDs are intended to be unique within a page and tend to be stable:

element = browser.element(id: 'search-input')

Locating by class name works when IDs aren't available, but class names change more often:

element = browser.element(class: 'product-title')

Locating by CSS selector gives you the full power of CSS selectors, including tag names, class names, attribute values, pseudo-selectors, and descendant relationships:

element = browser.element(css: 'article.product_pod h3 a')

Locating by XPath is powerful for complex conditions, navigating document structure, and finding elements by their text content or relationship to other elements:

element = browser.element(xpath: '//button[contains(text(), "Add to basket")]')

For XPath syntax reference, see the XPath glossary entry.

You can also locate elements by visible text:

element = browser.element(text: 'Add to basket')

Or by tag name with attributes; combine tag type with any attribute for a precise match:

element = browser.element(tag_name: 'input', type: 'submit')

Getting collections of elements

Many scraping tasks require multiple elements. Watir provides collection methods such as:

All links:

links = browser.links

All divs:

divs = browser.divs

All matching elements:

elements = browser.elements(class: "product")

Iterating through results:

browser.links.each do |link|
  puts link.text
end

The code above iterates through all links in browser.links and prints the visible text of each one.

For example, if a page contains:

<a href="/home">Home</a>
<a href="/about">About Us</a>
<a href="/contact">Contact</a>

The output would be:

Home
About Us
Contact

Interacting with elements

Filling text fields:

Suppose we want to fill in a login form.

browser.text_field(id: 'email').set 'user@example.com'
browser.text_field(name: 'password').set 'secret'

Watir clears the existing value before entering text.

Clicking buttons and links:

Clicking is straightforward.

browser.button(text: 'Sign in').click
# or, to click a link whose href matches a pattern:
browser.link(href: /products/).click

Selecting dropdown values

Many forms include dropdown menus.

Select an option using:

browser.select_list(name: 'country').select 'United States'

Watir automatically chooses the matching option.

Working with checkboxes

Check a checkbox:

browser.checkbox(id: "agree").set

Clear a checkbox:

browser.checkbox(id: "agree").clear

Reading element text

Extract visible content:

title = browser.h1.text
puts title

Reading text and attributes:

puts browser.element(css: 'h1').text
puts browser.link(text: 'Details').href
puts browser.element(css: 'img.hero').attribute_value('src')

Extracting multiple links

The following example collects all links from a page.

browser.links.each do |link|
  puts "#{link.text} - #{link.href}"
end
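Collected links usually contain duplicates, external URLs, and non-HTTP schemes such as mailto. The sketch below filters a list of href strings down to unique links on one host using only Ruby's standard library. The helper name same_site_links is hypothetical, and the input is assumed to be an array of strings, for example the result of browser.links.map(&:href).

```ruby
require 'uri'

# Keeps unique http(s) links that point at the given host.
# `hrefs` is an array of strings (nil entries are ignored).
def same_site_links(hrefs, host)
  hrefs.compact.uniq.select do |href|
    uri = URI.parse(href)
    # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
    uri.is_a?(URI::HTTP) && uri.host == host
  rescue URI::InvalidURIError
    false # skip malformed hrefs instead of aborting the run
  end
end

sample = [
  'https://books.toscrape.com/catalogue/page-2.html',
  'https://books.toscrape.com/catalogue/page-2.html',
  'https://example.com/elsewhere',
  'mailto:someone@example.com'
]
puts same_site_links(sample, 'books.toscrape.com')
```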

Reading table data

Many websites present information in tables. You can iterate through rows and cells.

table = browser.table(id: "products")
table.rows.each do |row|
  puts row.cells.map(&:text).join(" | ")
end

Storing scraped data in Ruby structures

Structured storage makes data easier to export.

Example:

products = []
browser.divs(class: "product").each do |product|
  products << {
    title: product.h2.text,
    price: product.span(class: "price").text
  }
end
puts products

This approach creates clean data structures that can later be exported to JSON, CSV, databases, or APIs.
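As a rough sketch of that export step, the standard csv and json libraries can serialize an array of hashes directly. The sample rows below are made-up stand-ins with the same shape as the products array, so the snippet runs without a browser.

```ruby
require 'csv'
require 'json'

# Made-up sample rows in the same shape as the scraped `products` array.
products = [
  { title: 'Sample product A', price: '10.00' },
  { title: 'Sample product B', price: '12.50' }
]

# Build CSV text: a header row from the hash keys, then one row per product.
csv_text = CSV.generate do |csv|
  csv << products.first.keys
  products.each { |product| csv << product.values }
end

# JSON.pretty_generate produces indented, human-readable JSON.
json_text = JSON.pretty_generate(products)

puts csv_text
puts json_text
```

To persist the results, pass either string to File.write along with a path such as output/results.csv.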

Extracting structured data: full example

This example launches Chrome in headless mode, visits books.toscrape.com, finds every product card on the first page, and extracts the title, price, and star rating into an array of hashes, then closes the browser even if an exception occurs.

require 'watir'
require 'webdrivers'
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument '--headless=new'
browser = Watir::Browser.new :chrome, options: options
begin
  browser.goto 'https://books.toscrape.com'
  # Wait until at least one product card is present
  browser.element(css: 'article.product_pod').wait_until(&:present?)
  books = []
  browser.elements(css: 'article.product_pod').each do |article|
    title = article.element(css: 'h3 a').attribute_value('title')
    price = article.element(css: 'p.price_color').text
    # The class attribute looks like "star-rating Three"; keep the last word
    rating = article.element(css: 'p.star-rating').attribute_value('class').split.last
    books << { title: title, price: price, rating: rating }
  end
  books.each { |b| puts b.inspect }
ensure
  browser.close
end

Save the script as scrape_books.rb and run it:

ruby scrape_books.rb

Sample output:

{:title=>"A Light in the Attic", :price=>"£51.77", :rating=>"Three"}
{:title=>"Tipping the Velvet", :price=>"£53.74", :rating=>"One"}
{:title=>"Soumission", :price=>"£50.10", :rating=>"One"}
...
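Every value in that output is a string, including the price and the rating. If you plan to sort or aggregate the data, a small normalization step helps. The sketch below is one way to do it; normalize_book is a hypothetical helper, and the word-to-number mapping assumes ratings run from One to Five as they do on this site.

```ruby
# Converts a raw scraped record (all strings) into typed values.
def normalize_book(raw)
  rating_words = { 'One' => 1, 'Two' => 2, 'Three' => 3, 'Four' => 4, 'Five' => 5 }
  {
    title: raw[:title].strip,
    # Drop the currency symbol and anything else that is not a digit or a dot.
    price: raw[:price].gsub(/[^\d.]/, '').to_f,
    # Unknown rating words become nil instead of raising.
    rating: rating_words[raw[:rating]]
  }
end

book = normalize_book({ title: 'A Light in the Attic', price: '£51.77', rating: 'Three' })
puts book.inspect
```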

At this point, you can launch browsers, locate elements, submit forms, navigate websites, and extract structured information. The next section covers JavaScript execution, screenshots, waiting strategies, and techniques for handling asynchronously loaded content.

Advanced Watir techniques

Many modern websites rely heavily on JavaScript to load content after the initial page request. Watir provides tools for interacting with these websites, executing JavaScript directly, capturing screenshots, and waiting for content to load properly. These capabilities help create more reliable browser automation and web scraping workflows.

Executing JavaScript in Watir

Most websites expose content through the Document Object Model (DOM), which Watir can interact with directly. However, some situations require direct JavaScript execution.

Watir provides the execute_script method for running JavaScript inside the browser.

Basic example:

require "watir"
browser = Watir::Browser.new(:chrome, headless: true)
browser.goto("https://example.com")
browser.execute_script(
  "document.body.style.backgroundColor = 'yellow';"
)
browser.close

This code changes the page background color through JavaScript.

Returning values from JavaScript

You can also retrieve values from the page.

The following example returns the page title:

title = browser.execute_script(
  "return document.title"
)
puts title

This technique is useful when data exists in JavaScript variables that aren't visible through normal page elements.

Scrolling pages

Many websites use infinite scrolling to load content. Instead of clicking pagination links, you can scroll through the page using JavaScript.

browser.execute_script(
  "window.scrollTo(0, document.body.scrollHeight);"
)

This command scrolls to the bottom of the page. A common scraping workflow repeatedly scrolls and waits for new content to load.
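That scroll-and-wait loop could be sketched as follows. The helper name scroll_to_end and its max_rounds and timeout parameters are assumptions for illustration. The browser argument is expected to behave like a Watir::Browser, meaning it responds to execute_script and wait_until. The broad rescue keeps the sketch free of a Watir dependency; in a real script you would likely rescue Watir::Wait::TimeoutError specifically.

```ruby
# Scrolls to the bottom until the page height stops growing or max_rounds is hit.
# Returns the number of scrolls that caused new content to load.
def scroll_to_end(browser, max_rounds: 10, timeout: 5)
  height_js = 'return document.body.scrollHeight;'
  loaded = 0
  max_rounds.times do
    before = browser.execute_script(height_js)
    browser.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    begin
      # Wait for new content to extend the page instead of sleeping a fixed time.
      browser.wait_until(timeout: timeout) { browser.execute_script(height_js) > before }
    rescue StandardError
      # Watir raises Watir::Wait::TimeoutError (a StandardError) if the height
      # never grows within the timeout; treat that as "no more content".
      break
    end
    loaded += 1
  end
  loaded
end
```

The max_rounds cap matters on feeds that never end: without it, the loop would keep scrolling for as long as the site keeps serving content.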

Triggering JavaScript events

Some elements respond to JavaScript events rather than standard clicks. In those situations, JavaScript can trigger actions directly.

Example:

browser.execute_script(
  "document.querySelector('#submit').click();"
)

This approach should be used only when standard Watir interactions fail.

Extracting data from JavaScript variables

Many modern applications store information in browser-side JavaScript objects.

Example:

product_data = browser.execute_script(
  "return window.productData;"
)
puts product_data

This method can simplify extraction when the data is already structured in JavaScript.

Taking screenshots

Screenshots are one of the most practical debugging tools for browser automation. When a scraping run fails overnight and you want to understand why, a screenshot of what the browser saw often explains it immediately: a CAPTCHA page, a login redirect, an empty result set, or an error message that wasn't in the HTML you expected.

Save a screenshot of the current viewport using the screenshot API:

browser.screenshot.save("homepage.png")

Because the path is relative, Watir saves the image in the current working directory.

Add screenshots to your error handling so you always capture the browser state when something goes wrong:

begin
  browser.element(css: 'div.results').wait_until(&:present?)
rescue Watir::Wait::TimeoutError
  # Capture what the browser saw, then re-raise the original error
  browser.screenshot.save "timeout_#{Time.now.to_i}.png"
  raise
end

Timestamp the filename to avoid overwriting screenshots across runs.

Here’s a complete screenshot example:

require "watir"
browser = Watir::Browser.new(:chrome, headless: true)
browser.goto("https://example.com")
browser.screenshot.save("example-homepage.png")
browser.close

Save the code as screenshot.rb.

Run it:

ruby screenshot.rb

Handling dynamic content and AJAX

The single biggest source of errors in browser automation scripts is timing: trying to read an element before it's loaded, or clicking a button before it's enabled. Watir's built-in waiting system is designed specifically to handle dynamic content.

This feature is one reason many Ruby developers prefer Watir over raw Selenium.

Instead of manually checking page state repeatedly, Watir can wait until conditions become true.

Waiting for elements

Suppose search results appear after a delay. You can wait until the results container becomes visible.

browser.div(id: "results").wait_until(&:present?)

Once the element appears, execution continues.

Waiting for elements to disappear

Some workflows require waiting for loading indicators to vanish.

Example:

browser.div(id: "loading").wait_while(&:present?)

The script pauses until the loading indicator disappears.

Custom wait conditions

Watir also supports custom logic.

Example:

browser.wait_until do
  browser.links.count > 20
end

This code waits until more than 20 links exist on the page.

Adjusting timeouts

Watir uses a default timeout of 30 seconds.

You can increase it globally:

Watir.default_timeout = 60

This change gives slow-loading websites more time to respond.

Never use sleep for timing. The sleep function is an anti-pattern in browser automation. A fixed sleep 5 wastes time when the content loads in 1 second, and fails when the network is slow and the content takes 8 seconds. 

Watir's wait methods are always the right choice. They return as soon as the condition is met and raise an error with a clear message if the timeout is exceeded.

Instead of sleep 5, use:

browser.div(id: "results").wait_until(&:present?)

The script proceeds as soon as the element appears.

The following script waits for the quote elements to load before extracting data.

require "watir"
browser = Watir::Browser.new(:chrome, headless: true)
browser.goto("https://quotes.toscrape.com/js/")
browser.div(class: "quote").wait_until(&:present?)
browser.divs(class: "quote").each do |quote|
  puts quote.text
end
browser.close

Save the script as dynamic_content.rb and run it:

ruby dynamic_content.rb

Sample output (abbreviated):

"The world as we have created it..."
"It is our choices..."
"There are only two ways to live your life..."

At this stage, you've learned how to handle pages that rely on JavaScript and asynchronous loading. The next step is making your scraping projects more resilient when collecting larger amounts of web data.

Using proxies with Watir Ruby

As scraping volume increases, websites often begin limiting requests from the same IP address. Some sites go further and block entire IP ranges that belong to cloud providers, because datacenter IPs are heavily associated with scraping and automated activity. 

Proxies help distribute traffic, access geo-restricted content, and reduce the likelihood of temporary blocks. Combining Watir Ruby with residential proxies creates a more reliable setup for production scraping workloads.

Why proxies matter for scraping

Every website sees your IP address when you connect.

If you repeatedly access the same website from one IP, you may encounter:

  • Rate limits
  • Temporary restrictions
  • CAPTCHA challenges
  • Access denials

Here are key reasons why you need proxies for web scraping at scale:

  • IP-based rate limiting: A single IP sending 500 requests per hour will trigger most rate limiters. Rotating proxies distribute those requests across many different IPs while staying under the per-IP threshold on each one.
  • Geo-restricted content: Some websites, such as eCommerce platforms, streaming services, and news sites, show different content depending on the visitor's location. Residential proxies in specific countries let you see the local version of a page.
  • Anti-bot detection and blocking: Sophisticated sites use IP reputation databases and behavioral analysis to identify non-human traffic. Residential IPs look like real user traffic, making them far harder to flag than datacenter IPs.

Types of proxies and when to use each

Two proxy types are common when automating browsers with Watir:

Residential proxies use IP addresses assigned by ISPs to real home devices. They're the most effective option for sites with active anti-bot protection because the traffic is much harder to distinguish from that of a real user. Residential proxies like the ones offered by Decodo are commonly used for:

  • eCommerce monitoring
  • Price intelligence
  • Search engine data collection
  • Travel fare monitoring

Datacenter proxies come from cloud server IP ranges. They're faster and cheaper, and they work for sites without strong anti-bot defenses, but can be easily flagged by services that check IP reputation. Use them for targets that don't actively block datacenter traffic.

Configuring Watir to use a proxy

Watir supports proxy configuration during browser initialization.

Before creating a browser session, define your proxy settings.

proxy_username = "YOUR_PROXY_USERNAME"
proxy_password = "YOUR_PROXY_PASSWORD"
proxy = {
  http: "http://#{proxy_username}:#{proxy_password}@gate.decodo.com:7000",
  ssl: "http://#{proxy_username}:#{proxy_password}@gate.decodo.com:7000"
}
browser = Watir::Browser.new(
  :chrome,
  proxy: proxy,
  headless: true
)

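The snippet defines the proxy settings first and then creates a headless Chrome session that routes its traffic through the proxy. One caveat: Chrome may ignore credentials embedded in a proxy URL and ask for authentication instead. If page loads fail with an authentication prompt or a 407 response, check whether your provider lets you allowlist your machine's IP address so that no username and password are needed.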

All requests made by the browser, such as page loads, API calls, and asset downloads, are sent through the proxy endpoint instead of your direct internet connection.

Verify the proxy is working

After launching the browser, you can check the reported IP address:

browser.goto("https://httpbin.org/ip")
puts browser.text

If the proxy is configured correctly, the IP shown should be the proxy's public IP rather than your local machine's IP.
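To make that check automatic rather than visual, you could compare the reported address against your known direct IP. The sketch below is an assumption-laden example: reported_ip and proxy_active? are hypothetical helpers, the page text is assumed to contain httpbin's "origin" field, and direct_ip is a value you supply yourself. A regular expression is used instead of a JSON parser in case the browser adds its own text around the JSON.

```ruby
# Extracts the IP address from page text that contains httpbin's
# "origin" field, for example {"origin": "203.0.113.7"}.
# Returns nil when no such field is found.
def reported_ip(page_text)
  page_text[/"origin"\s*:\s*"([^"]+)"/, 1]
end

# True when an IP was reported and it differs from the direct (non-proxy) IP.
def proxy_active?(page_text, direct_ip)
  ip = reported_ip(page_text)
  !ip.nil? && ip != direct_ip
end

# Documentation-range addresses used purely as sample data.
sample_text = '{ "origin": "203.0.113.7" }'
puts proxy_active?(sample_text, '198.51.100.4')
```

In a real run, pass browser.text as the first argument after navigating to https://httpbin.org/ip, and stop the scraper if the check returns false.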

Complete Watir Ruby scraping example with a proxy

Here's a complete Watir Ruby example that routes traffic through an authenticated proxy, opens a page, and prints basic page information:

require "watir"

# Proxy credentials
proxy_username = "YOUR_PROXY_USERNAME"
proxy_password = "YOUR_PROXY_PASSWORD"

# Proxy configuration
proxy = {
  http: "http://#{proxy_username}:#{proxy_password}@gate.decodo.com:7000",
  ssl: "http://#{proxy_username}:#{proxy_password}@gate.decodo.com:7000"
}

# Launch Chrome through the proxy
browser = Watir::Browser.new(
  :chrome,
  proxy: proxy,
  headless: true
)

begin
  # Navigate to a page
  browser.goto("https://example.com")

  # Extract page information
  puts "Title: #{browser.title}"
  puts "URL: #{browser.url}"
  puts "Body text:"
  puts browser.body.text
ensure
  browser.close
end

Run the script:

Save the file as proxy_scraper.rb.

Execute:

ruby proxy_scraper.rb

Example output:

Title: Example Domain
URL: https://example.com/
Body text:
Example Domain
This domain is for use in illustrative examples in documents…

Geo-targeting

Geo-targeted proxies allow you to view content from specific countries or regions. This capability becomes particularly useful for competitive intelligence projects.

Route requests through a specific country by pointing the browser at that country's entry point:

# US-based requests: substitute your provider's US entry point
options.add_argument '--proxy-server=http://gate.decodo.com:7000'
# UK-based requests: substitute your provider's UK entry point
options.add_argument '--proxy-server=http://gate.decodo.com:7000'

Both lines show the generic gateway as a placeholder. Each country has its own entry point, listed in the proxy provider's dashboard. Here, options is a Selenium::WebDriver::Chrome::Options object, and only one --proxy-server argument should be set per browser session.

These arguments tell the browser, whether it is driven by Watir or directly by Selenium, to send its traffic through the chosen proxy endpoint. The proxy provider then routes the traffic out through an IP in the selected country, so websites see the requests as coming from that region.
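
One way to keep country selection out of the scraping logic is a lookup table that maps a country code to its entry point. The sketch below uses hypothetical hosts and ports (us.proxy.example and gb.proxy.example). Take the real country-specific values from your proxy provider's dashboard.

```ruby
# Hypothetical country entry points; replace them with the values
# from your proxy provider's dashboard.
COUNTRY_ENDPOINTS = {
  us: 'us.proxy.example:10000',
  gb: 'gb.proxy.example:30000'
}.freeze

# Builds the Chrome --proxy-server argument for a country code.
# Raises KeyError for countries that are not configured.
def proxy_argument(country)
  endpoint = COUNTRY_ENDPOINTS.fetch(country.to_sym)
  "--proxy-server=http://#{endpoint}"
end

puts proxy_argument(:us)
puts proxy_argument('gb')
```

The returned string can be handed to options.add_argument in place of a hard-coded value, and an unconfigured country fails loudly instead of silently falling back to the wrong region.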

Common use cases include:

  • Competitive intelligence: Viewing search results, advertisements, or product listings as they appear to users in different countries. 
  • Regional price monitoring: Comparing prices across markets (for example, US vs. UK pricing). 
  • Localization testing: Verifying language, currency, and content variations shown to visitors from different regions. 
  • Geo-restricted content verification: Confirming whether content is available or displayed differently in specific countries. 

Here are a few practical considerations to keep in mind:

  • Websites may use additional signals beyond IP address (cookies, account location, browser language, GPS data, payment region, etc.), so changing the proxy alone may not always reproduce a local user's experience. 
  • Some sites actively detect and block proxy traffic. 
  • You should comply with the website's terms of service and applicable laws when collecting data.

Alternative option for difficult websites

Some websites deploy advanced anti-bot systems.

In those situations, managing browser infrastructure becomes increasingly complex. Decodo’s Site Unblocker provides a managed alternative. 

Designed as a proxy-like solution, it allows you to retrieve useful web data from hard-to-access websites without launching a web scraper or deploying a headless browser. It's intended to reduce blocks, CAPTCHAs, and IP bans when you collect public data.

Proxy best practices

Follow these recommendations when scaling browser automation with Watir Ruby:

  • Rotate IPs between browser sessions
  • Avoid excessive request rates
  • Use realistic browsing patterns
  • Monitor response quality
  • Distribute traffic over time
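
As a sketch of the first recommendation, the helper below assigns a different proxy to each browser session by cycling through a pool. The pool entries and the assign_proxies name are hypothetical. Many rotating gateways change the exit IP for you, in which case a pool like this may be unnecessary.

```ruby
# Hypothetical pool of proxy endpoints (placeholders, not real hosts).
PROXY_POOL = %w[
  proxy-a.example:7000
  proxy-b.example:7000
  proxy-c.example:7000
].freeze

# Pairs each URL with the next proxy in the pool, wrapping around when the
# pool is exhausted, so consecutive sessions use different endpoints.
def assign_proxies(urls, pool = PROXY_POOL)
  rotation = pool.cycle
  urls.map { |url| { url: url, proxy: rotation.next } }
end

jobs = assign_proxies(%w[
  https://example.com/page-1
  https://example.com/page-2
  https://example.com/page-3
  https://example.com/page-4
])

jobs.each { |job| puts "#{job[:url]} via #{job[:proxy]}" }
```

Each job would then open its own browser session with the assigned proxy and close it before the next job starts.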


A complete working scraper

Here's a complete script that combines headless mode, proxy support, multi-page pagination, element extraction, polite delays, and CSV output. It scrapes all 1,000 books from books.toscrape.com across 50 pages.

require 'watir'
require 'webdrivers'
require 'csv'

# Credentials loaded from the environment, with placeholder fallbacks
proxy_username = ENV.fetch('PROXY_USERNAME', 'YOUR_PROXY_USERNAME')
proxy_password = ENV.fetch('PROXY_PASSWORD', 'YOUR_PROXY_PASSWORD')
proxy_host = 'gate.decodo.com'
proxy_port = '7000'
proxy_url = "http://#{proxy_username}:#{proxy_password}@#{proxy_host}:#{proxy_port}"

# Launch headless Chrome through the authenticated proxy,
# using the same proxy: option as the earlier proxy example
browser = Watir::Browser.new(
  :chrome,
  proxy: { http: proxy_url, ssl: proxy_url },
  headless: true,
  options: { args: ['--no-sandbox', '--disable-dev-shm-usage'] }
)

books = []
page = 1

begin
  loop do
    url = if page == 1
            'https://books.toscrape.com'
          else
            "https://books.toscrape.com/catalogue/page-#{page}.html"
          end

    browser.goto url
    browser.element(css: 'article.product_pod').wait_until(&:present?)

    browser.elements(css: 'article.product_pod').each do |article|
      title = article.element(css: 'h3 a').attribute_value('title')
      price = article.element(css: 'p.price_color').text
      # The rating is the last class name, e.g. "star-rating Three"
      rating = article.element(css: 'p.star-rating').attribute_value('class').split.last
      books << { title: title, price: price, rating: rating }
    end

    puts "Page #{page} scraped - #{books.length} books so far"

    # Stop when the page has no "next" link
    break unless browser.link(css: 'li.next a').present?

    page += 1
    sleep(rand(2..4)) # polite, randomized delay between pages
  end
ensure
  browser.close
end

CSV.open('books.csv', 'w') do |csv|
  csv << ['Title', 'Price', 'Rating']
  books.each { |b| csv << [b[:title], b[:price], b[:rating]] }
end

puts "Done. #{books.length} books saved to books.csv."

Save the script as full_scraper.rb and run it:

ruby full_scraper.rb

Expected output

For Books to Scrape, the script should collect 1,000 books across 50 pages, recording three fields for each book:

  • Title
  • Price
  • Star rating

Then produce a CSV similar to:

Title | Price | Rating
A Light in the Attic | £51.77 | Three
Tipping the Velvet | £53.74 | One
Soumission | £50.10 | One

Open books.csv in a spreadsheet application to view, sort, and filter the collected data.
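
The same sorting and filtering can be done in Ruby once the text values are converted to numbers. The sketch below parses a few sample rows in the shape of books.csv. The normalize helper and the RATING_WORDS table are illustrative names, not part of the scraper above.

```ruby
require 'csv'

RATING_WORDS = { 'One' => 1, 'Two' => 2, 'Three' => 3, 'Four' => 4, 'Five' => 5 }.freeze

# Converts a scraped row into numeric values that are easy to sort and filter.
def normalize(row)
  {
    title: row['Title'],
    price: row['Price'].delete('^0-9.').to_f,    # "£51.77" becomes 51.77
    rating: RATING_WORDS.fetch(row['Rating'], 0) # 0 marks an unknown rating
  }
end

# Sample rows in the same shape as books.csv.
sample = <<~CSV
  Title,Price,Rating
  A Light in the Attic,£51.77,Three
  Tipping the Velvet,£53.74,One
  Soumission,£50.10,One
CSV

books = CSV.parse(sample, headers: true).map { |row| normalize(row) }
cheapest = books.min_by { |book| book[:price] }
puts "#{cheapest[:title]}: #{cheapest[:price]}"
```

To process the real file, replace CSV.parse(sample, headers: true) with CSV.foreach('books.csv', headers: true).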

Best practices and common pitfalls in Watir Ruby scraping

Reliable scraping projects depend on more than code that works once. Good practices help scripts remain stable as websites change, while poor habits often lead to broken automation and difficult debugging sessions. The following practices can help you improve reliability, maintainability, and long-term performance.

Always use explicit waits instead of sleep

Watir's automatic waiting handles most cases, but complex scenarios need explicit conditions.

Explicit waits allow scripts to react to actual page conditions rather than fixed delays. Watir's waiting features make automation more reliable because the script proceeds immediately when content becomes available.

Using sleep can introduce unnecessary delays and often causes failures when pages load more slowly than expected.
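
As a sketch of an explicit wait, the helper below waits for a specific element before reading from it. It assumes a page structured like books.toscrape.com, and first_product_title is a hypothetical name. wait_until with a timeout and a block is Watir's element-level wait.

```ruby
# Waits until the first product card is present, then reads its title.
# `browser` is expected to be an open Watir::Browser session.
def first_product_title(browser, timeout: 15)
  product = browser.element(css: 'article.product_pod')
  # Continues as soon as the element is present; raises a timeout error
  # if it has not appeared after `timeout` seconds.
  product.wait_until(timeout: timeout, &:present?)
  product.element(css: 'h3 a').attribute_value('title')
end

# Usage (requires the watir gem and a local Chrome installation):
#   require 'watir'
#   browser = Watir::Browser.new :chrome
#   browser.goto 'https://books.toscrape.com'
#   puts first_product_title(browser)
#   browser.close
```

Unlike a fixed sleep, the wait returns the moment the condition is true, so fast pages are not slowed down and slow pages get the full timeout.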

Handle errors gracefully

Element availability can change without warning. Wrap interactions in begin/rescue blocks so that one missing element doesn't crash the entire script.

Handling errors gracefully can improve resilience and allow the scraper to continue processing remaining pages. It also helps generate useful logs for troubleshooting.

Example:

begin
  price = browser.element(css: 'span.sale-price').text
rescue Watir::Exception::UnknownObjectException
  price = browser.element(css: 'span.regular-price').text
end
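
The same idea can be applied per page so that one failure is logged and the remaining pages are still processed. This is a minimal sketch. process_pages is a hypothetical helper, and the block stands in for real scraping code, where you would typically rescue Watir's own exceptions.

```ruby
require 'logger'

# Runs the block for every URL, logging failures instead of aborting.
# Returns the URLs that failed so they can be retried later.
def process_pages(urls, logger: Logger.new($stdout))
  failed = []
  urls.each do |url|
    yield url
  rescue StandardError => e
    logger.error("#{url} failed: #{e.class}: #{e.message}")
    failed << url
  end
  failed
end

failed = process_pages(%w[page-1 page-2 page-3]) do |url|
  # Simulated failure standing in for a missing element
  raise ArgumentError, 'element missing' if url == 'page-2'

  puts "scraped #{url}"
end

puts "Failed: #{failed.inspect}"
```

Returning the failed URLs keeps the retry policy separate from the scraping loop.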

Choose resilient selectors

IDs and dedicated data attributes generally remain more stable than CSS classes. Front-end teams often change styling classes during redesigns, which can break scraping logic.

When possible, target unique identifiers rather than presentation-related classes. This approach reduces maintenance requirements over time.
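
One way to put this into practice is to list candidate locators from most to least stable and use the first one that matches. In the sketch below, the id and data-testid values are hypothetical, and first_text is an illustrative helper rather than a Watir method.

```ruby
# Candidate locators, ordered from most to least stable. The id and
# data-testid values are hypothetical; use whatever the target page exposes.
PRICE_LOCATORS = [
  { id: 'product-price' },
  { css: '[data-testid="product-price"]' },
  { css: 'p.price_color' } # styling class, kept only as a last resort
].freeze

# Returns the text of the first locator that matches a present element,
# or nil when none of them match. `browser` is an open Watir session.
def first_text(browser, locators)
  locators.each do |locator|
    element = browser.element(locator)
    return element.text if element.present?
  end
  nil
end

# Usage with a live session:
#   price = first_text(browser, PRICE_LOCATORS)
```

Keeping the locators in one constant also means a redesign of the target site requires changes in a single place.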

Close browser sessions properly

Every browser session consumes system resources. Failing to close sessions can create orphaned browser processes that continue running after script completion.

Using an ensure block guarantees cleanup even when unexpected errors occur. This pattern keeps systems stable during long scraping jobs.

Example:

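# options holds the Chrome options configured earlier in your script
browser = Watir::Browser.new :chrome, options: options
begin
  browser.goto("https://example.com")
ensure
  # Runs whether or not the block above raised an error
  browser.close
end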

Respect rate limits

Sending rapid requests can trigger anti-bot systems and IP bans. Add small randomized delays between page visits to create more natural browsing behavior. Polite scraping also minimizes server load and improves long-term access reliability.
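
A randomized delay can be wrapped in a small helper so that every page visit uses the same policy. This is a sketch. polite_pause is a hypothetical name, and the default 2 to 4 second range is only an example value.

```ruby
# Sleeps for a random interval within `range` (in seconds) and returns the
# chosen delay. The sleeper is injectable so the helper can be tested
# without actually waiting.
def polite_pause(range = 2.0..4.0, sleeper: ->(seconds) { sleep(seconds) })
  delay = rand(range)
  sleeper.call(delay)
  delay
end

# Short range here so the demo finishes quickly
waited = polite_pause(0.1..0.3)
puts "Paused for #{waited.round(2)}s"
```

Calling polite_pause between browser.goto calls spreads requests out without hard-coding a fixed interval.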

Use headless mode in production

Production environments rarely require visible browser windows. Use headless mode to reduce memory consumption and improve execution speed.

Most cloud servers don't provide graphical interfaces. Headless execution avoids display-related deployment issues.
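
A common pattern is to default to headless mode and allow a visible browser only when debugging locally. The sketch below builds the Chrome arguments from an environment variable. The HEADLESS variable name and the chrome_args helper are assumptions for illustration.

```ruby
# Builds Chrome arguments, enabling headless mode unless HEADLESS=false.
# The HEADLESS variable name is an assumption for this sketch.
def chrome_args(env = ENV)
  args = ['--no-sandbox', '--disable-dev-shm-usage']
  args.unshift('--headless') unless env.fetch('HEADLESS', 'true') == 'false'
  args
end

puts chrome_args.inspect
```

Each entry can be passed to add_argument on a Selenium::WebDriver::Chrome::Options object, and running the script with HEADLESS=false brings the browser window back for debugging.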

Add logging

Logs create visibility into scraper behavior. Record navigation events, extracted records, and errors to make troubleshooting much easier.

Well-structured logs help identify failures quickly and simplify maintenance as projects grow.

Example:

require 'logger'

logger = Logger.new('scraper.log')
url = 'https://example.com'

# browser is an open Watir session
logger.info "Visiting #{url}"
browser.goto url
logger.info "Title: #{browser.title}"

Recognize CAPTCHAs and blocks

Websites often signal blocking through CAPTCHA pages, login challenges, or unusual error messages. Detect these pages so the scraper doesn't record them as real data.

Example:

if browser.element(css: '.g-recaptcha').present?
  logger.warn "CAPTCHA detected, backing off before retrying"
  sleep(rand(10..20))
  # Switch to a new proxy session here before retrying the page
end

You also need to recognize when a site is blocking your scraper and implement fallback strategies such as proxy rotation, request throttling, or alternative collection methods. 
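
Block pages don't always contain a CAPTCHA widget, so a text-based check can complement the selector check. The sketch below is one possible heuristic. The patterns are illustrative only, and blocked? is a hypothetical helper that should be tuned for each target site.

```ruby
# Illustrative markers only; real block pages vary from site to site.
BLOCK_PATTERNS = [
  /captcha/i,
  /access denied/i,
  /unusual traffic/i,
  /verify you are (a )?human/i
].freeze

# Returns true when the page title or visible text looks like a block page.
def blocked?(title, text)
  BLOCK_PATTERNS.any? { |pattern| pattern.match?(title) || pattern.match?(text) }
end

puts blocked?('Example Domain', 'This domain is for use in examples.')
puts blocked?('Access Denied', 'You do not have permission to view this page.')
```

With a live session, calling blocked?(browser.title, browser.text) after each navigation lets the scraper skip the page, slow down, or switch proxies instead of saving the block page as data.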

Final thoughts

Watir remains one of the most approachable browser automation libraries for Ruby developers. It combines Selenium's browser support with a cleaner API, built-in waiting mechanisms, and syntax that keeps automation code readable.

It works well for Ruby-centric teams, automated testing, browser workflows, and small to medium-scale scraping on JavaScript-heavy sites. For larger workloads that need distributed infrastructure, proxy rotation, and anti-bot bypass, managed solutions reduce the operational overhead significantly. If you want to skip browser infrastructure entirely, Decodo's Web Scraping API handles rendering, proxies, and detection in a single endpoint.


About the author

Justinas Tamasevicius

Director of Engineering

Justinas Tamaševičius is Director of Engineering with over two decades of expertise in software development. What started as a self-taught passion during his school years has evolved into a distinguished career spanning backend engineering, system architecture, and infrastructure development.

All information on Decodo Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Decodo Blog or any third-party websites that may be linked therein.

Frequently asked questions

What is Watir used for?

Watir Ruby is used for browser automation, web application testing, and web data extraction. Developers use it to automate repetitive browser tasks such as form submissions, navigation, clicking buttons, scraping data from JavaScript-heavy websites, and validating application behavior across multiple browsers.

Its Ruby-friendly syntax makes automation easier than working directly with Selenium.

How does Watir compare to Selenium?

Watir Ruby uses Selenium WebDriver underneath but provides a simpler and more Ruby-oriented interface.

Selenium focuses on browser automation infrastructure, while Watir adds convenience features such as automatic waiting, cleaner element interactions, and more readable syntax.

Many Ruby developers choose Watir because it reduces boilerplate code while maintaining Selenium's browser compatibility.

What are the benefits of using Watir?

Watir Ruby offers easy browser automation, cross-browser support, automatic waits, and straightforward element interaction methods.

It works with Chrome, Firefox, Edge, and Safari while hiding much of Selenium's complexity.

These features help developers create more reliable automation scripts, reduce maintenance effort, and improve productivity when testing or scraping websites.


© 2018-2026 decodo.com (formerly smartproxy.com). All Rights Reserved