In this series, my goal is to provide knowledge that will help you become a Selenium WebDriver Superstar! 🌟
There are 3 components in the Selenium suite. Selenium WebDriver is an API that automates the browser through a driver.
- Selenium automates browsers by sending commands to them and receiving their responses.
- WebDriver communicates with the browser (e.g. Chrome) through a driver (e.g. chromedriver.exe).
The other 2 components are Selenium IDE and Selenium Grid. Selenium IDE records and plays back our tests. Selenium Grid executes our tests across multiple operating systems, browsers, and machines. By the end of this series, you will have learned Selenium WebDriver: From A to Z and the Best Practices for Selenium Automation that come with it.
Tutorial Chapters – Selenium WebDriver: From A to Z
- Selenium WebDriver: From A to Z (Chapter 1)
Install Selenium
We can add Selenium to a project in our IDE in multiple ways. We can download it from Selenium’s Official Site or from Maven’s Repository. The official site for Selenium has the latest releases for all components plus previous releases and source code. Maven’s Repository is a website that holds many plugins, project jars, library jars, and artifacts for our project.
Selenium’s Official Site
We go to https://www.selenium.dev/downloads/ for Selenium’s Official Site, then navigate to the Selenium Client & WebDriver Language Bindings section to download Selenium. The languages are Ruby, Java, Python, C#, and JavaScript. At the time of writing, the Stable Version is 3.141.59 and the Alpha Version is 4.0.0 (you can read more about the new version here: Selenium 4 new features).
Maven’s Repository
To install from Maven’s Repository, go to https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java.
After selecting our desired Selenium version, we copy and paste the dependency for Maven into a pom.xml file or the dependency for Gradle into a build.gradle file. Maven and Gradle are build automation tools with a similar purpose: managing our projects. Both spare us manual work such as downloading and configuring Selenium. Here’s a Selenium 4 dependency for Maven, followed by the same dependency for Gradle.
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.0.0-alpha-6</version>
</dependency>
// https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java
// 'implementation' replaces the 'compile' configuration, which was removed in Gradle 7
implementation group: 'org.seleniumhq.selenium', name: 'selenium-java', version: '4.0.0-alpha-6'
Understand the DOM
Automating a browser begins with understanding the Document Object Model (DOM) which manages each web page. The DOM is an API that handles HTML documents as a tree structure. HTML stands for Hyper Text Markup Language which is designed to create web pages. We can right-click a web page like TestProject’s Example Page, select Inspect, and view the DOM.
At the top of every DOM is the html root element, which contains 2 child elements: head and body.
An element usually consists of a start tag and an end tag with content inserted between both tags. Some elements, such as input, have no content and therefore no end tag. In the following example, let’s inspect the Full Name field.
<input id="name" type="text" class="form-control" placeholder="Enter your full name" required="">
The input tag is followed by attributes and attribute values. Attributes provide extra information about an HTML element.
Here’s a breakdown of the Full Name field:
- input is the tag name and is used to accept information from a user.
- id="name": id is an attribute that sets a unique identifier for the element, with "name" as the attribute value.
- type="text": type is the attribute with "text" as the attribute value, and it defines a 1-line text field.
- class="form-control": class is the attribute with "form-control" as the attribute value.
- placeholder="Enter your full name": placeholder is an attribute that provides a clue of the expected information. The value "Enter your full name" is displayed in the field until the user types.
- required is an attribute that makes sure the user enters information before the form is submitted.
In Google’s Chrome or Microsoft’s Edge browser, we can press CTRL + F while the DOM is in focus to find the element. A box labeled Find by string, selector, or XPath is available to search for a value. For this element, the best way to find Full Name is by id since it’s a unique identifier. In this example, the CSS selector #name successfully finds the Full Name element, which is highlighted yellow in the DOM.
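To make the tree structure concrete, here is a minimal sketch that parses a tiny page containing the Full Name field and reads its attributes. It uses Java’s built-in XML DOM parser as a stand-in for the browser, which is an assumption for illustration only: that parser needs well-formed markup, so the input tag is self-closed, and the surrounding html, head, and body markup is made up. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomTreeSketch {

    // Made-up page: only the input element comes from the example above.
    static final String PAGE =
        "<html><head><title>Example</title></head><body>"
        + "<input id=\"name\" type=\"text\" class=\"form-control\" "
        + "placeholder=\"Enter your full name\" required=\"\"/>"
        + "</body></html>";

    // Parses well-formed markup into a DOM tree.
    static Document parse(String markup) {
        try {
            byte[] bytes = markup.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    // Returns the first element in the tree with the given tag name.
    static Element firstByTag(Document doc, String tagName) {
        return (Element) doc.getElementsByTagName(tagName).item(0);
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        Element input = firstByTag(doc, "input");
        System.out.println("root: " + doc.getDocumentElement().getTagName());
        System.out.println("parent of input: " + input.getParentNode().getNodeName());
        System.out.println("id: " + input.getAttribute("id"));
        System.out.println("required present: " + input.hasAttribute("required"));
    }
}
```

The point of the sketch is that each element is a node with a parent, children, and attributes, which is exactly what a locator navigates when Selenium searches the page.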
Find WebElements
One of the first steps to automating a task is finding the WebElement. A WebElement is an element from HTML displayed on a web page. The WebElements show up as buttons, images, links, or anything visible on a web page. We are unable to perform any Selenium action until a WebElement is found. To find the WebElement, our Test Script needs a WebDriver, findElement(s) method, and locator.
WebDriver Interface
WebDriver is an interface that talks to the browser. Per Selenium’s documentation, the methods in this interface fall into 3 categories:
- Control of the browser itself
- Selection of WebElements
- Debugging aids
To control the browser, we start by writing a statement like one of the following:
- Python: driver = webdriver.Chrome(executable_path="")
- C#: IWebDriver driver = new ChromeDriver();
- Java: WebDriver driver = new ChromeDriver();
The above statements take control of Google’s Chrome browser. However, a similar statement will take control of another supported browser by changing ChromeDriver to something like FirefoxDriver. After controlling the browser, the 2nd category is the Selection of WebElements. We select WebElements using the findElement(s) method and one of several locators provided by Selenium.
findElement(s) Method
A WebElement can be located using a search value taken from the DOM. In the example from Understand the DOM, we found the Full Name field using #name. To locate a WebElement, Selenium offers 2 methods: findElement and findElements.
- findElement() finds the first WebElement matching a provided criterion and throws a NoSuchElementException if nothing matches.
- findElements() finds all WebElements matching a provided criterion and returns them as a list, which is empty if nothing matches.
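The different behavior on a missing element is useful in practice. Here is a minimal sketch, assuming selenium-java is on the classpath, that uses findElements() to check whether something is on the page without having to catch an exception. The class and method names are hypothetical.

```java
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.SearchContext;
import org.openqa.selenium.WebElement;

class ElementLookup {

    // True when at least one element matches. findElements() returns an
    // empty list instead of throwing when nothing matches.
    static boolean isPresent(SearchContext context, By locator) {
        return !context.findElements(locator).isEmpty();
    }

    // Number of elements that match the locator.
    static int count(SearchContext context, By locator) {
        List<WebElement> matches = context.findElements(locator);
        return matches.size();
    }
}
```

Both WebDriver and WebElement extend SearchContext, so a call such as ElementLookup.isPresent(driver, By.id("name")) searches the whole page, while passing a WebElement searches only inside that element.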
Locators
Selenium has multiple locators for locating the same WebElement. We have the option of choosing our favorite locator such as id, xpath or cssSelector. With the release of Selenium 4, there are additional methods called Relative Locators to assist in locating a WebElement.
8 Selenium Locators
The 8 Selenium locators in alphabetical order are:
- className – locates an element based on the value of a “class” attribute.
- cssSelector – locates an element based on a CSS selector.
- id – locates an element based on the value of an “id” attribute.
- linkText – locates a link based on the exact text it displays.
- name – locates an element based on the value of the “name” attribute.
- partialLinkText – locates a link based on part of the text it displays.
- tagName – locates an element based on its tag name.
- xpath – locates an element based on an XPath value.
Relative Locators
The Relative Locators find a specific element based on the position of another element. Selenium 4 introduced the following 5 Relative Locators:
- above() – finds an element or elements located above a fixed element.
- below() – finds an element or elements located below a fixed element.
- near() – finds an element or elements located near a fixed element.
- toLeftOf() – finds an element or elements located to the left of a fixed element.
- toRightOf() – finds an element or elements located to the right of a fixed element.
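As a rough illustration of how the Relative Locators read in code, here is a minimal sketch. It assumes Selenium 4.0.0 or later, where the static method RelativeLocator.with(By) is available (the alpha releases used withTagName instead). Whether the page actually has an input below, or a label to the left of, the Full Name field is an assumption; the class and method names are hypothetical.

```java
import static org.openqa.selenium.support.locators.RelativeLocator.with;

import org.openqa.selenium.By;

class RelativeLocatorSketch {

    // Locator for an input element rendered below the Full Name field.
    static By inputBelowFullName() {
        return with(By.tagName("input")).below(By.id("name"));
    }

    // Locator for a label rendered to the left of the Full Name field.
    static By labelLeftOfFullName() {
        return with(By.tagName("label")).toLeftOf(By.id("name"));
    }
}
```

The returned value is an ordinary By, so it can be passed to driver.findElement() or driver.findElements() like any of the 8 locators above.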
We can use any valid locator to find an element. Let’s use id, cssSelector, and xpath to locate the Full Name field on TestProject Example Page.
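// Assumed imports; @Test comes from the test framework in use (e.g. TestNG or JUnit)
import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

@Test
public void locateFullName() {
    // Download and configure a chromedriver that matches the installed Chrome
    WebDriverManager.chromedriver().setup();
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get("https://example.testproject.io/web/index.html");

    // Locate Full Name By ID
    driver.findElement(By.id("name"));
    // Locate Full Name By CSS Selector
    driver.findElement(By.cssSelector("#name"));
    // Locate Full Name By XPath
    driver.findElement(By.xpath("//input[@id='name']"));

    // Close the browser and end the driver session
    driver.quit();
}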
Selenium Methods
Selenium WebDriver is a collection of APIs used for automating an Application Under Test (AUT). There are many Selenium methods for performing an action, but the most used methods can be grouped into 5 categories:
- Browser Methods
- WebElement Methods
- Navigation Methods
- Wait Methods
- Switch Methods
Some of the categories fall under the same interface. For example, the Browser, Navigation, and Switch Methods are reached through the WebDriver interface. However, the WebElement Methods have their own WebElement interface, while the Wait Methods are located in 2 separate places: the WebDriver.Options interface (through driver.manage().timeouts()) and the org.openqa.selenium.support.ui package.
Note: A detailed article has been written for each category via TestProject Blogs: Browser, WebElement, Navigation, Wait, and Switch.
Browser Methods
The Browser Methods perform actions on a browser. Those methods are:
- close() – closes the current active window.
- get() – loads a new web page.
- getCurrentUrl() – gets a string defining the current web page URL.
- getPageSource() – gets the complete page source of the loaded web page.
- getTitle() – gets the current page title.
- quit() – stops running the driver and closes every associated window.
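The Browser Methods are usually called together, so here is a minimal sketch of a helper that loads a page and reports where the browser ended up. It assumes selenium-java is on the classpath; the class and method names are hypothetical.

```java
import org.openqa.selenium.WebDriver;

class BrowserMethodsSketch {

    // Loads the URL, then reports the page title and the URL the browser
    // ended up on (which can differ from the requested one after a redirect).
    static String describePage(WebDriver driver, String url) {
        driver.get(url);
        return driver.getTitle() + " | " + driver.getCurrentUrl();
    }
}
```

In a Test Script we would pass a real driver, for example describePage(new ChromeDriver(), "https://example.testproject.io/web/index.html"), and call quit() on that driver afterward so the browser and the driver process do not stay open.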
WebElement Methods
The WebElement Methods perform actions on WebElements. A WebElement is an element from the DOM that shows up on a web page as a button, link, text field, etc. There are 16 methods that help automate an action on a WebElement.
- clear()
- click()
- findElement()
- findElements()
- getAttribute()
- getCssValue()
- getLocation()
- getRect()
- getSize()
- getTagName()
- getText()
- isDisplayed()
- isEnabled()
- isSelected()
- sendKeys()
- submit()
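Several of these methods are typically combined when filling in a form. The following is a minimal sketch, assuming selenium-java is on the classpath, that checks a text field, replaces its content, and reads the value back. The class and method names are hypothetical.

```java
import org.openqa.selenium.WebElement;

class WebElementMethodsSketch {

    // Replaces the content of a text field and returns what the field now holds.
    static String fillField(WebElement field, String text) {
        if (!field.isDisplayed() || !field.isEnabled()) {
            throw new IllegalStateException("Field is not ready for input");
        }
        field.clear();          // remove any existing text
        field.sendKeys(text);   // type the new text
        return field.getAttribute("value");
    }
}
```

With the Full Name field from the earlier example, a call could look like fillField(driver.findElement(By.id("name")), "Jane Doe"), where the name typed is only sample data.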
Navigation Methods
The Navigation Methods load a web page, refresh a web page, or move backward and forward in our browser’s history. We access these methods by writing driver.navigate() followed by one of the following:
- back() – moves backward one page in the browser’s history.
- forward() – moves forward one page in the browser’s history.
- refresh() – refreshes the current page.
- to() – loads a new web page.
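The 4 methods can be sketched in one short sequence. This is a minimal example, assuming selenium-java is on the classpath; the class and method names are hypothetical and the URLs are supplied by the caller.

```java
import org.openqa.selenium.WebDriver;

class NavigationSketch {

    // Visits two pages, steps back and forward through the history,
    // reloads the page, and returns the URL the browser is on at the end.
    static String browseHistory(WebDriver driver, String firstUrl, String secondUrl) {
        WebDriver.Navigation nav = driver.navigate();
        nav.to(firstUrl);    // history: first
        nav.to(secondUrl);   // history: first, second
        nav.back();          // now on the first page
        nav.forward();       // now on the second page again
        nav.refresh();       // reload the second page
        return driver.getCurrentUrl();
    }
}
```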
Wait Methods
The Wait Methods pause the Test Script between statements. Waiting is important because Selenium can execute our Test Script faster than the page updates, so a command may reach an element that is not yet available and throw an exception. For debugging and demos, we may use Thread.sleep() to pause. However, Thread.sleep() is a Java sleep method that always pauses for the full duration, not a Selenium Wait Method. The following are ways to wait dynamically:
- pageLoadTimeout() – sets the maximum time to wait for a page to load.
- implicitlyWait() – sets the amount of time a driver should keep looking for an element before giving up.
- Explicit Wait – pauses execution until an expected condition is met or the time expires, using the WebDriverWait class.
- FluentWait – the core of explicit wait because WebDriverWait extends FluentWait. It also lets us set the polling interval and the exceptions to ignore.
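To show how a dynamic wait differs from a fixed sleep, here is a minimal sketch built on FluentWait. It assumes selenium-java is on the classpath; the helper names, the 100 ms polling interval, and the 10-second timeout are illustrative choices, not values from Selenium.

```java
import java.time.Duration;
import java.util.function.Function;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.FluentWait;

class WaitSketch {

    // Polls the condition every 100 ms until it returns a value that is
    // neither null nor false. A TimeoutException is thrown if time runs out.
    static <T, R> R pollUntil(T input, Function<T, R> condition, Duration timeout) {
        return new FluentWait<T>(input)
            .withTimeout(timeout)
            .pollingEvery(Duration.ofMillis(100))
            .ignoring(NoSuchElementException.class)
            .until(condition);
    }

    // Waits up to 10 seconds for an element to be present and visible.
    static WebElement waitForVisible(WebDriver driver, By locator) {
        return pollUntil(driver, d -> {
            WebElement element = d.findElement(locator);
            return element.isDisplayed() ? element : null;
        }, Duration.ofSeconds(10));
    }
}
```

The wait returns as soon as the condition is satisfied, so a call such as waitForVisible(driver, By.id("name")) costs only as much time as the page actually needs. WebDriverWait combined with ExpectedConditions offers ready-made conditions for the same purpose.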
Switch Methods
The Switch Methods switch to alerts, windows, and frames. An alert is a pop-up box that contains information or expects a response from the user. The 3 types of alerts are:
- Information Alerts contain a message with 1 button.
- Confirmation Alerts contain a message with 2 buttons.
- Prompt Alerts contain a message with 2 buttons and a text field.
We can perform the following actions on an alert:
- accept() – accepts the alert by clicking the OK button.
- dismiss() – cancels the alert by clicking the Cancel button.
- getText() – gets text from the alert.
- sendKeys() – types text into the alert.
Switching to a window requires a window handle, which Selenium assigns to each window. It’s a unique alphanumeric id that helps change control to a window. The following 3 methods retrieve window handles and switch to a window:
- getWindowHandle() – gets the current window handle.
- getWindowHandles() – gets a set of all open window handles.
- switchTo() – switches focus to another window when followed by window() and a handle.
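The 3 window methods usually work together: remember the current handle, find a handle that differs from it, then switch. Here is a minimal sketch of that pattern, assuming selenium-java is on the classpath; the class and method names are hypothetical.

```java
import java.util.Set;
import org.openqa.selenium.WebDriver;

class WindowSwitchSketch {

    // Picks the first handle that differs from the original one,
    // or returns null when no other window exists.
    static String findOtherHandle(String originalHandle, Set<String> allHandles) {
        for (String handle : allHandles) {
            if (!handle.equals(originalHandle)) {
                return handle;
            }
        }
        return null;
    }

    // Moves focus to a window other than the current one and returns the
    // original handle so the caller can switch back later.
    static String switchToOtherWindow(WebDriver driver) {
        String original = driver.getWindowHandle();
        String other = findOtherHandle(original, driver.getWindowHandles());
        if (other == null) {
            throw new IllegalStateException("No second window is open");
        }
        driver.switchTo().window(other);
        return original;
    }
}
```

Keeping the original handle matters because closing the second window does not move focus back automatically; the script has to call driver.switchTo().window(original) itself.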
Switching to a frame can be done in 3 ways, each through driver.switchTo(). The methods are:
- frame(WebElement element) – selects a frame by WebElement.
- frame(String nameOrId) – selects a frame by a name or ID.
- frame(int index) – selects a frame by its (zero-based) index.
Here’s a link to the Switch Methods article consisting of screenshots, code, and examples: https://blog.testproject.io/2020/06/18/selenium-switch-methods-chapter-5/.
getScreenshotAs Method
Capturing a screenshot using Selenium is beneficial when a Test Script fails. It provides a way for an engineer to analyze what was incorrect during a test execution. The getScreenshotAs method allows us to take a screenshot of the page or WebElement then store it in a specified location.
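Here is a minimal sketch of how a screenshot could be captured and stored, assuming selenium-java is on the classpath. The class and method names are hypothetical, and the target path is chosen by the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;

class ScreenshotSketch {

    // Captures a PNG from the driver or element and writes it to the target path.
    static Path saveScreenshot(TakesScreenshot source, Path target) {
        byte[] png = source.getScreenshotAs(OutputType.BYTES);
        try {
            return Files.write(target, png);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a full page we would cast the driver, as in saveScreenshot((TakesScreenshot) driver, Paths.get("failure.png")). A WebElement can be passed directly because WebElement extends TakesScreenshot. The file name failure.png is only an example.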
Actions Class
The Actions class handles advanced keyboard and mouse interactions. We can carry out actions such as dragging and dropping a WebElement or pressing and releasing a key on our keyboard. To use the Actions class, we instantiate it by declaring an object such as act, assigning it new Actions, and passing the driver in parentheses. Here’s an example of the syntax:
Actions act = new Actions(driver);
Next, we type the object act followed by the dot operator so that the IDE’s code completion lists all of the methods in the Actions class. A chain of actions runs only when it ends with perform(). Here’s the code snippet:
Actions act = new Actions(driver); act.
That wraps up Selenium WebDriver: From A to Z! The information in this article is relevant for beginners, automation engineers who want to refresh their knowledge, and experienced Selenium WebDriver Test Script creators. In the next article, I will write about the Best Practices for Selenium Automation.
Stay Tuned 😊