Selenium WebDriver: From A to Z

In this series, my goal is to provide knowledge that will help you become a Selenium WebDriver Superstar! 🌟
There are 3 components in the Selenium family suite. Selenium WebDriver is an API responsible for automating our browser through a driver.

  • Selenium supports the automation of browsers by sending and receiving commands.
  • WebDriver communicates with the browser (e.g., Chrome) through a driver (e.g., chromedriver.exe).

The other 2 components are Selenium IDE and Selenium Grid. Selenium IDE records and plays back our tests. Selenium Grid executes our tests across multiple operating systems, browsers, and machines. By the end of this series, you will have learned about Selenium WebDriver: From A to Z and the Best Practices for Selenium Automation that come with it.


Install Selenium

There are multiple ways to add Selenium to a project in our IDE. We can download it from Selenium’s Official Site or from Maven’s Repository. The official site for Selenium has the latest releases for all components plus previous releases and source code. Maven’s Repository is a website that holds many plugins, project jars, library jars, and artifacts for our project.

Selenium’s Official Site

We go to https://www.selenium.dev/downloads/ for Selenium’s Official Site, then navigate to the Selenium Client & WebDriver Language Bindings section to download Selenium. The available languages are Ruby, Java, Python, C#, and JavaScript. At the time of writing, the stable version is 3.141.59 and the alpha version is 4.0 (you can read more about the new version in Selenium 4 new features).

Selenium Client & WebDriver Language Bindings

Maven’s Repository

To install from Maven’s Repository, go to https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java.

Maven's Repository

After selecting our desired Selenium version, we copy the dependency and paste it into a pom.xml file for Maven or a build.gradle file for Gradle. Maven and Gradle are build automation tools with a similar purpose: managing our projects. Both help bypass manual work such as downloading and configuring Selenium. Here’s a Selenium 4 dependency for Maven and Gradle.

Selenium 4 dependency for Maven

<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.0.0-alpha-6</version>
</dependency>

Selenium 4 dependency for Gradle

// https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java

implementation group: 'org.seleniumhq.selenium', name: 'selenium-java', version: '4.0.0-alpha-6'

Understand the DOM

Automating a browser begins with understanding the Document Object Model (DOM), which represents each web page. The DOM is an API that handles HTML documents as a tree structure. HTML stands for HyperText Markup Language, which is designed to create web pages. We can right-click a web page like TestProject’s Example Page, select Inspect, and view the DOM.
At the top of every DOM is an html root element consisting of 2 elements: head and body.

TestProject’s Example Page

TestProject’s Example Page - Inspect the DOM

An element usually consists of a start tag and an end tag with content between them. Some elements, such as input, have only a start tag. In the following example, let’s inspect the Full Name field.

<input id="name" type="text" class="form-control" placeholder="Enter your full name" required="">

The input tag is followed by attributes and attribute values. Attributes provide extra information about an HTML element.

Here’s a breakdown of the Full Name field:

  • input is the tag name; the element accepts information from a user.
  • id="name": id is an attribute that gives the element a unique identifier, with "name" as the attribute value.
  • type="text": type is the attribute with "text" as the attribute value, and it defines a 1-line text field.
  • class="form-control": class is the attribute with "form-control" as the attribute value.
  • placeholder="Enter your full name": placeholder is an attribute that provides a clue of the expected information. The value "Enter your full name" is displayed in the field.
  • required is an attribute that makes sure the user enters information.
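To make the breakdown concrete, here is a minimal sketch that reads the same tag name and attributes programmatically. It uses only the JDK’s built-in XML parser, not Selenium or a browser, so the markup is an XML-friendly copy of the Full Name field (the closing "/>" is an assumption added for that reason). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class ElementBreakdown {
    // XML-friendly copy of the Full Name markup; the trailing "/>" is an
    // assumption needed because the JDK parser reads XML, not HTML.
    static final String FULL_NAME_FIELD =
        "<input id=\"name\" type=\"text\" class=\"form-control\" "
        + "placeholder=\"Enter your full name\" required=\"\"/>";

    // Parses the markup and returns its root element.
    static Element parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Collects every attribute name and value of the element, sorted by name.
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            result.put(attributes.item(i).getNodeName(),
                       attributes.item(i).getNodeValue());
        }
        return result;
    }
}
```

Calling ElementBreakdown.parse(ElementBreakdown.FULL_NAME_FIELD).getTagName() returns the tag name, and attributesOf() returns the 5 attributes described in the list above.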

In Google’s Chrome or Microsoft’s Edge browser, we can press CTRL + F in the Elements panel to find the element. A box labeled Find by string, selector, or XPath is available to search for a value. For this element, the best way to find Full Name is by id since it’s a unique identifier. In this example, the CSS selector #name successfully finds the Full Name element, which is highlighted yellow in the DOM.

Find WebElements

One of the first steps to automating a task is finding the WebElement. A WebElement is an element from HTML displayed on a web page. The WebElements show up as buttons, images, links, or anything visible on a web page. We are unable to perform any Selenium action until a WebElement is found. To find the WebElement, our Test Script needs a WebDriver, findElement(s) method, and locator.

WebDriver Interface

WebDriver is an interface that talks to the browser. Per Selenium’s documentation, the methods in this interface fall into 3 categories:

  1. Control of the browser itself
  2. Selection of WebElements
  3. Debugging aids

WebDriver Interface

To control the browser, we start by writing a statement like one of the following:

  • Python: driver = webdriver.Chrome(executable_path="")
  • C#: IWebDriver driver = new ChromeDriver();
  • Java: WebDriver driver = new ChromeDriver();

The above statements take control of Google’s Chrome browser. However, a similar statement line will take control of any browser by changing ChromeDriver to something like FirefoxDriver. After controlling the browser, the 2nd category is the Selection of WebElements. We select WebElements using the findElement(s) method and one of several locators provided by Selenium.

findElement(s) Method

A WebElement can be located using a search value taken from the DOM. In the example from Understand the DOM, we found the Full Name field using #name. To locate a WebElement, Selenium offers 2 methods: findElement and findElements.

  • findElement() finds the first WebElement matching a provided criterion.
  • findElements() finds a group of WebElements matching a provided criterion.
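The difference between the 2 methods also shows when nothing matches: in Selenium, findElement() throws a NoSuchElementException while findElements() returns an empty list. The sketch below imitates that "first match" versus "all matches" behavior with the JDK’s XPath support instead of a real browser. The form markup and the names FindSketch, findFirst, and findAll are hypothetical, and java.util.NoSuchElementException stands in for Selenium’s own exception class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FindSketch {
    // Hypothetical form with three fields, written as well-formed XML.
    static final String FORM =
        "<form>"
        + "<input id=\"name\" type=\"text\"/>"
        + "<input id=\"address\" type=\"text\"/>"
        + "<input id=\"email\" type=\"email\"/>"
        + "</form>";

    // Like findElements(): every match, or an empty list when none match.
    static List<Element> findAll(String markup, String xpath) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, dom, XPathConstants.NODESET);
            List<Element> matches = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                matches.add((Element) nodes.item(i));
            }
            return matches;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    // Like findElement(): the first match, or an exception when none match.
    static Element findFirst(String markup, String xpath) {
        List<Element> matches = findAll(markup, xpath);
        if (matches.isEmpty()) {
            throw new NoSuchElementException("No element matches " + xpath);
        }
        return matches.get(0);
    }
}
```

With this form, findAll(FORM, "//input") returns 3 elements, while findFirst(FORM, "//input") returns only the field whose id is name.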

Locators

Selenium has multiple locators for locating the same WebElement. We have the option of choosing our favorite locator such as id, xpath or cssSelector. With the release of Selenium 4, there are additional methods called Relative Locators to assist in locating a WebElement.

8 Selenium Locators

The 8 Selenium locators in alphabetical order are:

  1. className – locates an element based on the value of a “class” attribute.
  2. cssSelector – locates an element based on a CSS selector.
  3. id – locates an element based on the value of an “id” attribute.
  4. linkText – locates a link based on its exact visible text.
  5. name – locates an element based on the value of the “name” attribute.
  6. partialLinkText – locates a link based on part of its visible text.
  7. tagName – locates an element based on its tag name.
  8. xpath – locates an element based on an XPath expression.

Selenium Locators

Relative Locators

The Relative Locators find a specific element based on the position of another element. Selenium 4 introduced the following 5 Relative Locators:

  1. above() – finds an element or elements located above a fixed element.
  2. below() – finds an element or elements located below a fixed element.
  3. near() – finds an element or elements located near a fixed element.
  4. toLeftOf() – finds an element or elements located to the left of a fixed element.
  5. toRightOf() – finds an element or elements located to the right of a fixed element.

Relative Locators

We can use any valid locator to find an element. Let’s use id, cssSelector, and xpath to locate the Full Name field on TestProject’s Example Page.

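import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test; // assumes TestNG; with JUnit, import its @Test instead

public class LocateFullNameTest { // hypothetical class name

  @Test
  public void locateFullName() {
    // Download and configure the matching chromedriver, then open Chrome
    WebDriverManager.chromedriver().setup();
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get("https://example.testproject.io/web/index.html");

    // Locate Full Name By ID
    driver.findElement(By.id("name"));

    // Locate Full Name By CSS Selector
    driver.findElement(By.cssSelector("#name"));

    // Locate Full Name By XPath
    driver.findElement(By.xpath("//input[@id='name']"));

    // Close the browser and end the session
    driver.quit();
  }
}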

Selenium Methods

Selenium WebDriver is a collection of APIs used for automating an Application Under Test (AUT). There are many Selenium methods for performing an action, but the most used methods can be grouped into 5 categories:

  1. Browser Methods
  2. WebElement Methods
  3. Navigation Methods
  4. Wait Methods
  5. Switch Methods

Some of the categories fall under the same interface. For example, the Browser, Navigation, and Switch Methods are reached through the WebDriver interface. However, the WebElement Methods have their own WebElement interface, while the Wait Methods are located in 2 separate places: the Timeouts interface, reached through the Options interface with driver.manage().timeouts(), and the org.openqa.selenium.support.ui package.

Note: A detailed article has been written for each category via TestProject Blogs: Browser, WebElement, Navigation, Wait, and Switch.

Browser Methods

The Browser Methods perform actions on a browser. Those methods are:

  • close() – closes the current active window.
  • get() – loads a new web page.
  • getCurrentUrl() – gets a string defining the current web page URL.
  • getPageSource() – gets the complete page source of the loaded web page.
  • getTitle() – gets the current page title.
  • quit() – stops running the driver and closes all associated windows.

Selenium Browser Methods

WebElement Methods

The WebElement Methods perform actions on WebElements. A WebElement is an element from the DOM that shows up on a web page as a button, link, text field, etc. There are 16 methods that help automate an action on a WebElement.

  1. clear()
  2. click()
  3. findElement()
  4. findElements()
  5. getAttribute()
  6. getCssValue()
  7. getLocation()
  8. getRect()
  9. getSize()
  10. getTagName()
  11. getText()
  12. isDisplayed()
  13. isEnabled()
  14. isSelected()
  15. sendKeys()
  16. submit()

Selenium WebElement Methods

Navigation Methods

The Navigation Methods load a web page, refresh a web page, or move backwards and forwards in our browser’s history. We access these methods by writing driver.navigate() followed by one of the following:

  • back() – moves backward one page in the browser’s history.
  • forward() – moves forward one page in the browser’s history.
  • refresh() – refreshes the current page.
  • to() – loads a new web page.
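To picture how back(), forward(), and to() relate to the browser’s history, here is a toy model written in plain Java. It is not Selenium code and not how a browser is implemented; it only assumes the usual browser rule that loading a new page discards any pages we could have moved forward to. The class name HistorySketch and the second and third URLs in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HistorySketch {
    private final List<String> pages = new ArrayList<>();
    private int current = -1; // index of the page being shown, -1 when empty

    // Like navigate().to(): loads a new page and drops the "forward" entries.
    void to(String url) {
        while (pages.size() > current + 1) {
            pages.remove(pages.size() - 1);
        }
        pages.add(url);
        current++;
    }

    // Like navigate().back(): moves one page back, staying put on the first page.
    void back() {
        if (current > 0) {
            current--;
        }
    }

    // Like navigate().forward(): moves one page forward when there is one.
    void forward() {
        if (current < pages.size() - 1) {
            current++;
        }
    }

    // The URL of the page being shown, or null before any page is loaded.
    String currentUrl() {
        return current >= 0 ? pages.get(current) : null;
    }
}
```

For example, after to("https://example.testproject.io/web/index.html") and to("https://example.test/second"), a call to back() returns to the first page and forward() moves to the second page again. A refresh() would leave this history unchanged.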

Selenium Navigation Methods

Wait Methods

The Wait Methods pause execution until the page or a WebElement is ready. Waiting is important because Selenium can execute a statement before the browser has finished loading, which causes exceptions such as NoSuchElementException. For debugging and demos, we may use Thread.sleep() to pause. However, Thread.sleep() is a Java method that always pauses for a fixed time and is not a Selenium Wait Method. The following are ways to wait dynamically:

  • pageLoadTimeout() – sets the maximum time to wait for a page to load.
  • implicitlyWait() – sets the amount of time the driver should wait when searching for an element.
  • Explicit Wait – pauses execution until an expected condition is met or the time expires, using the WebDriverWait class.
  • FluentWait – the core of explicit wait because WebDriverWait extends FluentWait.
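The idea behind an explicit wait is polling: check a condition, and if it is not met yet, pause briefly and check again until a timeout expires. Here is a minimal plain-Java sketch of that idea. It is not Selenium’s WebDriverWait or FluentWait implementation; the names PollingWaitSketch and waitUntil are hypothetical, and it throws IllegalStateException where Selenium would throw its own TimeoutException.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

class PollingWaitSketch {
    // Polls the condition until it yields a value or the timeout elapses.
    static <T> T waitUntil(Supplier<Optional<T>> condition,
                           Duration timeout,
                           Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<T> result = condition.get();
            if (result.isPresent()) {
                return result.get(); // condition met, stop waiting early
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis()); // short pause between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

Unlike Thread.sleep() on its own, this returns as soon as the condition is met, so a test only waits as long as it needs to.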

Selenium Wait Methods

Switch Methods

The Switch Methods switch to alerts, windows, and frames. An alert is a pop-up box that contains information or expects a response from the user. The 3 types of alerts are:

  1. Information Alerts contain a message with 1 button.
  2. Confirmation Alerts contain a message with 2 buttons.
  3. Prompt Alerts contain a message with 2 buttons and a text field.

We can perform the following actions on an alert:

  • accept() – accepts the alert by clicking the OK button.
  • dismiss() – cancels the alert by clicking the Cancel button.
  • getText() – gets text from the alert.
  • sendKeys() – types text into the alert.

Selenium Switch Methods

Switching to a window requires the window handle that Selenium assigns to each window. It’s a unique alphanumeric id that helps change control to a window. The following 3 methods retrieve window handles and switch to a window:

  1. getWindowHandle() – gets the current window handle.
  2. getWindowHandles() – gets a set of window handles.
  3. switchTo() – switches focus to another window when followed by window() and a window handle.
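A common use of these methods is switching to a window that was just opened. Since getWindowHandles() returns a set, one approach is to compare the set from before the click with the set from after it. The helper below sketches that comparison in plain Java; WindowHandleSketch and newHandle are hypothetical names, and the Selenium calls appear only in comments as an assumed usage.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class WindowHandleSketch {
    // Returns the handle present in "after" but not in "before",
    // i.e. the handle of the newly opened window.
    static String newHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        if (added.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly 1 new window, found " + added.size());
        }
        return added.iterator().next();
    }

    // Assumed usage with Selenium (not executed here):
    //   Set<String> before = driver.getWindowHandles();
    //   (click a link that opens a new window)
    //   Set<String> after = driver.getWindowHandles();
    //   driver.switchTo().window(newHandle(before, after));
}
```

Failing loudly when zero or several new windows appear keeps the test from switching to the wrong window.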

Selenium Switch Methods

There are 3 overloaded frame() methods for switching to a frame, reached through driver.switchTo(). The methods are:

  1. frame(WebElement element) – selects a frame by WebElement.
  2. frame(String nameOrId) – selects a frame by a name or ID.
  3. frame(int index) – selects a frame by its (zero-based) index.

Selenium Switch Methods

Here’s a link to the Switch Methods article consisting of screenshots, code, and examples: https://blog.testproject.io/2020/06/18/selenium-switch-methods-chapter-5/.

getScreenshotAs Method

Capturing a screenshot using Selenium is beneficial when a Test Script fails. It provides a way for an engineer to analyze what went wrong during a test execution. The getScreenshotAs method takes a screenshot of the page or a WebElement and returns it in the requested output type (a file, bytes, or a Base64 string), which we can then store in a specified location.
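When the screenshot is requested as a Base64 string, storing it means decoding the string and writing the bytes to a file. Here is a minimal plain-Java sketch of that step. The names ScreenshotSketch and saveBase64 are hypothetical, and the Selenium call that would produce the string is shown only in a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class ScreenshotSketch {
    // Decodes Base64 image data and writes it to the target file,
    // creating the parent folder first when it does not exist.
    static Path saveBase64(String base64Image, Path target) {
        // The MIME decoder tolerates line breaks inside the Base64 text.
        byte[] bytes = Base64.getMimeDecoder().decode(base64Image);
        try {
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            return Files.write(target, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed usage with Selenium (not executed here):
    //   String data = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BASE64);
    //   saveBase64(data, Paths.get("screenshots", "failure.png"));
}
```

The folder and file name in the usage comment are placeholders; any writable location works.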

getScreenshotAs Method

Actions Class

The Actions class contains advanced methods that handle keyboard and mouse interactions. We can carry out actions such as dragging and dropping a WebElement, or pressing and releasing a key on our keyboard. To use the Actions class, we declare an object such as act, set it equal to new Actions, and pass the driver in parentheses. Here’s an example of the syntax:

Actions act = new Actions(driver);

Next, type the object act followed by the dot operator so the IDE’s code completion (IntelliSense) lists all of the methods. The Actions class builds a chain of interactions, while the Action interface represents the built interaction that perform() executes. Here’s the code snippet that reveals the methods in the Actions class:

Actions act = new Actions(driver);
act.

Action Class

That wraps up Selenium WebDriver: From A to Z! The information in this article is relevant for beginners, automation engineers who want to refresh their knowledge, and experienced Selenium WebDriver Test Script creators. In the next article, I will write about the Best Practices for Selenium Automation.

Stay Tuned
😊

About the author

Rex Jones II

Rex Jones II has a passion for sharing knowledge about testing software. His background is in development, but he enjoys testing applications.

Rex is an author, trainer, consultant, and former member of the Board of Directors for User Group: Dallas / Fort Worth Mercury User Group (DFWMUG) and a member of User Group: Dallas / Fort Worth Quality Assurance Association (DFWQAA). In addition, he is a Certified Software Tester Engineer (CSTE) and has a Test Management Approach (TMap) certification.

Recently, Rex created a social network that demonstrates automation videos. In addition to the social network, he has written 6 Programming / Automation books covering VBScript (the programming language for QTP/UFT), Java, Selenium WebDriver, and TestNG.

✔️ YouTube https://www.youtube.com/c/RexJonesII/videos
✔️ Facebook http://facebook.com/JonesRexII
✔️ Twitter https://twitter.com/RexJonesII
✔️ GitHub https://github.com/RexJonesII/Free-Videos
✔️ LinkedIn https://www.linkedin.com/in/rexjones34/
