Inside Selenium WebDriver

Everything under the sun has already been written about Selenium WebDriver. The number of online tutorials is tremendous, new posts spring up like mushrooms after the rain, there are forums dedicated solely to Selenium, and Selenium is widely considered the standard choice in the test automation field.

Personally, I’ve reviewed dozens, if not hundreds, of sites dedicated entirely to Selenium. I’ve seen endless theoretical backgrounds, examples, explanations, installation guides, code segments… everything but one thing: what goes on behind the scenes? How is Selenium WebDriver “built”? What’s inside Selenium WebDriver?

We all know that Selenium WebDriver consists of code libraries, and that these libraries let us perform actions on various browsers. But what do these libraries look like? How are they built? What is their architecture? And how are the classes and interfaces that make them up related to one another?

Before going further, I have a few remarks:

  • The code libraries are enormous, and many hours were invested in planning, designing, writing and fixing them. I won’t be able to cover everything in this post, but I’ll try to cover the basics and most of the important parts, at least until I run out of ink   😉
  • Selenium WebDriver’s code libraries are implemented in a variety of programming languages. In this post, I decided to focus on Selenium’s most popular language: Java.
  • I’ll show the code libraries as I downloaded them from the internet, but you can also browse them in the project’s documentation.
  • I’ll use version 3.4.0 of Selenium WebDriver.

This is the project as it appears in the file system:

[Image: Selenium WebDriver01, the project in the file system]

Some of the libraries are named after browsers, such as Chrome, Firefox and Edge. Inside each of them are the classes responsible for that browser’s functionality; for example, the Edge library contains the classes in charge of the Edge browser, and so on. In the image below you can see what the Chrome library includes:

[Image: Selenium WebDriver02, the contents of the Chrome library]

Those of you who have already automated the Chrome browser with Selenium WebDriver are surely familiar with this command:

WebDriver driver = new ChromeDriver();

This command opens a session with the Chrome browser, which we will then work with. But what exactly is this line of code? What does it mean? It declares a variable named driver of type WebDriver and initializes it with a new object created by the ChromeDriver constructor. Wait! What? Let’s break it down:

WebDriver is an interface, whereas ChromeDriver is a class that implements WebDriver (well, not directly: it extends a different class that implements WebDriver. We’ll get to that in a moment). The ChromeDriver class has a constructor, a special member with the same name as the class that is called when a new object is created. In fact, the class declares several constructors with different parameter lists (this is called method overloading, or in this case, constructor overloading). This is the class:

[Image: Selenium WebDriver03, the ChromeDriver class]

Some of the methods in this class are marked as deprecated. Notice that this class extends a different class called RemoteWebDriver. Let’s open it and look at its contents:

[Image: Selenium WebDriver04, the RemoteWebDriver class]

What do we see in the image above?

  • Several constructors are implemented.
  • We can even see a few of the developers’ comments in the code (about things that still require development, fixes or documentation updates).
  • Among the interfaces this class implements are JavascriptExecutor and WebDriver.
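To make these relationships concrete, here is a heavily simplified Java model. It is not Selenium’s source: the type names mirror Selenium’s, but every member, the String constructor parameter (standing in for ChromeOptions) and the stub behavior are hypothetical. It shows a class (ChromeDriver) that implements WebDriver only through its superclass, an overloaded constructor, and one object viewed through two different interfaces.

```java
// Simplified model, NOT Selenium's source. Names mirror Selenium's types;
// all members and behavior below are hypothetical stand-ins.
interface WebDriver {
    String getTitle();

    void quit();
}

interface JavascriptExecutor {
    Object executeScript(String script, Object... args);
}

// Plays the role of RemoteWebDriver: one class implementing both interfaces.
class RemoteWebDriver implements WebDriver, JavascriptExecutor {
    private final String browserName;
    private boolean open = true;

    protected RemoteWebDriver(String browserName) {
        this.browserName = browserName;
    }

    @Override
    public String getTitle() {
        return open ? "Page opened in " + browserName : "";
    }

    @Override
    public void quit() {
        open = false;
    }

    @Override
    public Object executeScript(String script, Object... args) {
        // A real driver would send the script to the browser; this stub just echoes it.
        return "executed: " + script;
    }
}

// Plays the role of ChromeDriver: it implements WebDriver only indirectly,
// by extending RemoteWebDriver, and it overloads its constructor.
class ChromeDriver extends RemoteWebDriver {
    ChromeDriver() {
        this("default options"); // delegates to the other constructor
    }

    ChromeDriver(String options) { // hypothetical stand-in for a ChromeOptions parameter
        super("Chrome (" + options + ")");
    }
}

class DriverDemo {
    static String run() {
        WebDriver driver = new ChromeDriver(); // declared type: interface; object: class
        String title = driver.getTitle();
        // The same object viewed through its other interface via a cast.
        Object result = ((JavascriptExecutor) driver).executeScript("return document.title;");
        driver.quit();
        return title + " | " + result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The cast in run() mirrors the familiar ((JavascriptExecutor) driver).executeScript(...) idiom: the variable is declared as WebDriver, but the object behind it is a RemoteWebDriver subclass, so it also implements JavascriptExecutor.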

Now, let’s open the WebDriver interface:

[Image: Selenium WebDriver05, the WebDriver interface]

Those of you who have already worked with Selenium will probably recognize some of these methods, such as findElement, which returns a WebElement object; findElements, which returns a list of WebElement objects; and getTitle, which returns a String. As mentioned above, WebDriver is an interface, so here the methods are only declared; their implementations come later, in the classes that implement it. We can also see that this interface extends another interface, SearchContext. Let’s open it:

[Image: Selenium WebDriver06, the SearchContext interface]

In the image above, we see the declarations of findElement and findElements, the methods through which we locate elements on the page.
The SearchContext interface is extended by two other interfaces: one is WebDriver, which we’ve already seen, and the second is WebElement:

[Image: Selenium WebDriver07, the class and interface hierarchy]

The image above demonstrates the hierarchy we talked about: ChromeDriver extends RemoteWebDriver, which implements WebDriver, which in turn extends SearchContext. At the same time, WebElement also extends SearchContext, and the RemoteWebElement class implements WebElement.
Now, let’s look at the WebElement interface, and after that we’ll take a look at the RemoteWebElement class:

[Image: Selenium WebDriver08, the WebElement interface]

Again, those who have already worked with Selenium will probably be familiar with some or all of these methods, such as getText, isEnabled, submit, click and others.
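Why do both WebDriver and WebElement extend SearchContext? Because a search can start either from the whole page or from a particular element. The following minimal sketch models that on a toy in-memory page. FakeElement, FakeDriver and the tag-name "locator" (used in place of Selenium’s By class) are hypothetical, only getText is kept from WebElement’s methods, and findElement is written as a default method for brevity, whereas Selenium declares both search methods abstractly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified model, NOT Selenium's source. As in Selenium, WebDriver and
// WebElement both extend SearchContext. A "locator" here is just a tag name.
interface SearchContext {
    List<WebElement> findElements(String tagName);

    default WebElement findElement(String tagName) {
        List<WebElement> found = findElements(tagName);
        if (found.isEmpty()) {
            // Selenium throws its own NoSuchElementException; this uses the JDK's.
            throw new NoSuchElementException("No <" + tagName + "> found");
        }
        return found.get(0);
    }
}

interface WebElement extends SearchContext {
    String getText();
}

interface WebDriver extends SearchContext {
    String getTitle();
}

// A node in a toy in-memory page; plays the role of RemoteWebElement.
class FakeElement implements WebElement {
    private final String tag;
    private final String text;
    private final List<FakeElement> children = new ArrayList<>();

    FakeElement(String tag, String text, FakeElement... kids) {
        this.tag = tag;
        this.text = text;
        Collections.addAll(children, kids);
    }

    @Override
    public String getText() {
        return text;
    }

    @Override
    public List<WebElement> findElements(String tagName) {
        // Depth-first, document-order search over descendants only,
        // so a search started from an element is scoped to that element.
        List<WebElement> result = new ArrayList<>();
        for (FakeElement child : children) {
            if (child.tag.equals(tagName)) {
                result.add(child);
            }
            result.addAll(child.findElements(tagName));
        }
        return result;
    }
}

// Plays the role of the driver: it searches from the page's root element.
class FakeDriver implements WebDriver {
    private final FakeElement root;

    FakeDriver(FakeElement root) {
        this.root = root;
    }

    @Override
    public String getTitle() {
        return "Demo page";
    }

    @Override
    public List<WebElement> findElements(String tagName) {
        return root.findElements(tagName);
    }
}

class SearchDemo {
    static String run() {
        FakeElement page = new FakeElement("html", "",
                new FakeElement("div", "header", new FakeElement("a", "Home")),
                new FakeElement("div", "content", new FakeElement("a", "Read more")));
        WebDriver driver = new FakeDriver(page);

        String first = driver.findElement("a").getText();       // page-wide search
        WebElement content = driver.findElements("div").get(1); // second <div>
        String scoped = content.findElement("a").getText();     // search inside it
        return first + " / " + scoped;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Home / Read more"
    }
}
```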

WebElement is an interface, and it is implemented by the RemoteWebElement class. Let’s open it:

[Image: Selenium WebDriver09, the RemoteWebElement class]

In the image above you can see the implementations of the commands that act on elements, for example, click. How exactly does it work?

The click method calls another method, execute. To execute we pass DriverCommand.CLICK_ELEMENT and an ImmutableMap of parameters that holds the element’s id; the session id is attached later, when the call is handed on to the parent RemoteWebDriver. What is DriverCommand?

DriverCommand is another interface, in which String constants are defined for the commands of the WebDriver JSON Wire Protocol. The JSON Wire Protocol is how WebDriver’s client libraries communicate with the browser driver implementations (IEDriver, ChromeDriver, FirefoxDriver, etc.). By the way, since this interface contains no methods, only constants, it is an example of what is sometimes called a constant interface.
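The click chain described above can be sketched as follows. This is a simplified model under my own assumptions, not Selenium’s code: Driver, Element and Command are hypothetical stand-ins for RemoteWebDriver, RemoteWebElement and the command object, the constant values are illustrative, and the JDK’s Map.of replaces Guava’s ImmutableMap. The point is the division of labor: the element knows only its own id, and the driver attaches the session id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A constant interface in the style of DriverCommand: only String constants,
// no methods. The values here are illustrative.
interface DriverCommand {
    String CLICK_ELEMENT = "clickElement";
    String CLEAR_ELEMENT = "clearElement";
}

// Hypothetical stand-in for the command object a driver hands to its executor.
final class Command {
    final String sessionId;
    final String name;
    final Map<String, ?> parameters;

    Command(String sessionId, String name, Map<String, ?> parameters) {
        this.sessionId = sessionId;
        this.name = name;
        this.parameters = parameters;
    }

    @Override
    public String toString() {
        return name + " " + parameters + " (session " + sessionId + ")";
    }
}

// Plays the role of RemoteWebDriver: it owns the session id and attaches it
// to every command it receives.
class Driver {
    final String sessionId;
    final List<Command> sent = new ArrayList<>(); // records commands instead of sending HTTP

    Driver(String sessionId) {
        this.sessionId = sessionId;
    }

    void execute(String commandName, Map<String, ?> parameters) {
        sent.add(new Command(sessionId, commandName, parameters));
    }
}

// Plays the role of RemoteWebElement: it knows only its id and its parent driver.
class Element {
    private final Driver parent;
    private final String id;

    Element(Driver parent, String id) {
        this.parent = parent;
        this.id = id;
    }

    void click() {
        // Same shape as execute(DriverCommand.CLICK_ELEMENT, ImmutableMap.of("id", id)).
        parent.execute(DriverCommand.CLICK_ELEMENT, Map.of("id", id));
    }

    void clear() {
        parent.execute(DriverCommand.CLEAR_ELEMENT, Map.of("id", id));
    }
}

class ClickDemo {
    public static void main(String[] args) {
        Driver driver = new Driver("session-123");
        new Element(driver, "element-7").click();
        System.out.println(driver.sent.get(0)); // clickElement {id=element-7} (session session-123)
    }
}
```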

This is what the DriverCommand interface looks like:

[Image: Selenium WebDriver10, the DriverCommand interface]

To understand how it works, we have to open the JSON Wire Protocol project on GitHub, which, among other things, documents Selenium WebDriver’s well-known commands, such as click and clear.
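As a rough illustration of what that documentation describes, each command corresponds to an HTTP method and a URL template with :sessionId and :id placeholders. The sketch below hand-codes a few such routes. The URL templates follow the JSON Wire Protocol documentation, but the WireRoute and WireProtocolSketch names and the route table itself are hypothetical; this is not Selenium’s own command codec.

```java
import java.util.Map;

// Hypothetical pairing of an HTTP method with a JSON Wire Protocol URL template.
final class WireRoute {
    final String method;
    final String template;

    WireRoute(String method, String template) {
        this.method = method;
        this.template = template;
    }
}

class WireProtocolSketch {
    // Hand-written route table keyed by command name (illustrative subset).
    static final Map<String, WireRoute> ROUTES = Map.of(
            "getTitle", new WireRoute("GET", "/session/:sessionId/title"),
            "findElement", new WireRoute("POST", "/session/:sessionId/element"),
            "clickElement", new WireRoute("POST", "/session/:sessionId/element/:id/click"),
            "clearElement", new WireRoute("POST", "/session/:sessionId/element/:id/clear"));

    // Builds "METHOD /path" for a command by filling in the :sessionId and :id placeholders.
    static String toRequestLine(String command, String sessionId, Map<String, String> params) {
        WireRoute route = ROUTES.get(command);
        if (route == null) {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
        String path = route.template.replace(":sessionId", sessionId);
        if (params.containsKey("id")) {
            path = path.replace(":id", params.get("id"));
        }
        return route.method + " " + path;
    }

    public static void main(String[] args) {
        // prints "POST /session/abc123/element/7/click"
        System.out.println(toRequestLine("clickElement", "abc123", Map.of("id", "7")));
    }
}
```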

We could keep wandering deeper into Selenium WebDriver’s “maze” and explore its other classes, interfaces and libraries, such as the various exception classes, the different types of waits, the support library’s implementations of FindBy and Page Factory, and the list goes on and on…

I hope you enjoyed this dive into the depths of Selenium WebDriver 😉  Please share your thoughts in the comments below.