Selenium Demystified

The Selenium Story

In 2004, Jason Huggins developed an early version of Selenium-RC as an internal tool for ThoughtWorks. Selenium-IDE was originally created by Shinya Kasatani and donated to the Selenium project in 2006. Google got involved in 2009 and is largely responsible for what became WebDriver. Then in 2012 a W3C draft standard was published, solidifying WebDriver’s place as the de facto standard. Today Selenium is widely used in the QA community. Just hop on any employment board and see how many QA Automation positions mention Selenium.


For the Selenium IDE content, you’ll need a basic understanding of HTML and JavaScript.

For the WebDriver content, you’ll need a basic understanding of Java or another supported language.

Selenium IDE

The Selenium IDE is a Firefox add-on that allows the user to record, edit, and play back actions in the browser. If you have ever used macros in Microsoft’s Office products or AutoIt, this will be familiar. Tests are saved as HTML files in a format referred to as Selenese. Test playback is done via JavaScript. So when you have Command = open and Target =


it is executed almost as if you had opened the Developer Tools (F12) and, in the Console tab, run window.location.href = '';


Selenium Tests

Test steps consist of three parts: command, target, and value.


  • A command tells Selenium what to do. Selenium commands come in three ‘flavors’:
    • Actions are commands that generally manipulate the state of the application. They do things like “click this link” and “select that option”. If an Action fails, or has an error, the execution of the current test is stopped.
      • Many Actions can be called with the “AndWait” suffix, e.g. “clickAndWait”. This suffix tells Selenium that the action will cause the browser to make a call to the server, and that Selenium should wait for a new page to load.
    • Accessors examine the state of the application and store the results in variables, e.g. “storeTitle”. They are also used to automatically generate Assertions.
    • Assertions are like Accessors, but they verify that the state of the application conforms to what is expected. Examples include “make sure the page title is X” and “verify that this checkbox is checked”.
  • The target is an element locator that identifies the HTML element. A number of locator strategies are supported:
    • By ID – This is the most efficient and preferred way to locate an element. Common pitfalls for UI developers are non-unique IDs on a page and auto-generated IDs; both should be avoided. A class on an HTML element is more appropriate than an auto-generated ID.
    • By Class – “Class” in this case refers to the attribute on the DOM element. In practice, many DOM elements often share the same class name, so finding multiple elements is usually more practical than finding only the first one.
    • By Tag Name – The DOM Tag Name of the element.
    • By Name – Find the input element with matching name attribute.
    • By Link Text – Find the link element with matching visible text.
    • By Partial Link Text – Find the link element with partial matching visible text.
    • By CSS – As the name implies, this strategy locates elements with CSS selectors. Native browser support is used by default, so refer to the W3C CSS selectors specification for the selectors that are generally available. If a browser does not natively support CSS queries, Sizzle is used instead. IE 6, IE 7, and Firefox 3.0 currently use Sizzle as the CSS query engine.
      • Beware that not all browsers are created equal: CSS that works in one version may not work in another.
    • By XPath – At a high level, WebDriver uses a browser’s native XPath capabilities wherever possible. On browsers that don’t have native XPath support, Selenium provides its own implementation. This can lead to unexpected behaviour unless you are aware of the differences between the various XPath engines.
    • Using JavaScript – You can execute arbitrary JavaScript to find an element. As long as you return a DOM element, it is automatically converted to a WebElement object.
  • The value is used only by certain commands. Official documentation is currently lacking with respect to a list of all API endpoints and their support in the various frameworks. Here is a list of all actions, though.
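The command/target/value structure above can be modelled in a few lines of Java. This is only a minimal sketch (Java 16+ for records), not Selenium's own code: the class and method names are hypothetical, the flavor check is a rough prefix heuristic, and the `prefix=value` target handling assumes the usual Selenese convention, where an unprefixed target starting with `//` is treated as XPath.

```java
import java.util.Set;

// Hypothetical model of a Selenese test step; not part of any Selenium library.
final class SeleneseStepSketch {
    /** The three command flavors described above. */
    enum Flavor { ACTION, ACCESSOR, ASSERTION }

    /** One test step: command, target, and value. */
    record Step(String command, String target, String value) {}

    /** A parsed target: the locator strategy and the expression handed to it. */
    record Locator(String strategy, String expression) {}

    // Locator prefixes this sketch recognises (an assumption, not an exhaustive list).
    private static final Set<String> LOCATOR_PREFIXES =
            Set.of("id", "name", "identifier", "dom", "xpath", "link", "css");

    /** Rough heuristic: "store..." is an Accessor, "assert..."/"verify..." an Assertion. */
    static Flavor flavorOf(String command) {
        if (command.startsWith("store")) {
            return Flavor.ACCESSOR;
        }
        if (command.startsWith("assert") || command.startsWith("verify")) {
            return Flavor.ASSERTION;
        }
        return Flavor.ACTION;
    }

    /** True when the command carries the "AndWait" suffix. */
    static boolean waitsForPageLoad(String command) {
        return command.endsWith("AndWait");
    }

    /** Splits a target such as "id=submit" into its strategy and expression. */
    static Locator parseTarget(String target) {
        int eq = target.indexOf('=');
        if (eq > 0 && LOCATOR_PREFIXES.contains(target.substring(0, eq))) {
            return new Locator(target.substring(0, eq), target.substring(eq + 1));
        }
        if (target.startsWith("//")) {
            return new Locator("xpath", target);   // bare XPath expression
        }
        if (target.startsWith("document.")) {
            return new Locator("dom", target);     // JavaScript DOM expression
        }
        return new Locator("identifier", target);  // fall back to id-or-name lookup
    }

    public static void main(String[] args) {
        Step step = new Step("clickAndWait", "id=submit", "");
        System.out.println(flavorOf(step.command()));
        System.out.println(waitsForPageLoad(step.command()));
        System.out.println(parseTarget(step.target()));
    }
}
```

The element id `submit` is made up for illustration. A real command list has exceptions that a prefix check like this would misclassify, so treat it as a way to picture the three flavors rather than as a parser.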

IDE to WebDriver

You can “write” your tests by using Selenium IDE to record your actions and then exporting them to your language of choice. You may find that some of your tests are not exported the way you expect. Some export libraries are not fully fleshed out, so you’ll get a comment in the exported code saying that the command is unsupported. When this happens, you’ll need to write something language-specific to suit your needs.
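To make the export step concrete, here is a minimal sketch of how an exporter might turn recorded steps into Java WebDriver statements. It is not the IDE's actual exporter: the class name, the handful of supported commands, the `baseUrl` variable in the generated code, and the wording of the "unsupported" comment are all assumptions for illustration. The generated text uses the WebDriver calls `driver.get`, `findElement`, `click`, and `sendKeys` together with the `By` locators.

```java
// Hypothetical exporter sketch; the real Selenium IDE export differs in detail.
final class SeleneseExportSketch {
    /** One recorded step: command, target, and value. */
    record Step(String command, String target, String value) {}

    /** Wraps text in double quotes, escaping backslashes and quotes for Java source. */
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Maps a Selenese target such as "id=submit" to a By expression (as source text). */
    static String byExpression(String target) {
        if (target.startsWith("id=")) return "By.id(" + quote(target.substring(3)) + ")";
        if (target.startsWith("name=")) return "By.name(" + quote(target.substring(5)) + ")";
        if (target.startsWith("link=")) return "By.linkText(" + quote(target.substring(5)) + ")";
        if (target.startsWith("css=")) return "By.cssSelector(" + quote(target.substring(4)) + ")";
        if (target.startsWith("xpath=")) return "By.xpath(" + quote(target.substring(6)) + ")";
        if (target.startsWith("//")) return "By.xpath(" + quote(target) + ")";
        return "By.id(" + quote(target) + ")"; // simplification: treat bare targets as ids
    }

    /** Emits one line of Java for a step, or a comment when the command is not handled. */
    static String export(Step step) {
        return switch (step.command()) {
            case "open" -> "driver.get(baseUrl + " + quote(step.target()) + ");";
            case "click", "clickAndWait" ->
                    "driver.findElement(" + byExpression(step.target()) + ").click();";
            case "type" ->
                    "driver.findElement(" + byExpression(step.target()) + ").sendKeys("
                            + quote(step.value()) + ");";
            default -> "// Unsupported command: " + step.command()
                    + " | " + step.target() + " | " + step.value();
        };
    }

    public static void main(String[] args) {
        // "/login", "user", and "submit" are made-up values for the demonstration.
        System.out.println(export(new Step("open", "/login", "")));
        System.out.println(export(new Step("type", "name=user", "alice")));
        System.out.println(export(new Step("clickAndWait", "id=submit", "")));
        System.out.println(export(new Step("selectWindow", "null", "")));
    }
}
```

The last step falls through to the comment branch, which is the situation described above: the exported file still compiles, but that step has to be replaced by hand with language-specific code.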

After you export the test from IDE to WebDriver, your code will now be interpreted by WebDriver and executed on the browser.


It is more likely that you will build a suite of tests and have them executed by something like JUnit or NUnit. You could even run them on multiple machines using something like Selenium Grid2. Many CI tools, like Jenkins, can kick off the tests automatically whenever the code changes.

Selenium WebDriver

The Selenium WebDriver is an application that works as a middleman between your automation code and your browser. It can be called from many languages and frameworks using the WebDriver API. The level of support you get differs based on the programming language, operating system, and browser you use. Know that Selenium is developed in Java + Windows + Firefox. There is limited support for Linux and OS X. Additional browsers are supported by extensions. Headless browsers, like HtmlUnit, allow a UI-less WebView to be used as a web browser.