Selenium: 7 Things You Need To Know

Image source: xkcd

Selenium can be one of the most powerful tools in a QA engineer’s arsenal. Unfortunately, it can be time-consuming to write, a nightmare to maintain, and prone to producing far too many false negatives. At Lucid Software, that is where our suite was in the middle of 2014.

We had a suite of about 300 tests, written by 40 different developers, averaging 60-70 failures a night. In charge of it we had one dedicated full-time engineer whose job was simply to sort out which failures were legitimate and which were false negatives. This would frequently take his entire workday. When tests proved too unreliable, they were blacklisted and pushed off into the pile of other tasks to be completed “someday.”

My interest in Selenium was piqued when a feature that I had added broke, along with the test for that feature. We did not find the bug until it had already been in production for two weeks. There was absolutely no excuse for a bug like this to make it past development, let alone through QA and into production. I looked at what had happened to the test: it turned out it had been blacklisted (put in a list of skipped tests) about a month after I wrote it. The question arose: what needed to be done to make our Selenium test suite reliable, scalable, and maintainable?

Our entire team worked to answer those questions. Since that time, we have doubled the number of Selenium tests in our suite, while reducing the number of false negatives to less than 1%. On a regular basis, we are catching regressions during development and adding new tests to cover all new features in our products. Below, I have broken down the source of such dramatic improvement into seven major takeaways.

Make Tests Easier to Write

The number one complaint we heard from developers about writing tests was that it took just as much work to write the test as it did to fix the bug or add the feature. With that in mind, we knew we had to make tests easier to write. This led to the creation of entities like the Application User and the Application Driver.

1. Create an Application User

An Application User is the Selenium representation of the backend of the website. For us, this meant creating a new Lucidchart or Lucidpress user, setting up a subscription, and possibly creating a document. This class contained the helper methods to prepare a test scenario and handle all the teardown at the end of a test. It also provided access to our backend services, making things such as adding team members, uploading images or fonts, and changing subscription levels easy. The following is an example of how a developer would use the Application User.


class EditorPerformanceTest extends LucidSpec {
  val user = new ChartUser

  override def beforeAll() {
    user.login()
    user.createDocument()
  }

  override def afterAll() {
    user.finished()
  }

  // ... individual tests go here
}

In this situation, all setup is simplified to two easy method calls, leaving the test ready to go in the editor. At the end of the test, all of the teardown (closing the driver, database cleanup, etc.) is taken care of in the finished method. By abstracting all of this (as well as several other helper methods) into a User class, we made it much easier for developers to get a test set up and ready to validate a bug or feature.
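For illustration, a ChartUser along these lines might look like the sketch below. It is a minimal sketch only; the element IDs, URL, and the absence of real backend service calls are assumptions, not Lucid’s actual implementation.

import org.openqa.selenium.By
import org.openqa.selenium.firefox.FirefoxDriver

// Hypothetical sketch of an Application User. The element IDs, URL, and login
// details are illustrative assumptions, not Lucid's actual implementation.
class ChartUser(baseUrl: String = "https://example.test") {
  // Each user owns its own browser session for the life of a test suite.
  val driver = new FirefoxDriver()

  def login(): Unit = {
    // The real suite also creates a backend account and subscription through
    // internal services before driving the login form.
    driver.get(s"$baseUrl/users/login")
    driver.findElement(By.id("email")).sendKeys("selenium-user@example.com")
    driver.findElement(By.id("password")).sendKeys("not-a-real-password")
    driver.findElement(By.id("login-button")).click()
  }

  def createDocument(): Unit = {
    driver.findElement(By.id("new-document-button")).click()
  }

  def finished(): Unit = {
    // Teardown: close the browser; backend cleanup (documents, user) goes here too.
    driver.quit()
  }
}

With a class like this, every test gets a logged-in user and a clean teardown from the same few calls, which is exactly what the beforeAll and afterAll hooks above rely on.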

2. Create an Application Driver

The Selenium API can be very daunting. There are around 20 different ways to get an element using the WebDriver. From there, there are countless ways to perform different actions such as dragging and dropping, clicking and right-clicking, using the scroll wheel, and typing. In an effort to simplify this, and so that developers would not need to become familiar with the entire WebDriver documentation, we created a driver to simplify the most common actions. This application driver extends WebDriver and adds the Selenium Actions class. From there, we included methods combining the most common actions, such as clicking on an element, executing a script, and dragging and dropping web elements. The class contained very simple methods such as the ones shown below.


def dragAndDrop(cssFrom: String, cssTo: String) {
  val elem1 = getElementByCss(cssFrom)
  val elem2 = getElementByCss(cssTo)
  actions.dragAndDrop(elem1, elem2).perform()
}

def contextClickByCss(css: String) {
  actions.contextClick(getElementByCss(css)).perform()
}

When developers needed more complex actions, they still had access to the WebDriver and Actions classes, but for most tests the limited functionality of the Lucid Driver was more than enough. This had the added bonus of making tests much easier to debug, because all developers were now using the same methods instead of each one searching through the Selenium API and finding a different way to perform identical functionality.
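As a rough sketch, the wrapper itself can be quite small. The class and helper names below are illustrative, not Lucid’s actual code; they simply show one plausible home for methods like the ones above, built on the standard WebDriver and Actions classes.

import org.openqa.selenium.{By, WebElement}
import org.openqa.selenium.firefox.FirefoxDriver
import org.openqa.selenium.interactions.Actions

// Illustrative skeleton only: class and helper names are assumptions.
// One place that knows how to find elements and drive the Actions builder,
// so methods like dragAndDrop and contextClickByCss above have a home.
class LucidDriver extends FirefoxDriver {
  protected val actions = new Actions(this)

  def getElementByCss(css: String): WebElement =
    findElement(By.cssSelector(css))

  def getElementById(id: String): WebElement =
    findElement(By.id(id))

  def clickElement(id: String): Unit =
    getElementById(id).click()

  def clickElementByCss(css: String): Unit =
    getElementByCss(css).click()

  def runScript(script: String): AnyRef =
    executeScript(script)

  // dragAndDrop, contextClickByCss, and the other combined actions go here.
}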

Make Tests Easier to Update

With an ever-changing product and many developers adding and updating features, it is very easy for tests to become outdated. When a feature was updated, we needed a way to quickly port the old tests to work with the updated DOM. Using DOM IDs and the page object pattern helped us make tests that were easily updated and maintained.

3. Use DOM IDs

Finding an element in the DOM can be one of the most challenging parts of a Selenium test. IDs provide a way for key elements to be uniquely identified within the entire product. In some of our original tests, we used XPaths, class-based selectors, and other complex CSS selectors to locate important elements. However, when an element moved to a different place in the UI, or simply changed its CSS class names (due to a redesign or refactoring), updating the test required going back and finding that element again. With IDs, an element is identifiable regardless of where it sits in the DOM and what styling is applied to it.
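To make the difference concrete, here is the same lookup done both ways. The selector and ID are hypothetical, and a WebDriver instance named driver is assumed to be in scope.

import org.openqa.selenium.By

// Brittle: breaks if the dialog is restructured or a class name changes.
val publishButton =
  driver.findElement(By.cssSelector(".dialog .footer > div.buttons span.primary"))

// Stable: keeps working wherever the element moves, as long as the ID is kept.
val publishButtonById = driver.findElement(By.id("publish-button"))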

One example was a major overhaul of the publish dialog in Lucidpress. This particular feature had four Selenium test suites, around 30 tests in total, plus another 20-30 tests that used the dialog in one way or another. Because we used IDs, most of the tests needed little to no updating: they could still find the publish button, generate-code button, size selectors, and other key elements without any issues. This turned what could have been a near-complete rewrite of the tests into a couple of changes in how to navigate to the correct tab of the dialog.

4. Page Object Pattern

The page object pattern made the biggest difference in making our tests maintainable. The page object pattern quite simply means that every page knows how to perform the actions within that page. For example, the login page knows how to submit user credentials, click the “forgot my password” link, and sign in with Google SSO. By moving this functionality to a common spot, it could be shared by all of the tests. Because our tests are written by many different developers, and the product offers many ways to perform the same action, every developer had a different way of performing the exact same functionality. An example is selecting a document on Lucidchart’s documents page: when moving to page objects, we found six different CSS strings to select a document and three different ways to decide which one to click on. If this ever changed, it would be a nightmare to go through and fix it in all 50 or so tests that required clicking on a document. Below is an example of our page object representing the documents page.



object DocsList extends RetryHelper with MainMenu with Page {
  val actionsPanel = new ActionsPanel
  val fileBrowser = new FileBrowser
  val fileTree = new FileTree
  val sharingPanel = new SharingPanel
  val invitationPanel = new InvitationPanel
  // ... page-level helpers
}

Because there are so many actions to be performed, we broke it into smaller classes, each encompassing a smaller section of the page.

From there, each of the smaller sections contains all of the methods that can be performed within the section. For example, in the file browser section, we have methods to click the Create Document button, select a document, and check to see if the correct number of documents are there.



def clickCreateDocument(implicit user: LucidUser) {
  doWithRetry() {
    user.clickElement("new-document-button")
  }
}

def selectDocument(fileNum: Int = 0)(implicit user: LucidUser) {
  doWithRetry() {
    user.driver.getElements(docIconCss)(fileNum).click()
  }
}

def numberOfDocsEquals(numberOfDocs: Int)(implicit user: LucidUser): Boolean = {
  predicateWithRetry(WebUser.longWaitTime * 5, WebUser.waitTime) {
    numberOfDocuments == numberOfDocs
  }
}

This allowed us to turn complex test files that were difficult to read and understand into clear, concise, descriptive tests that anyone could comprehend.
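For instance, a documents-page test written against these page objects reads almost like a plain description of the scenario. The test below is illustrative rather than taken from our suite, and it assumes LucidSpec is a ScalaTest FunSuite-style base class and that ChartUser satisfies the LucidUser type the page objects expect.

class CreateDocumentTest extends LucidSpec {
  implicit val user = new ChartUser

  override def beforeAll() {
    user.login()
  }

  // Illustrative scenario built entirely from page-object methods shown above.
  test("a user can create a document from the documents page") {
    DocsList.fileBrowser.clickCreateDocument
    assert(DocsList.fileBrowser.numberOfDocsEquals(1))
  }

  override def afterAll() {
    user.finished()
  }
}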

With the page object model, our testing framework became much more maintainable and scalable. When a major feature was updated, all that had to be done to update the tests was to update the page objects. Developers knew exactly where to look and what to do in order to get all tests related to that feature passing. When it came to scaling the test suite, creating new scenarios was as simple as recombining functionality that was already written. This turned the task of writing more tests for a particular feature from a 2-3 hour chore into a trivial 10-minute task.

Make Tests Reliable

False negatives are possibly the worst part of Selenium tests. They make it difficult to run tests as part of an automated build, because nobody wants to deal with a failed build that really should have passed. At Lucid, this was the number one problem we needed to solve before our Selenium suite could be considered valuable. We added retrying to all of our test actions, and suite retries to several of our flakier test suites, yielding much better results.

5. Retry Actions

The biggest source of false negatives in our Selenium test suite was Selenium getting ahead of the browser. Selenium would click to open a panel, and then, before the JavaScript to open the panel could execute, Selenium was already trying to use it. This led to a lot of stale element, element not found, and element not clickable exceptions.

On the first pass, the solution was simple: every time we got one of these errors, add a little wait. If it still failed, make the wait longer. While this solution worked in most cases, it was not elegant and left the browser just sitting and waiting. Selenium tests are already slow enough without the extra waits.

In an effort to solve this problem, we looked at some of the options that Selenium makes available (FluentWait, explicit waits, and implicit waits), but we were unable to get them working in all the situations our application required. Drawing on these examples, we decided to build our own polling system to fit our needs.



import scala.util.{Failure, Success, Try}

/**
 * Try to take an action until it returns a value or we time out
 * @param maxWaitMillis the maximum amount of time to keep trying, in milliseconds
 * @param pollIntervalMillis the amount of time to wait between retries, in milliseconds
 * @param callback a function that gets a value
 * @tparam A the return type of the callback
 * @return whatever the callback returns, or throws the last exception on timeout
 */
@annotation.tailrec
private def retry[A](maxWaitMillis: Long, pollIntervalMillis: Long)(callback: => A): A = {
  val start = System.currentTimeMillis

  Try {
    callback
  } match {
    case Success(value) => value
    case Failure(thrown) =>
      val timeForTest = System.currentTimeMillis - start
      val maxTimeToSleep = Math.min(maxWaitMillis - pollIntervalMillis, pollIntervalMillis)
      val timeLeftToSleep = maxTimeToSleep - timeForTest

      if (maxTimeToSleep <= 0) {
        throw thrown
      } else {
        if (timeLeftToSleep > 0) {
          Thread.sleep(timeLeftToSleep)
        }
        retry(maxWaitMillis - pollIntervalMillis, pollIntervalMillis)(callback)
      }
  }
}

The basis of our retry code is a simple recursive algorithm that takes a function, a max wait time, and a polling interval. It executes the function until it succeeds or until the max wait time is exceeded. From this method, we implemented three versions for our three distinct cases.

  1. Get with retry takes a function with a return value


def numberOfChildren(implicit user: LucidUser): Int = {
 getWithRetry() {
   user.driver.getCssElement(visibleCss).children.size
 }
}
  2. Do with retry takes a function with no return value


def clickFillColorWell(implicit user: LucidUser) {
  doWithRetry() {
    user.clickElementByCss("#fill-colorwell-color-well-wrapper")
  }
}
  3. Predicate with retry takes a function that returns a boolean and will retry on any false value


def onPage(implicit user: LucidUser): Boolean = {
 predicateWithRetry() {
   user.driver.getCurrentUrl.contains(pageUrl)
 }
}

With these three methods, we were able to reduce our false negatives to roughly 2%. All of the methods default to a max wait time of 1 second and a polling interval of 50 milliseconds, so the added delay is negligible. In our best example, we turned a test that was a false negative about 10% of the time and took 45 seconds into a test that produced no false negatives and took only 33 seconds to run.
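For reference, all three wrappers can be thin layers over retry. The sketch below is one plausible implementation; the default timeouts match the 1 second / 50 millisecond values mentioned above, but the exact signatures are assumptions.

import scala.util.Try

// Illustrative wrappers over retry; defaults of 1 second / 50 ms as described above.
def getWithRetry[A](maxWaitMillis: Long = 1000, pollIntervalMillis: Long = 50)(callback: => A): A =
  retry(maxWaitMillis, pollIntervalMillis)(callback)

def doWithRetry(maxWaitMillis: Long = 1000, pollIntervalMillis: Long = 50)(callback: => Unit): Unit =
  retry(maxWaitMillis, pollIntervalMillis)(callback)

// Retries while the predicate is false; returns false instead of throwing on timeout.
def predicateWithRetry(maxWaitMillis: Long = 1000, pollIntervalMillis: Long = 50)(predicate: => Boolean): Boolean =
  Try {
    retry(maxWaitMillis, pollIntervalMillis) {
      if (!predicate) throw new IllegalStateException("predicate not yet satisfied")
      true
    }
  }.getOrElse(false)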

6. Suite Retries

Our final effort in making tests more reliable was setting up suite retry. A suite retry simply catches a failure and then starts the test over from scratch. If the test passes on one of the subsequent retries, then the test is marked as passing. If the test is legitimately failing, it will fail every time it is run and still provide the failure notification.

At Lucid, we have made an effort to use suite retries as sparingly as possible. Regular false negatives are a sign of a poorly written test. Sometimes, though, it is not worth the effort to make a test more robust. For us, we drew the line at tests that rely on third-party integrations such as image uploading, SSO, and syncing to Google Drive. There are ways we could make these tests better equipped to handle failures from external integrations and plugins, but they are not worth the time and effort given how occasionally the false negatives occur. A retry does not fix a test; it simply removes the noise of false negatives from the reports.
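If your suites are ScalaTest-based, as ours are, one way to get this behavior is ScalaTest’s built-in Retries trait. The sketch below is an assumption about how it could be wired up, treating LucidSpec as a standard ScalaTest base class; the suite name is hypothetical.

import org.scalatest.{Outcome, Retries}

// Assumes LucidSpec is a standard ScalaTest base class.
class GoogleDriveSyncTest extends LucidSpec with Retries {
  // Rerun any failing test in this suite once before reporting it as a failure.
  override def withFixture(test: NoArgTest): Outcome =
    withRetry { super.withFixture(test) }
}

Because withRetry only reruns a test that has already failed, a legitimately broken test still fails on the second attempt and is reported as a failure.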

7. Have Fun with It

When I first started working with Selenium, I found it to be very painful. My tests failed periodically for seemingly no reason. It was a tedious effort to get every user action correct. The tests were repetitive and hard to write. And it was not just me; other developers across the organization all felt this way. Selenium had become a dreaded task to be completed begrudgingly at the end of a new feature.

Establishing a framework that is reliable, maintainable, and scalable was simply the first step in making a great Selenium testing suite at Lucid. Since then we have added some really interesting and impressive tests. One developer designed a way to take a screenshot of our main drawing canvas and store it in Amazon’s S3 service. This was then integrated with a screenshot comparison tool to do image comparison tests. Another fun test suite focuses on collaboration on documents. It is rewarding to see tests that take several users and use chat and real-time collaboration to build a large document. Other impressive tests include our integrations with Google Drive, Yahoo and Google SSO, IconFinder, Google Analytics, and many more.

With our Selenium test suite, we now catch several regressions every week during development. Of our test results, less than 1% are false negatives. We have seen great success in scaling, updating, and maintaining our test suite over the past months as we have implemented some of these steps. The test suite is growing every day and with each passing week it is becoming more and more valuable in helping us to provide the highest quality software to all our users.

This post is based on a presentation given at OpenWest 2015.
