Monitor your A/B tests with Optimizely and Pingdom

Here at Blue Mango we’re pretty serious about improving the performance of our products and the products of our clients. It’s in our DNA. Because of this, it should not be a big surprise that we continuously run A/B and multivariate tests. We turn out to be pretty good at it as well.

Our tests kept breaking

When A/B testing, you are creating a variant based on a default. This default is the state of the page you are about to test, before applying any transformations that are part of the testing plan. In other words, it’s the basis of your page under test (or PUT, as I like to call it).

For your variant to work properly, the default should not change during the testing period. And that’s exactly how our tests got sabotaged. In most cases, we are not building and maintaining the PUTs of our clients, and the parties that do are continuously improving them. Because of this, the default of our PUTs can change, which ruins our tests.

So let’s say we have a simple test where we want to change the text in a button from option A: ‘to payment’ to option B: ‘next step’ to see if it increases the number of clicks. In the default we identify the button with the unique identifier main_cta. We change our B variant with something like:

document.getElementById('main_cta').innerText = 'next step'

Save. Start the test. The first data is coming in. Everything is looking good. But then, without us knowing it, the identifier of the button changes from main_cta to secondary_cta because a third party updated the page. As you can guess, our variant for main_cta no longer works, making the test useless.
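A guarded version of the variant code makes this failure mode at least visible. The snippet below is a minimal sketch, not part of our original setup: applyVariantB is a hypothetical helper name, and it assumes the variant runs in a browser where document is available.

```javascript
// Hypothetical helper: apply variant B only if the expected button exists.
// `doc` defaults to the browser's global `document`; passing it in makes
// the function easy to try out with a stand-in object.
function applyVariantB(doc = document) {
  const button = doc.getElementById('main_cta');
  if (!button) {
    // The default has changed: report it instead of throwing a TypeError.
    console.warn('Variant B not applied: #main_cta not found');
    return false;
  }
  button.innerText = 'next step';
  return true;
}
```

This does not fix the test, since visitors in variant B would still see the default text, but it avoids a script error and leaves a trace in the console.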

We discovered that most of these changes happen on specific days and times, likely because of a release after a sprint. So at those known moments, we had a few people check the PUTs to see if everything was still working.

For me as a developer this was a thorn in my side, especially in this era of automation. It took up too much time, and it wasn’t foolproof: we were still dealing with totally random changes in our PUTs.

The solution: Monitoring

Ideally, we want to get notified immediately when the default changes, so we can pause our tests and change the implementation.

When we started using Pingdom for uptime monitoring, I found out that Pingdom also has a service called Transaction Monitor. On their website, they state:

Now it’s easy to identify broken interactions. [..] You need to be the first to know when there is an issue with the login, search, check out, or any other user interactions on your website.

Bingo! Exactly what the doctor ordered.

When using Transaction Monitor, the Pingdom bot visits your website at a configured interval and executes a number of defined actions (‘click on this element’) and assertions (‘this should be visible’). The interval between checks is configurable from one minute to several hours.

If a check comes back negative, you immediately get notified. Pingdom offers a great number of ways to alert you when something’s wrong, for example SMS, email, Twitter and the Pingdom app.

When you add a check in the Transaction Monitor, you can assemble a sequence of two types of steps: commands and validations.

Examples of available commands are:

Go to a URL
Click on an element
Fill a text field with text
Wait for element to exist
Submit a form

And a selection of possible validations:

Element should exist
Element should contain text
Text field should contain text
Checkbox should be checked

People who have used Selenium, CasperJS or any other E2E testing framework will find these commands and assertions very familiar.

As you can imagine, you can make your checks as complex as you like. How complex they need to be depends on the number of variants and on how complex those variants are.

To check that the default from our earlier example has not changed, we only need two steps:

Go to URL http://www.yourclient.com/?monitoring=true
Check if #main_cta exists

That’s it. Now every time the Pingdom bot visits your website, it will execute those two steps and let you know immediately when one of them fails.
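To get a feel for what such a check does, the existence check above can be roughly approximated in a few lines of JavaScript. This is only a sketch and not how Pingdom works internally: hasElementWithId and checkDefault are hypothetical names, the regex only looks for an id attribute in the raw HTML (it does not run scripts or understand CSS selectors), and checkDefault assumes a runtime with a global fetch, such as a modern browser or Node 18+.

```javascript
// Rough local stand-in for an "element should exist" validation.
// Assumption: a plain regex on the id attribute is good enough for a sketch.
function hasElementWithId(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  const escapedId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp('\\sid\\s*=\\s*["\']' + escapedId + '["\']');
  return pattern.test(html);
}

// Step 1: go to the URL. Step 2: check that #main_cta exists.
// Requires a global fetch (modern browsers, Node 18+).
async function checkDefault(url) {
  const response = await fetch(url);
  const html = await response.text();
  return hasElementWithId(html, 'main_cta');
}
```

A real transaction check renders the page in a browser, so it also catches elements that are added or renamed by JavaScript, which this sketch cannot do.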

You can use any CSS selector to identify elements on your page. I really like selectorgadget.com to determine a selector.

Always check your default

Did you see the monitoring=true query parameter in the first step? It makes sure you verify the default, not one of the variants. You need a unique URL for your page that does not load any of your variants. The easiest way to get one is to exclude URLs with a specific query string parameter from the experiment.

Because we are using Optimizely to create and manage our A/B tests, I’m going to show you how to exclude URLs using Optimizely.

Let’s say we want to start monitoring the test that we’ve got running on yourclient.com. In Optimizely, go to the experiment you want to monitor and open the editor. Add a new audience using this icon:

Add a new audience

Give your new audience a descriptive name, like ‘Bots for monitoring’. Drag the ‘Query Parameters’ condition to the ‘Audience Conditions’. Pick a recognizable name and value for the query parameter (in this example, monitoring and true) and make sure you select ‘does not match’ in the dropdown.

New audience

Save the audience. Now we have created a URL that excludes its visitors from the experiment and shows the default: http://www.yourclient.com/?monitoring=true. This is the URL you should add to the first step when adding a Transaction Monitor check in Pingdom.
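The effect of this audience condition can be expressed as a small predicate. The snippet is only an illustration of the logic, not Optimizely’s implementation; isInExperimentAudience is a hypothetical name, and it assumes the parameter name monitoring with the value true.

```javascript
// Hypothetical helper mirroring the audience condition:
// visitors whose URL carries monitoring=true are kept out of the experiment.
function isInExperimentAudience(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  return params.get('monitoring') !== 'true';
}
```

Regular visitors on http://www.yourclient.com/ fall inside the audience, while the Pingdom bot on http://www.yourclient.com/?monitoring=true falls outside it and gets the default.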

Update: There’s an easier solution that will ensure you’re always monitoring the default: you can just add ?optimizely_disable=true to the URL so that Optimizely doesn’t run or track. Thanks to Tobias Urff for this tip in the comments.
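If the page you want to monitor already has query parameters, appending the parameter by hand is easy to get wrong. A minimal sketch using the standard URL API is shown here; withOptimizelyDisabled is a hypothetical helper name.

```javascript
// Hypothetical helper: return the monitoring URL for a page by adding
// optimizely_disable=true, keeping any query parameters already present.
function withOptimizelyDisabled(pageUrl) {
  const url = new URL(pageUrl);
  url.searchParams.set('optimizely_disable', 'true');
  return url.toString();
}
```

The result is the URL you would enter in the first step of the Transaction Monitor check.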

Conclusion

Pingdom is a very powerful tool and its Transaction Monitor feature is only the tip of the iceberg. In the few weeks since we started using Pingdom to monitor our A/B tests, we have already been notified about some suddenly failing experiments we would have otherwise discovered too late. So without a doubt, using automated monitoring on our tests is extremely useful for us, and it might be for you as well.