
The Surprises of A/B Testing


Yeah, sit back in your chair, you silly, naïve developer. I've been building software since before you were potty-trained and I know what I'm doing. I know what customers want and I know how to build it for them.

Er, except for the fact that I'm usually wrong.

When I first encountered A/B testing, I had fixed a bug in a search engine and tried to push my change. It was rejected. Why? Because I hadn't wrapped it in an A/B test. I was confused. Why would I want to A/B test a bugfix? Under certain conditions our search engine would return no results, and I had fixed it to ensure that it not only returned results, but returned relevant ones.

Boss: I don't care. A/B test it anyway.

So I did. And it was painful, waiting days for the results to come in ... and being dismayed to see very clearly that improving the behavior of the search engine led to significantly lower customer conversion. I even questioned the validity of the data, but to no avail. It took me a long time to truly realize the implications, but it harmonizes quite well with other things I teach about software:

Software behavior should be considered buggy when it causes unwanted customer behavior, not unwanted software behavior.

Yes, that's an oversimplification, but bear with me.

A/B Testing: Slayer of Egos

I had a client who introduced a horizontal scroll bar to their e-commerce site and significantly improved sales. Another company found that removing social media buttons increased conversions. And I found that our customers didn't respond well to better search results.

To many people those ideas might sound counter-intuitive, but they had real data to back them up. Compare that with what often happens when experts guide development by their "experience":

  • We need multiple pictures of our product on the "Buy now" page! Sales drop.
  • We need to show related items so we can cross-sell! Sales drop.
  • We need to show more relevant search results! Sales drop.

Quite often you won't find out that sales dropped as a result of a particular change, because you haven't measured your customers' behavior, and ignoring your customers is, by anyone's reckoning, a recipe for disaster. Remember how Digg went from king of the web to court jester in 2010 because they ignored their customers?

A/B testing is not a silver bullet, but it's an excellent way of getting real, honest, data-driven answers to your questions. And if you have an ego, it's a punch in the gut. Again and again and again. Why? Because the vast majority of A/B tests are inconclusive (no apparent change in behavior) or negative (you probably lost money). Your customers don't care about how experienced you are.
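
To make "inconclusive" concrete, here's a minimal sketch (my own illustration in Python; the numbers are invented, not from any real test) of how a single conversion result is typically judged: a two-proportion z-test comparing the control and variant conversion rates.

from math import sqrt, erf

def z_test(conv_a, total_a, conv_b, total_b):
    """Two-proportion z-test: is B's conversion rate significantly
    different from A's, or is the difference just noise?"""
    p_a = conv_a / total_a
    p_b = conv_b / total_b
    # Pooled rate under the null hypothesis of "no difference".
    p_pool = (conv_a + conv_b) / (total_a + total_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, p_value

# Hypothetical numbers: 10,000 visitors per arm, 300 vs. 320 conversions.
lift, p = z_test(300, 10_000, 320, 10_000)
print(f"lift: {lift:+.2%}, p-value: {p:.3f}")   # p is roughly 0.4
# With p well above the usual 0.05 threshold, this test is inconclusive:
# you can't say whether the change helped, hurt, or did nothing.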

The Customer Isn't Always Right

Your customers care about what your customers care about. Unfortunately, they often don't know what they care about until they experience it first-hand or some bandwagon effect kicks in and everyone piles on (this can be either good or bad for you).

To give an example of how problematic this can be, consider the case of a company in France that a friend of mine was running (sorry, I can't disclose the name). It was a very innovative paid search engine covering a very technical field. The company raised a lot of seed funding and was growing steadily, but it wasn't profitable because potential customers weren't signing up fast enough. Surveys revealed they wanted the search engine to return research results in both French and English. This was because the top research in this field was published in English and the French professionals wanted access to it.

Given that the search engine was in French and was searching highly technical documents, converting it to also search English documents, searching them correctly, buying subscriptions to the various technical sources those documents were found in, and presenting them seamlessly on a French-language website turned out to be both expensive and time-consuming. And it wasn't just development costs: there were ongoing costs to subscribe to, parse, and index the English-language materials.

The feature was released to much fanfare and potential customers didn't care. At all. They had asked for this feature, but it had no significant impact on conversion. As it turns out, French professionals may very well speak English, but they still preferred French. What they wanted wasn't English-language documents; they wanted reassurance that they were getting good value for their money. Given that the company had a simple metric—new customer subscriptions—A/B testing could have been a far less expensive way of divining this information. Maybe only provide a single English-language source. Maybe provide testimonials. Maybe something as simple as playing with language or images would have had an impact.

Instead, the cost of development, the opportunity cost of not developing other features or spending on marketing, and the ongoing cost of maintaining the English-language corpus were all significant contributing factors to the collapse of the company.

The literature is rife with stories of companies listening to what their customers say instead of paying attention to what their customers do. A/B testing tells you what they're actually doing.

Limitations of A/B Testing

A/B testing is a powerful tool in your toolbox, but it shouldn't be considered the only tool. Further, just as you don't use a hammer with screws, you shouldn't misuse A/B testing. Evan Miller has a great article entitled How Not To Run an A/B Test.

But statistical flaws aside, there are other issues with A/B testing. As part of your agile toolkit, you want to run tests often. If you only release new versions of your software quarterly, you're going to struggle to respond rapidly to customer needs. If you're testing at the bottom of a conversion funnel and only get 10 visitors a week, you might wait months to get a meaningful result. Or maybe you've run a very successful A/B test, but it's not related to the company's KPIs.
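
To put rough numbers on the low-traffic problem (my own illustration, using a standard sample-size approximation rather than anything from this post): even a modest lift needs thousands of visitors per variant before the result means anything.

from math import ceil, sqrt

def visitors_per_variant(baseline, relative_lift, z_alpha=1.96, z_power=0.84):
    """Approximate sample size per arm to detect `relative_lift` over a
    `baseline` conversion rate at 95% confidence and 80% power."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical: 5% baseline conversion, hoping to detect a 20% relative lift.
n = visitors_per_variant(baseline=0.05, relative_lift=0.20)
print(f"~{n} visitors per variant")   # on the order of 8,000 per arm
# At 10 visitors a week, that test would effectively never finish;
# test higher in the funnel or accept coarser answers.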

And the killer, one I've seen all too often: "that idea is rubbish! We don't need to test that!"

If you've made A/B tests painful to set up and can only run them periodically, I can understand that attitude (but the A/B test isn't the real problem there). However, if you can easily run tests and respond to results quickly, it often makes sense to test "dumb" ideas. Consider the case of the horizontal scroll bar I mentioned earlier.

You'll have plenty of experts telling you why horizontal scroll bars are a disaster, so why did it work for my aforementioned client?

First, the key thing to remember is that A/B tests tell you what your customers are doing, but not why.

With the horizontal scroll bar, the test clearly showed increased conversion, but the designer was extremely unhappy. After a lot of thought and examining the page, she noticed something interesting. A side effect of the horizontal scroll bar was that the full product description was now visible without vertical scrolling. She redesigned the page to use a vertical scroll bar instead of a horizontal one, but kept the full product description visible.

Once again there was a nice increase in conversion rates, significantly better than the original version.

Your expertise isn't in dictating features, it's in designing experiments and interpreting the results.

So you see? Your experience still matters and you can keep feeding your ego, but now you have hard data to back it up.

Summary

If you're not doing A/B testing, you really should consider it. There are plenty of companies that provide simple tools for integrating A/B testing into your Web site. Once you understand them and get a feel for the power of A/B testing, it's time to start building an internal tool that's better suited to your needs. You'll be able to test things you could never test before, and you'll have better access to customer segmentation data.
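
If you do build your own tool, the core of it is surprisingly small. Here's a minimal sketch (my own illustration in Python; the experiment name and variants are hypothetical) of the one piece every such tool needs: deterministic variant assignment, so a returning visitor always sees the same version.

import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user into a variant.

    Hashing the experiment name together with the user id keeps a user's
    assignment stable across visits and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Usage: the same visitor always lands in the same bucket for this experiment.
variant = assign_variant("user-42", "full-product-description-visible")
if variant == "treatment":
    pass  # render the new version of the page
# Log every exposure and conversion so you can segment the results later.

Everything else (logging exposures, computing significance, segmenting by customer) builds on top of that assignment.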

Stop doing ego-driven development. This is a typical story card we see today:

As a <developer>
I want to <build random stuff>
So that <the boss stays happy>

This is what you're really looking for:

We suspect that <building this feature>
... <for these people>
... will achieve <this measurable result>
We will know we succeeded when we see <this market signal>.

Please leave a comment below!



If you'd like top-notch consulting or training, email me and let's discuss how I can help you. Read my hire me page to learn more about my background.


Copyright © 2018-2024 by Curtis “Ovid” Poe.