Complex problems, whether in product development or organizational change, require an experimental approach. In this episode, Richard and Peter answer a question about how to design good experiments (or probes, to use the Cynefin term).
Episode Transcription
Peter Green
Welcome to the Humanizing Work Mailbag, where we answer questions from the Humanizing Work Community!
Today, we’re answering a question from a client about how to design good experiments.
Before we jump into it… We want to help you with whatever challenges are most frustrating you right now. If you’re feeling stuck on something, whether that’s trying to take on a more human-centric approach to your work, trying to make your product or business outcomes better, or if you’ve just got a simple, tactical, process-related question, let us know about it. Send us an email at mailbag@humanizingwork.com with a few details about your situation, and we’ll share how we might think through your challenge right here on the Humanizing Work Show.
Richard Lawrence
And just a quick reminder to rate and review the HW Show in your podcast app, or if you’re watching on YouTube, please subscribe, like, and share today’s episode if you find it valuable. We really appreciate your help spreading the word.
Peter
Want to get access to more content we produce, not just the show? Sign up for our newsletter where we share one key idea every week. You can sign up at humanizingwork.com/hwnews.
In our workshops, we teach our clients to use an experimental approach to complex questions like, “Will our target market buy this product?” or “Is this the right team structure for our organization?”
A client who’s putting that into practice in his org recently reached out asking for more information about how to design a good test.
Richard
The first key move is to articulate your hypothesis clearly. What’s the risky belief or assumption you need to test? It might be a belief about the existence or severity of a problem. Or maybe you’re confident about the problem and you have an assumption that solving the problem will produce some positive outcome. Or perhaps the risky assumption is about your ability to even solve the problem. Whatever it is, state it as a clear “we believe…” or “I believe…” statement.
Then, when it comes to designing the test or experiment itself, whether that’s to probe a hypothesis about your product or about how you work, we like to think about four attributes of a good test: A good test is real, clean, short, and safe.
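To make those four attributes concrete, here’s a minimal sketch in Python (purely illustrative; the class and field names are ours, not a Humanizing Work artifact) of how you might capture a hypothesis and score a candidate test against them:

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    """A candidate test for one risky 'we believe...' statement."""
    hypothesis: str  # the risky belief, stated explicitly
    method: str      # what the test actually does
    real: int        # 1-5: does it change the system observably?
    clean: int       # 1-5: few variables, biases mitigated?
    short: int       # 1-5: days rather than months to a result?
    safe: int        # 1-5: is falsification informative, not risky?

# Hypothetical example: a fast, safe, not-very-real test
ad_test = Experiment(
    hypothesis="We believe target customers will pay for this product",
    method="Run search ads pointing to a sign-up landing page",
    real=2, clean=4, short=5, safe=5,
)
```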
Peter
Real means the test actually changes the system in an observable way. For a product assumption test, this means getting target customers to actually do something, whether that’s clicking on an ad, signing up for a waiting list, using your product, or paying money for something. For an org change assumption, real means actually trying a different way of working, running a pilot, something like that.
Richard
Clean has two parts. First, it’s about limiting variables, setting up your test so that you’re actually testing your hypothesis. Of course, this isn’t always easy to do in a human situation. It’s not like a chemistry experiment or something. But we’re trying to reduce the number of variables involved as much as we can. Second, it’s about designing your test with an awareness of cognitive biases and trying to mitigate those. The one we find ourselves needing to design around the most is confirmation bias, the tendency to see the data we expect to see and to not see the data that violates our expectations. It’s really easy to design an experiment that tells you what you want to believe. So, a clean experiment is one that is set up so it could change your mind.
Peter
Short is about getting results quickly. Days rather than weeks. Weeks rather than months. Now, of course, this is relative. Some test types are faster than others. Some hypotheses can be tested more quickly than others. Either way, we’re trying to get some data as fast as possible. A short test that gives us early evidence about our hypothesis is usually better than a long-running one that might be more convincing but takes much longer to produce data.
Richard
And then “Safe” is about making failure (which is to say, falsifying your hypothesis) informative, not risky. An experiment that bets your job, your reputation, or your company isn’t a safe experiment. Not only do you not want that downside, but it’s going to be difficult to run an honest experiment when the stakes are that high.
Peter
Now, it’s rare that you can maximize all four of these attributes at the same time. There are tradeoffs. For example, making an experiment more real can make it less clean, short, or safe. Testing a value proposition by running search ads is pretty short and safe. Testing a value proposition with a Kickstarter campaign is more real but slower. Testing a value proposition with an actual product release is very real but not very clean, short, or safe.
As a result, we often find ourselves using a sequence of increasingly real and increasingly less safe experiments as we get more data and our confidence goes up. We think about these with a goal of early falsification: “What could I do now that would tell me as quickly as possible that my hypothesis is wrong?”
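One way to picture that sequencing, as a hypothetical sketch (the test names and stubbed `run_test` function are ours): order your candidate tests from fast-and-safe to real-and-risky, and stop the moment one of them invalidates the hypothesis.

```python
# Hypothetical sequence of tests, ordered from fast/safe to real/risky.
tests = [
    ("search ads -> landing page", "short and safe, not very real"),
    ("Kickstarter campaign",       "more real, slower"),
    ("limited product release",    "very real, less clean/short/safe"),
]

def run_test(name: str) -> str:
    """Stub: in practice this would execute the probe and return
    'validated', 'invalidated', or 'inconclusive'."""
    return "inconclusive"

for name, tradeoff in tests:
    result = run_test(name)
    print(f"{name} ({tradeoff}): {result}")
    if result == "invalidated":
        # Early falsification: stop investing in the idea as soon as
        # a cheap test tells us the hypothesis is wrong.
        break
```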
Richard
Finally, the last key move for designing a good experiment is to determine useful validation and invalidation criteria. A classic mistake in experiment design is to identify only validation criteria: like, “We’re right if sales go up by 20%.” Then, when sales go up by 19%, you find yourself explaining, “Well, 19% is still good enough; we probably still validated the assumption.”
Instead, choose validation criteria that’ll convince the skeptic that the hypothesis is valid. Maybe it would take a 30% increase in sales to convince the person who doesn’t believe the hypothesis.
Then choose invalidation criteria that’ll convince you that you’re wrong. Maybe an increase of 10% or less would make you say, “Hmm, maybe this hypothesis isn’t valid after all.”
If the result lands in the middle, no one is going to change their mind—the test didn’t give a strong enough result to validate or invalidate the hypothesis. So, design another test using what you know now.
That gap between validation and invalidation criteria makes a huge difference for overcoming confirmation bias and for designing a strong test.
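As a sketch of how those criteria work together (numbers taken from the sales example above; the function name and thresholds are illustrative, not a prescribed formula):

```python
def judge(sales_lift_pct: float,
          validate_at: float = 30.0,
          invalidate_at: float = 10.0) -> str:
    """Classify an experiment result against pre-committed criteria.

    The deliberate gap between the two thresholds keeps a near-miss
    (e.g., 19%) from being rationalized in either direction.
    """
    if sales_lift_pct >= validate_at:
        return "validated: strong enough to convince the skeptic"
    if sales_lift_pct <= invalidate_at:
        return "invalidated: strong enough to convince the advocate"
    return "inconclusive: design another test using what you know now"

print(judge(19.0))  # -> inconclusive, not "close enough"
```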
Peter
So, to sum it up, if you want to design good experiments, whether for product, process, or org change assumptions:
- State your hypothesis clearly
- Design a test that’s the right balance of real, clean, short, and safe
- And identify validation and invalidation criteria that’ll actually change the mind of the skeptic or advocate of the hypothesis
Richard
Please like and share this episode if you found it useful, and keep the questions coming! Email us at mailbag@humanizingwork.com. We love digging into tough topics, so send them our way.