How to Build a Lean Startup, step-by-step (oreillynet.com)
55 points by timothychung on May 23, 2009 | 3 comments



Excellent talk. It repeats many of the points from Eric's blog, but they're all worth hearing again.

Here's my question, though. I'm really struggling with this one, and I think Eric Ries is aware of HN, so I'd really love an informed answer (hint hint).

I run a start-up, http://www.woobius.com

We're getting reasonable traffic for this niche industry, but we're still at a fairly early stage: we don't have thousands of visitors or users a day. Each user is influenced by all sorts of special circumstances, such as whether they're at a company we've been talking to, or whether they're an architect, an engineer, or a project manager. As far as I can tell, our users are heterogeneous, each of them mostly unique.

Moreover, the line between signup and purchase is not so clear. My start-up's product is project- and company-based. People might use it every day yet never pay for it if one of their colleagues paid for it. That doesn't mean they're not a happy customer; it just means, for example, that they're at a point in their career where they're not directing projects or making purchasing decisions.

Users also differ in their usage patterns. Some use our application to send files; others only receive or download them. Again, the users can be sliced into many heterogeneous groups according to which activities they favour.

In those conditions, I find it extremely difficult to devise A/B experiments that measure things against a productive end result ("$$$", to use the notation in this presentation). We do measure and learn, but we do it by talking to our users, or by watching over their shoulders as they use the application.

*

I'd love to be able to implement a more scientific approach to testing out new features, but it just doesn't seem practical to me, given the circumstances of my start-up.

If I don't slice the users into more homogeneous groups before doing the A/B testing, the results will, imho, be flawed because there might easily be more users of one kind in A than in B. If I do slice them, I'll end up with groups of 10-50 users, because of all those differences that I'll have to slice for. With such small numbers, individual circumstances will, in my opinion, have far more of an effect on usage patterns than whether or not I add a button somewhere.
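
For what it's worth, the most scientific version I can picture looks something like this rough Python sketch (the roles, user ids, and experiment name are all made up): assign users to arms deterministically, then compare A against B within each slice rather than overall:

    import hashlib

    # Deterministic 50/50 assignment: the same user always lands in the same arm.
    def arm(user_id, experiment="file-request-button"):
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    # Tally arms within each role slice, so an imbalance between slices
    # (more architects than engineers, say) can't skew the comparison.
    users = [("u001", "architect"), ("u002", "engineer"), ("u003", "project_manager")]
    slices = {}
    for uid, role in users:
        slices.setdefault(role, {"A": 0, "B": 0})[arm(uid)] += 1
    print(slices)

But comparing within each slice is exactly what leaves me with those 10-50-user groups.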

*

So how do you apply this "A/B test every change" approach in such an environment? Especially given that we make many changes a day (though we deploy every few days), letting each change sit around for a week to accumulate A/B users would severely slow our progress.

Any advice would be most welcome.


Thanks for the really thoughtful comment. Let me try and unpack your question into a few parts, and answer each one separately.

First of all, the fundamental feedback loop doesn't require A/B tests. What matters is that you act in a disciplined way to transform ideas into products, measure what happens, and learn for the next set of ideas. Over time, you should get faster at executing this feedback loop, not slower.

A/B testing is a great methodology, but not the only one. You might take a look at Net Promoter Score (NPS), for example, as an alternative way of gauging customer reaction to the changes you're making.

If you look at the actual practice of science, you'll notice that not all branches can do controlled experimentation. In cosmology, for example, researchers have to rely on "natural experiments" because they (so far) lack the tools to conduct experiments involving large gravitational masses. Subjective forms of data collection, like in-person interviews and usability tests, can provide "validated learning about customers" if you are disciplined about it.
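
To make NPS concrete, here's a minimal sketch of how the score falls out of the standard 0-10 "how likely are you to recommend us?" question (the survey responses below are invented):

    # Net Promoter Score: promoters (9-10) minus detractors (0-6),
    # as a percentage of all responses; passives (7-8) count in the total only.
    responses = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8]  # invented survey answers

    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    nps = 100 * (promoters - detractors) / len(responses)
    print(f"NPS = {nps:+.0f}")  # 5 promoters, 2 detractors out of 10 -> +30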

Second, I'm not sure I agree that you can't do A/B split-tests in this situation. The number of customers you have per day is not really relevant - that only affects how long it takes you to get a statistically significant result. You might need weeks to get enough customers through your 50/50 test to get good data, but that doesn't necessarily make it a bad idea. It might "slow you down" from the point of view of coding, but if it prevents you from building a feature that nobody wants, that speeds you up much more.
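
To put rough numbers on "how long it takes," here's a back-of-envelope power calculation (a sketch using the standard two-proportion approximation; the conversion rates are invented) for how many users each arm of a 50/50 test needs:

    from scipy.stats import norm

    # Users needed per arm to detect a conversion-rate change from p_a to p_b
    # at 5% significance and 80% power (standard two-proportion approximation).
    def users_per_arm(p_a, p_b, alpha=0.05, power=0.8):
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        variance = p_a * (1 - p_a) + p_b * (1 - p_b)
        return (z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2

    print(round(users_per_arm(0.05, 0.06)))  # small lift: ~8155 users per arm
    print(round(users_per_arm(0.05, 0.15)))  # big, meaty change: ~137 per arm

Notice how sharply the required sample drops as the effect gets bigger - more on that below.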

In fact, I would work backwards from "what do I need to have in order to validate my hypotheses" and then structure the rest of your business around that. For example, you might not want to spend the dollars on AdWords to drive traffic to your business at this stage, because the ROI is not high enough yet. On the other hand, if increased traffic leads to rapid iteration which leads to customer validation, that might be a good trade-off. Do what you have to do to accelerate your learning.

Last, your fear about the differences in types of customers is important to address. Start getting clear about the "customer archetype" you think is most likely to use your product. What does a day in their life look like, for example? Why are they crazy enough to use your early-stage product, instead of doing the more rational thing and buying from an established player?

If I had to guess, I would say that, most likely, your current customers all have something pretty specific in common. Although they may have wildly different demographics and usage patterns, it's likely that they are all early adopters of ... something. Otherwise, they wouldn't be wasting their time with your product. The more you understand what that is, the better you'll be able to tailor your product to their needs. But, more importantly, this commonality probably means your split-tests (and usability tests) have more validity than you think.

One final note: just because you _can_ run split-tests on trivial changes (like "whether or not I add a button somewhere") doesn't mean you should use them for that purpose. You can also run split-tests on big, meaty changes that elicit a strong reaction from customers. And if you recall from basic statistics, the size of the effect being measured matters just as much for significance as the sample size does. Thus, if you're split-testing important things, you can get a good result with much smaller samples. And, as an early-stage startup, I'd maintain it's only worth testing things that (you believe) have a large impact.
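
As an illustration of that last point (a sketch with invented numbers, 25 users per arm), Fisher's exact test - which is well suited to small samples - reaches significance easily when the effect is big:

    from scipy.stats import fisher_exact

    # Invented split-test: arm A converts 12/25, arm B converts 2/25.
    table = [[12, 13],   # arm A: conversions, non-conversions
             [2, 23]]    # arm B: conversions, non-conversions
    odds_ratio, p_value = fisher_exact(table)
    print(f"p = {p_value:.4f}")  # comfortably below 0.05 with just 50 users total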

Does that help?

Further reading on the topic: http://startuplessonslearned.blogspot.com/search/label/split...

http://startuplessonslearned.blogspot.com/search/label/liste...

http://startuplessonslearned.blogspot.com/search/label/custo...


I can't seem to view the webcast. Am I missing something?



