How I Used Amazon’s Mechanical Turk to Validate my Startup Idea (harperlindsey.wordpress.com)
157 points by toumhi on Sept 7, 2010 | 58 comments



I'd caution against making your survey "Hey, would you use Web app A?" The Mechanical Turk users want to blaze through the survey as fast as possible and get paid. Clicking "Yes" gets them through the fastest and also lets them please the surveyor (the same bias as your friends or family).

Force them to make a choice. Present Web app A (real), Web app B (dummy), and Web app C (dummy). Make them rank which web app is best.

Set up the survey three ways: A, B, C; B, C, A; and C, B, A. Have a third of the sample take each survey. You would be surprised at the first-choice bias with Mechanical Turk. Actually you wouldn't be surprised when you remember that these people just want to get the thing done.
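A rough sketch of that rotation, in case it helps: it deals three orderings out round-robin and then tallies top picks both by app and by on-screen slot, so a first-slot bias shows up on its own. The labels, function names, and respondent data are all made up for illustration. Note one assumption that departs from the orderings listed above: the third ordering here is C, A, B, so that every app lands in every slot exactly once.

```python
from collections import Counter
from itertools import cycle

# Hypothetical labels: "A" is the real app, "B" and "C" are dummies.
# Assumption: a cyclic rotation, so each app appears once in each slot.
ORDERINGS = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]


def assign_orderings(respondent_ids):
    """Deal the orderings out round-robin, about a third of the sample each."""
    return dict(zip(respondent_ids, cycle(ORDERINGS)))


def tally(assignments, top_picks):
    """Count top picks by app and by the slot (1-based) the picked app was shown in."""
    by_app, by_slot = Counter(), Counter()
    for rid, pick in top_picks.items():
        by_app[pick] += 1
        by_slot[assignments[rid].index(pick) + 1] += 1
    return by_app, by_slot


# Made-up respondents who all just click whatever is listed first.
ids = list(range(6))
assignments = assign_orderings(ids)
lazy_picks = {rid: assignments[rid][0] for rid in ids}
by_app, by_slot = tally(assignments, lazy_picks)
print("by app:", dict(by_app))
print("by slot:", dict(by_slot))
```

With purely lazy clickers the per-app counts come out even while every pick sits in slot 1, which is the pattern to watch for in real data before trusting any "winner".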

Finally, a good secondary survey is to make them rank-order features by their worth. This helps find your MVP. (Read more about "conjoint analysis" if this interests you.)


It's not clear from the article what kind of business model the startup in question is planning to use. This is relevant, as the fact that 73% of random strangers said they would "use the service as described" doesn't give any indication that they'd be willing to pay for it.

Clearly, the author got his $27.50 worth; how much that's really worth, though, in the long term, remains to be seen.


Yeah, I've done this before. Of the 200 people I surveyed a much higher % (around 95%) said they liked and would use the service...and then did not. My experience with this stuff makes me think that people who do it generally have a perception that you are their "boss" and that they need to sugar coat everything they tell you.

TLDR: Be HIGHLY suspicious of any results you get from Mechanical Turk (same goes for Feedback Army and others).


It applies to everyone, not just Mechanical Turk folks. When someone says they 'will use' your product they are non-committal. (I have done this so many times for fear of offending the person who is asking me.) When it's time to take out their credit cards then they backtrack. These are two different decision points. The latter is what matters to startups.


> When it's time to take out their credit cards then they backtrack

And to be fair, even if I am trying to be super-honest, I find it very hard to decide whether I would be happy to whip out a credit card from a concept pitch. It's not just a question of "would I pay for A", it is a question of "would I pay for A if it really rang my bell", and I can't tell whether you are going to ring my bell or not until I have tried your product and your customer support philosophy.

When I look at my credit card statement, I see a bunch of companies not just whose product I like and use, but also a bunch of folks for whom I can say "I like the way they do business".

I didn't pay to cloud-store photos for over a decade, until I found smugmug. So what is the right answer to the question "would you pay to store photos"?

I don't think this is the kind of nuance you would get from an MT survey.


This isn't 73% of random strangers. This is 73% of the Mechanical Turk users who accepted the HIT for this.


Exactly, this is very relevant. The type of person who stumbles upon your web app isn't necessarily the type of person who decides a good use of their time is completing Mechanical Turk tasks. But as long as you're aware of this when considering your results, I suppose they can still be considerably valuable.


Strange use of Mechanical Turk if you ask me. You could've spent about the same in Google AdWords asking people to take a survey. At least then you'd have a survey of "people who are searching for important keywords related to my topic of interest" rather than "people who are so technically capable that they are using MT".

"I also found out to my surprise that more men than women said they’d use the service. I fully expected it to be the other way around." 200 people is not a large sample size and I would caution you against making sweeping decisions about the future of a company based on this small size. It could be that the MT program attracts more men than women? I don't know how you set it up but you said earlier that you didn't really do any segmentation.


It is still waaaaaaaaaaay better than asking your family and friends.


Dude. No.

They're both 100% useless.

Mechanical Turkers are incentivized to give positive feedback because there's a chance (lurking in the back of their minds, surely) that the requester is emotionally sensitive and, if criticized, will feel wounded and won't approve their HIT.

Do you know how I know this? I asked Turkers to give me feedback on one of my sites, and it was nothing but sunshine and razzmatazz smoothies.

You know you're building something people want when they vote with their wallets or attention (give you their email address, phone number...)

New Theory: hell, an even better test might be: are they willing to "invite their friends" to tell them about it? People share things with friends because they like their friends - is it of sufficiently high value that they will get a "social credit" for sharing with their friends?


I guess you could ignore the 73% that said they would use it. But note that 27% said they wouldn't use it and also stated "WHY they wouldn’t use" it. This should be at least 1% useful.


When we created http://markiter.com we decided to present users with two options instead of just asking vague questions.

You could compare a few pictures, videos, landing pages, etc., and get a vote with a comment on which is better.

It forces more legwork on the user, who has to actually create multiple options.


People who use AMT for earning money may not necessarily be the right audience you are looking for.


I don't think that people use Mechanical Turk as their sole source for money. If I remember correctly, someone figured out that mechanical turk pays roughly $5/hour if you spend your whole day doing it. That's not very much.

Mechanical Turk seems to be filled with people itching for some community to participate in -- why not get paid for it at the same time?


In some countries $5/hour is a very good salary...

Even in Russia it's about the average income IIRC if you assume 40-hour workweek.


I actually agree with both of you. MTurk is not solely about the money - but there is enough money in the system for it to be the main motivating force.

Getting honest feedback probably depends very heavily on survey design, instruction sets, and, if you've been doing it for a while, reputation.

With a job size this small you'll get information that might be useful, but will probably be biased. I wouldn't trust it - but I would use it as a starting place.
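To put a rough number on "a job size this small": a minimal sketch of a Wilson score interval for the article's headline figure. It assumes that 73% of 200 respondents means 146 "yes" answers, and that the responses behave like a simple random sample, which is exactly what this thread doubts. It only captures sampling noise, not the people-pleasing or selection bias discussed elsewhere here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumption: 73% of 200 respondents means 146 "yes" answers.
low, high = wilson_interval(146, 200)
print(f"73% of 200 -> roughly {low:.1%} to {high:.1%}")
```

The interval spans more than ten percentage points even before any bias is considered, and any subgroup (by gender, say) has fewer respondents and so a wider interval still.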

For the incredibly low price I think it's worth it.


I've run an MTurk survey as well and asked standard demographic information. My results are surprisingly in line with national averages. I ran the survey at the suggestion of someone with much more experience running MTurk surveys, who claims he has verified that the results are better than typical phone and mall campaigns.


>I've run an MTurk survey as well and asked standard demographic information. My results are surprisingly in line with national averages.

Do MTurkers tell the truth though? Presumably one person can run a dozen fronts so that they can answer questions that require a specific demographic. Or they masquerade as elderly females to appear more trustworthy and shame you out of objecting if they've not done the job properly or whatever.


Do you think they would lie so as to mirror the national averages?


A study came out saying that most people on AMT use it because they're bored, as an alternative to games or whatever else: if they're going to be bored on the internet, they might as well make some money, even if it's an insignificant amount.


I would take the "would you use this service" feedback with a grain of salt.

It costs you nothing to say "yes", but everything changes as soon as you start to charge for it.

Going from Free to $0.01 for any online product or service significantly changes your conversion numbers.

You might get better feedback standing on a corner downtown, asking if people would use your service... and charge them $1 for a voucher that would allow them to use it when it is ready.

PS: Please get out of stealth mode!


For anyone else interested in this "quick, cheap, feedback" cycle, PickFu is another service that does this: http://pickfu.com/

I spent about $20 to get 200 responses for my question, but only ended up collecting 103 answers before the question expired.

None of this stuff is perfect, but employing a few of these approaches all at the same time will generate results that can be handy to correlate.


Actually, it didn't expire, it finished with 104 of 200 answers picking "B - Paypal" http://pickfu.com/00PULC

Also, just to clarify, it's $17 for 200 responses.

Thanks for trying out PickFu!


Justin, my apologies on that, I misread the screen and thought I only got 104 answers.

It was a great experience overall, and for micro-testing (where something like Usability testing is too much) PickFu is on my short list for future projects.


No worries. Other people have misread it too so we probably need to tweak the wording to make it more clear.

Glad to hear it!


Thank you for doing this. Living outside of the US means that I cannot use the Amazon Mechanical Turk. I just launched a $17 test. If it works well, I'll use it again in the future.


Pickfu uses the Mechanical Turk API


This is Lindsey - who wrote the post. I wanted to clarify a couple of things, based on the feedback I’m seeing here. (btw. This was my first blog post :)). I am still in stealth mode (which is a completely different conversation that I’ll be blogging about soon). www.Swayable.com will be a consumer facing free service. With that said, the Turk system will not work for a lot of startups that need niche specific data. I was looking for general consumers that are web/computer savvy and the Turk system fit the bill for what I needed. The most valuable feedback I got was actually from the text responses as opposed to the yes/no response. I will definitely follow up on this post as soon as I am out of stealth mode, and provide more specific data. Thanks for all the great feedback!


You are in stealth mode, yet willing to pitch the idea to hundreds of random strangers?


Cool. My girlfriend had an idea for a startup and we started discussing it. At some point I told her 'let's stop hand-waving and look at some data'. For ~10 bucks we got 110 good responses to an 11-question survey on Mechanical Turk. It really gave us a new perspective on some of our problems. That was much easier than asking our friends for feedback (which we did too).


The post would be more interesting if we knew what questions were asked in the survey.

Edit: I meant what exactly the survey looked like, including the service description. If it's a paid service, I highly doubt the 73% figure.


It's right there in the post:

I gave a brief description of what the website/service offering would be then asked the users:

   Gender
   Age
   Would they use the service as described above
   Give me 3 examples of how they would use the service.
   General feedback or ideas on the service, why they would or would not use it.


Me, too. I checked my stats after using MT for a site-survey. Most of the users were from the same geographical region, they gave very positive feedback, and they spent very little time on the site. Pair that with an understanding that MT only pays well if you do lots of little tasks very fast, and you start to get the idea that the feedback might not be the most useful.


You need to put in a text box: "Please describe, in your own words, the purpose of my site:" Then throw out all the responses that are incomplete or way off. The problem is that when I did that, there were too few useable results.
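A minimal sketch of that screening step, assuming you pick a handful of keywords that any on-target description of your site should contain. The keyword set, the five-word minimum, and the sample answers are all hypothetical; in practice you would probably still read the borderline answers by hand.

```python
import re


def passes_screen(answer, keywords, min_words=5):
    """Keep an answer only if it is long enough and mentions an expected keyword."""
    words = re.findall(r"[a-z']+", answer.lower())
    if len(words) < min_words:
        return False  # incomplete
    return any(word in keywords for word in words)  # otherwise "way off"


# Hypothetical site (photo sharing) and made-up free-text answers.
keywords = {"photo", "photos", "pictures"}
answers = [
    "good site",
    "It lets you upload photos and share them with friends.",
    "This is a very nice website, I like it a lot.",
]
kept = [a for a in answers if passes_screen(a, keywords)]
print(f"kept {len(kept)} of {len(answers)} answers")
```

Counting how many answers survive the screen also tells you up front whether you are about to hit the "too few useable results" problem.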


This is incredibly good advice. All MTurk tasks that are small, or that will be approved by a single person, should ask an open-ended question that requires a response.

This greatly reduces fraud and shoddy work. And it has a tendency to highlight the really good workers (some of whom are amazing!) - maximizing your benefit as a requester.


or whether people in the target market were remotely likely to be found answering questions on Amazon Mechanical Turk...


Interesting article, but probably a poorly timed submission to HN. You could have used this article to promote your product, but you're in stealth mode so we get a "sign up to hear when my product is available" page, but still have no idea what your product is. Good article, bad marketing. Should have released this about a week after you launched the site :)


I wonder how many of these 200 people were from India, which appears to be supplying a good chunk of the Mech Turk userbase.

I also ran a couple of polls with similar intent on MT and then stopped doing that because results were skewed dramatically. For one, there's no filtering by geographical region and you bet your 10c-per-answer that you will get a ton of replies from the poorest corners of the world. Secondly, responders are afraid of you rejecting their answer and thus affecting their MT rating, so they tend to tell you what you want to hear and not what they actually might be thinking. Hence your 73% approval rate.

This sampling is not random, very far from it. You are effectively sampling the MT community with its own dynamics, a community that is likely NOT to be representative of your own target user base.


I've been using CrowdFlower as an interface to MTurk, and it let me exclude specific countries. Your point about the secondary motivation is unfortunately valid, though.


Might be a bit off topic, but does anyone know a good alternative to MTurk for people outside the US?


http://www.clickworker.com is based in Europe, with European workers and has a good reputation.

As others are suggesting, Crowdflower works, and works well, with an excellent API. Also they are just good people. (They use an assortment of workforces)

For smaller self-service jobs like the one described in this article, the SmartSheet UI is amazing. http://www.smartsheet.com/ (MTurk based)

And if you need bigger stuff (or text-oriented stuff) with some consulting, you can contact me or my company http://castingwords.com (MTurk based)


I successfully used http://www.crowdflower.com to submit jobs to MT from outside the US.


Hi zumda, I have used http://crowdflower.com before. You get an account manager to help you, which is necessary as there is no supporting documentation and the interface is a little confusing / unhelpful when things don't work out.


I'm building an API on top of MTurk, http://houdinihq.com, that you could use. It's currently in alpha, but feel free to email me at presto@houdinihq.com if you're interested in an api key.


Simple API looks really simple, something that I definitely like!

How do you plan to accept payments? How much would Houdini itself cost (per task, per month?)? Do you plan to accept non-US users?


Thanks! We also have some more advanced apis in the works, like asking multiple workers to do the same work and automatically determining the 'true' answer. From your point of view, they'll be just as easy to use though.
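For readers wondering what "automatically determining the 'true' answer" might look like, here is a minimal sketch using a plain majority vote over redundant worker answers. This is only one common approach and is an assumption for illustration, not Houdini's actual API or algorithm; the function name and the agreement threshold are made up.

```python
from collections import Counter


def majority_answer(answers, min_agreement=0.5):
    """Return the most common answer if more than min_agreement of workers gave it.

    Answers are compared after trimming whitespace and lowercasing.
    Returns None when there are no answers or no answer clears the threshold.
    """
    if not answers:
        return None
    normalized = [a.strip().lower() for a in answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count / len(normalized) > min_agreement else None


# Made-up answers from three workers to the same task.
print(majority_answer(["Cat", "cat ", "dog"]))
# A two-way split does not clear the threshold.
print(majority_answer(["yes", "no"]))
```

Returning None on a split leaves room to send the task to more workers rather than guessing.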

Pricing is still TBD. MTurk charges will be passed through directly to you with Houdini-specific charges either per task or per month.

Yeah, I'm planning on accepting non-US users.


I believe that Feedback Army is explicitly designed for this sort of thing. It is layered on top of MT, but takes PayPal or credit cards directly, and is dead-simple to use.


I've seen weirder use for mechanical turk: http://www.youtube.com/watch?v=D_CC5r5Wfm0 So, I guess why not?


What if you asked for a follow up email address / phone number? The survey would then act as a screening device that would identify those who are really interested.


In case anyone doesn't know what a Mechanical Turk is: http://en.wikipedia.org/wiki/The_Turk


I think Amazon Turk could be better used for beta testing and early testing of word-of-mouth. First, ask some people to register on your site and then track what they do. Then, have them fill out a feedback form and tell you what they would have you improve. One iteration would cost you around $20. And if you do it once every 2 weeks or so, you can keep in touch with what your users actually want.


Does Amazon say anywhere how diverse Turk users are? Maybe you ended up only surveying mostly one gender in mostly one part of the world?


When we were creating http://markiter.com

We did a blog post referencing a study:

http://blog.markiter.com/#turk

http://behind-the-enemy-lines.blogspot.com/2008/03/mechanica...

Which has been updated recently. Some high quality data presented in a clear way.

http://behind-the-enemy-lines.blogspot.com/2010/03/new-demog...


Thanks for that - this set is from Feb 2010

http://behind-the-enemy-lines.blogspot.com/2010/03/new-demog...

    * United States: 46.80%    (65% female)
    * India:         34.00%    (70% male)
    * Miscellaneous: 19.20%


Maybe the survey data is useful if your startup's target audience consists of folks who earn pennies on the hour.


Strange? Sure. Useful? Probably. But I really like the novelty of looking at AMT as a group of real people and not just commoditized, bite-sized labor. Opens my mind up to problems that might be solved with AMT.


Some good advice I've heard: When doing this kind of market research, don't ask people if they would buy, ask them to buy.

Of course, that's easier said than done in certain circumstances.


Come back after you launch and tell us if the feedback was accurate. Until then, you are halfway through executing a (rather interesting, imo) experiment.



