A new week, a new thing to explore on the Web. A couple years ago, when Amazon.com's Mechanical Turk marketplace was announced, I was very intrigued. But I never had a chance (or reason) to use it until recently.
What's Mechanical Turk?
Amazon's Mechanical Turk service is an on-demand workforce that essentially lets employers post human-powered tasks. Workers from around the world, who call themselves "Turkers," are then able to grab those tasks, work on them, and submit their answers.
It's ideal for parallelizable tasks that cannot be fully automated, such as tagging images with keywords, or transcribing podcasts.
The memorable but mouthful-of-a-name "Mechanical Turk" comes from the legendary, seemingly automated chess-playing machine of the late 1700s. (Read about the Turk here.) It turns out the automaton was actually operated by a human being.
Revolutionary Way To Work
For a relatively small virtual company like my own, it's a truly revolutionary way to work.
For the first time, you can hand out thousands of tiny atomic tasks, pay a very small sum for each, and get surprisingly high-quality work (on balance) back. (One popular quality-enforcement mechanism is to hand out duplicate tasks and then match the results for quality -- a technique long used by the data-entry and transcription industry.)
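To make the duplicate-task idea concrete, here is a minimal Python sketch of matching answers from several workers. The function names and the agreement threshold are my own illustrative choices, not anything MTurk provides, and real answers would likely need fuzzier matching than this.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def reconcile(answers, min_agreement=2):
    """Return the answer at least `min_agreement` workers gave, else None.

    `min_agreement` is an assumed threshold; 2 corresponds to handing the
    same task to two workers and requiring them to agree.
    """
    counts = Counter(normalize(a) for a in answers)
    if not counts:
        return None
    best, votes = counts.most_common(1)[0]
    return best if votes >= min_agreement else None


print(reconcile(["Green Onion", "green  onion", "scallion"]))  # green onion
print(reconcile(["sugar", "salt"]))                            # None
```

Tasks that come back as None have no agreed answer and could simply be reposted or reviewed by hand.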
It's revolutionary in that it's scalable, it lets you parallelize a previously serial task (and thus complete it much more quickly), and it's almost entirely automated. It's also surprisingly inexpensive if you can make each task atomic (small and discrete) enough.
It can also be used to power entirely new kinds of digital projects. For instance, a very interesting early socio-economic experiment using Mechanical Turk was the whimsical Sheep Market, performed by Aaron Koblin. Aaron created an MTurk task that asked people to draw a sheep facing left, and paid them $0.02 apiece. After this was done, Aaron turned around and sold these hand-crafted, digitally stored sheep to others, creating artificial scarcity by limiting each virtual "sheet" of sheep to a single sale. (The sheep are all sold out, earning Aaron a nice profit for doing very little work -- see the website here, and read an excellent article from Salon about it.)
My Initial Foray into MTurk: A Simple Survey
To get my feet wet, I decided to use MTurk's survey creation template to offer $0.02 for a simple answer to the question:
"What iPhone App do you wish existed but doesn't?"
I funded my account with a whopping $2.20, enough to generate about 100 answers from around the world to this question at this rate, plus a 10% commission to Amazon.com for the marketplace service.
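The arithmetic behind that $2.20 can be sketched in a few lines of Python. The function name is hypothetical, and the 10% commission is simply the rate I was charged for this survey, so treat it as a parameter rather than a fixed fact.

```python
from decimal import Decimal


def total_cost(num_answers, reward, commission_rate=Decimal("0.10")):
    """Worker rewards plus the marketplace commission, rounded to cents."""
    rewards = Decimal(num_answers) * Decimal(reward)
    total = rewards + rewards * commission_rate
    return total.quantize(Decimal("0.01"))


print(total_cost(100, "0.02"))  # 2.20
```

Decimal is used instead of float so that amounts like $0.02 stay exact.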
It's been about 8 hours now and I have 8 answers to this question, ranging from "a decent 3D Asteroids clone" to "Adobe Flash so I can play games". I am intrigued by the idea of being able to parallelize problems and get them done much, much faster and also more cheaply. Of course, any survey on MTurk is bound to be skewed by selection bias. Nevertheless, where else could you pay $2.00 for 100 answers to a simple question, and get the results in 1 or 2 days?
Meanwhile, this week, I've been working on something much bigger, and potentially extremely useful, which is:
My End Goal: Further Categorize and Nutritionally Mark Up Recipes
Are you counting calories? Looking more closely at Nutrition Facts labels these days?
If you're watching your weight, managing diabetes, or dealing with high blood pressure, you probably are.
Now, BigOven recipe software for Windows has long had the ability to analyze any recipe for its nutritional content.
But this requires a manual link-up procedure in which you take an ingredient line like:
1 cup sugar, sifted
and then link it up to the equivalent nutritional record in the database, including a simple calculation that lets BigOven figure out the gram weight of that ingredient. It's cool, and it remembers your work as it goes along, but it's also a manual task.
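As a rough illustration of that link-up (not BigOven's actual code), here is a Python sketch that splits an ingredient line into its parts and computes a gram weight. The table and function names are hypothetical, and the 200 grams per cup of sugar is only an approximate figure for the example.

```python
from fractions import Fraction

# Illustrative figures only: approximate grams in one unit of an ingredient.
GRAMS_PER_UNIT = {("sugar", "cup"): 200.0}


def parse_line(line):
    """Split '1 cup sugar, sifted' into (quantity, unit, ingredient, note)."""
    main, _, note = line.partition(",")
    qty, unit, name = main.split(maxsplit=2)
    return float(Fraction(qty)), unit.lower(), name.strip().lower(), note.strip()


def gram_weight(line):
    """Return the gram weight of a line, or None if it can't be linked."""
    try:
        qty, unit, name, _ = parse_line(line)
    except ValueError:
        return None  # too few parts, or a quantity we can't read
    per_unit = GRAMS_PER_UNIT.get((name, unit))
    return None if per_unit is None else qty * per_unit


print(parse_line("1 cup sugar, sifted"))  # (1.0, 'cup', 'sugar', 'sifted')
print(gram_weight("1/2 cup sugar"))       # 100.0
print(gram_weight("1 dash salt"))         # None
```

This only handles the simple "quantity unit ingredient" shape; mixed quantities like "1 1/2 cups" would need more work.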
One of the problems with fully automating this kind of parsing is that people enter recipe ingredients in a wide variety of ways on our user-generated-content site.
There are synonyms (you say "scallions" and I say "green onions"), plurals ("2 eggs" is exactly the same thing as "1 egg" plus "1 egg", with no "s"), and unit conversion problems (what is a "dash" of salt, exactly?).
All these complications suggest that human intervention could help the problem along a bit.
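The easier cases, synonyms and plurals, can be partly handled in code before a human ever sees the line. Here is a deliberately naive Python sketch. The synonym table, the exception list, and the suffix rules are all assumptions made up for the example, and plenty of real ingredient names would slip past them.

```python
# Hypothetical lookup tables; a real site would need far larger ones.
SYNONYMS = {"scallion": "green onion"}
UNCOUNTABLE = {"molasses"}


def singularize(word):
    """Naive English singularizer covering a few common suffixes."""
    if word in UNCOUNTABLE or word.endswith(("ss", "us")):
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"   # berries -> berry
    if word.endswith("oes"):
        return word[:-2]         # tomatoes -> tomato
    if word.endswith("s"):
        return word[:-1]         # eggs -> egg
    return word


def canonical_name(name):
    """Lowercase, singularize the last word, then apply synonyms."""
    words = name.lower().split()
    if words:
        words[-1] = singularize(words[-1])
    singular = " ".join(words)
    return SYNONYMS.get(singular, singular)


print(canonical_name("Scallions"))     # green onion
print(canonical_name("green onions"))  # green onion
print(canonical_name("eggs"))          # egg
```

Vague units like a "dash" are not something rules like these can settle, which is where people come in.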
The end result is only a good estimate of nutritional content (the sticklers among you will correctly note that caloric change does happen during the cooking process), but done right, this level of approximation is very worthwhile information to know.
For instance, think of how fantastic it would be to be able to answer the question "Of the 500 lasagna recipes on BigOven.com, which is the lowest in fat?", or, "What's the highest protein breakfast drink known on the site?", or "I am watching my sodium intake. Tell me what I can have for dinner."
Answering these questions for 160,000+ recipes requires a lot of manual work. I had previously considered hiring a temporary intern and/or editorial staff to build this out, but when you do the math, 5 people could work for a year and still not be done.
So, what I've done is create an MTurk project (currently more of an experiment) that can farm out every single unique ingredient line on BigOven.com (there are 160,000 recipes, and about 600,000 unique ingredient lines across all of them). These will be pair-matched for quality. So far I've built the basic MTurk task-creation process and am running a few tests. In theory, if I can get 600,000 ingredient lines mapped to their nutritional content, BigOven will in most cases be able to programmatically recognize (and zap! link up) the nutritional content for a line, almost like real-time spellchecking. Kind of cool, if it can be done at an affordable rate.
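The "spellchecker" idea boils down to a lookup table built from the workers' answers. Here is a minimal Python sketch of how that might look. The class, its method names, and the record ID are hypothetical and not part of BigOven.

```python
def line_key(line):
    """Normalize case, commas, and spacing so equivalent lines share a key."""
    return " ".join(line.lower().replace(",", " ").split())


class IngredientLinker:
    """Remembers which nutrition record each ingredient line maps to."""

    def __init__(self):
        self._links = {}    # normalized line -> nutrition record ID
        self.pending = []   # unseen lines waiting for a human to map them

    def learn(self, line, record_id):
        """Store a mapping that a worker (or the user) has confirmed."""
        self._links[line_key(line)] = record_id

    def link(self, line):
        """Return the known record ID, or None and queue the line."""
        key = line_key(line)
        record = self._links.get(key)
        if record is None and key not in self.pending:
            self.pending.append(key)
        return record


linker = IngredientLinker()
linker.learn("1 cup sugar, sifted", "sugar-granulated")
print(linker.link("1 Cup  Sugar, sifted"))  # sugar-granulated
print(linker.link("2 eggs"))                # None
print(linker.pending)                       # ['2 eggs']
```

Lines that miss the table land in the pending list, which is exactly the set that would be worth sending out as new tasks.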
Right now I'm experimenting a bit with the number of assignments handed out (for quality control) and the rate I have to pay to get a response.
Stay tuned for more info on this project.