
Four years ago I posted a blog about using Amazon.com’s Mechanical Turk (MTurk) with Qualtrics. It addresses the mechanics of avoiding additional MTurk fees for posting more than nine HITs and offers advice on collecting data with MTurk. Here I will give step-by-step advice on how to set up your survey to collect data on MTurk.

Why? Because there are things you want to avoid if you want good data, including

  1. Bots completing your survey.
  2. Careless or inattentive responding.
  3. Purposeful negligence (e.g., copy-pasting from the Internet rather than following instructions for an open-ended response).
  4. Doubts about whether to pay a worker who fails to input the code.

The techniques below are described with reference to Qualtrics, because that is the platform I’ve been using. They would likely work with other platforms, such as SurveyMonkey.

How to avoid bots:

This is an easy one: Use a Captcha. (Cool explanation of Captchas)

At first, I routed MTurk workers to a separate survey that had only the Captcha. If they completed it correctly, they were automatically redirected to the study. Now that I am collecting Worker IDs (WIDs) automatically, I just put it at the start of the main survey.

Yes, you could collect the WID in one survey and carry it over to the next. But it’s an extra step. Easier just to have one survey. The reason I was doing it with two was to keep the WID separate from the data (I assigned a random number in the Captcha screening survey that carried over to the main survey). I’ve decided that’s not necessary.

Careless or inattentive responding

First, warn workers that they will not be paid if they do not complete the tasks as instructed. After that, it is a matter of identifying bad responses so you can withhold payment. With the exception of the first option below (recording participants, which I don’t do), I suggest using multiple ways to check for bad data.

Option 1: Record participants (they must use webcams)

I know from reading worker fora that some requesters require workers to have their camera on, presumably to verify focused participation. Unless the study is about eye-tracking (is this possible?) or something that requires the video, I see this as intrusive and potentially a violation of privacy. I wouldn’t do it, but it is an option.

Option 2: Set up the survey so that it bounces people who do not pass attention checks.

This is an old method for “catching” inattentive survey takers by asking them to select a certain response or perform a task. For example, you can tell them “Please select disagree.” You can also use two items with responses that must be opposite (e.g., “I am very happy right now” and “I am very unhappy right now” with true/false responses) and eliminate people who do not choose appropriately.

I do not believe this is a reliable way to identify bad cases. First, honest participants may miss them. I know this happens, having seen people who I am sure were doing their best still miss attention checks. Second, it could bias responding. Third, experienced MTurk workers can readily identify such “trap” questions and still speed through the rest of the tasks.

If you decide to use attention checks, you can then set up the Survey Flow in Qualtrics to send them to an alternate End of Survey. How to do so is covered in my post on Using Logic in Qualtrics. See below for more details on how to bounce MTurk participants.
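If you do keep a paired reverse-keyed check, you can also flag inconsistent pairs after export instead of (or in addition to) bouncing people in Survey Flow. Below is a minimal sketch in Python; the column names (“happy_now”, “unhappy_now”) and the 1/0 coding are assumptions you would need to match to your own export.

```python
# Minimal sketch: flag inconsistent answers to a reverse-keyed item pair in a
# Qualtrics CSV export. Column names and 1/0 coding are hypothetical.
import pandas as pd

# Qualtrics exports repeat header information in the second and third rows.
df = pd.read_csv("survey_export.csv", skiprows=[1, 2])

# With true/false items coded 1/0, a consistent respondent should answer the
# two items in opposite directions, so the pair should sum to exactly 1.
df["failed_pair_check"] = (df["happy_now"].astype(int) + df["unhappy_now"].astype(int)) != 1

print(df.loc[df["failed_pair_check"], "ResponseId"])
```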

Option 3: Set up the survey so that it bounces people based on low response time.

This is my preferred method, with caveats. First, do not exclude based on total survey duration only. Workers are savvy to that and will simply open the survey, agree to participate, and leave it hanging on one page while they do another HIT. Second, time everything. That is, every single page your participants will see.

I incorporated this method in my last MTurk data collection. The good: It does catch many bad responses. The bad: Workers will try to do it over and over again to figure out how to game the system. You can prevent this by selecting “prevent ballot box stuffing” in Survey Options. I generally don’t like to do this, because sometimes two good MTurk workers share the same computer. I don’t like to lose these folks.

But after having to discard hundreds of duplicates this time, I believe it may be necessary.
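Either way, it helps to flag likely duplicates in the exported data before paying anyone. A minimal sketch, assuming a “WID” embedded-data column; “IPAddress” appears in the export only if you have not anonymized responses.

```python
# Minimal sketch: flag possible duplicate submissions in a Qualtrics export.
# Assumes a "WID" embedded-data column; "IPAddress" is present unless
# responses are anonymized.
import pandas as pd

df = pd.read_csv("survey_export.csv", skiprows=[1, 2])

# Duplicate WIDs are almost certainly repeat attempts by the same worker.
dup_wid = df[df["WID"].notna() & df.duplicated(subset="WID", keep=False)]

# Duplicate IPs may just be two good workers sharing a computer, so inspect
# these by hand rather than rejecting them automatically.
dup_ip = df[df.duplicated(subset="IPAddress", keep=False)]

print(dup_wid[["ResponseId", "WID"]])
print(dup_ip[["ResponseId", "IPAddress"]])
```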

For both MTurk and undergraduate samples, I use a series of exclusion criteria.

1. Overall survey duration

First, I use total survey duration. This will catch the least cunning. To do this, you first have to set up an embedded data variable in Survey Flow. You can do this anywhere in the survey; just add it to an existing embedded data block, or add a new one. Choose Total Duration from the Survey Metadata options (see below).

Add an Embedded Data block, and set it to collect overall survey duration by choosing “Total Duration.” It will report in seconds.

Next, you will need to add a Branching block in Survey Flow before your final block. Your final block should be where you assign a random survey code (for Workers to input in MTurk when they submit the HIT). I also ask them to give me their WID on the final page, and have a comments section. You need to stop them from arriving at this final block if they have rushed through the survey.

Use the branch to set up a condition that will send them to a separate end of survey rather than your final block. In the example below, I redirected all participants who had spent less than 20 minutes in the survey.

See how to customize the End of Survey so that they are redirected in my Using Logic blog post.

My current customized low-time message reads as follows:

"You have spent insufficient time answering the questions and will not be paid.
By insufficient time, I mean less than one third of the time it takes me (a person who knows--in fact MADE--the questionnaires) to read the items."
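The Survey Flow branch handles this during data collection, but the same cutoff is easy to apply again to the exported data as a backstop. A minimal sketch, assuming the “Duration (in seconds)” column that Qualtrics includes in its CSV exports and the 20-minute cutoff from the example above:

```python
# Minimal sketch: re-apply the total-duration cutoff to the exported data.
# The 1200-second (20-minute) cutoff mirrors the Survey Flow branch above;
# pick whatever cutoff your own survey warrants.
import pandas as pd

df = pd.read_csv("survey_export.csv", skiprows=[1, 2])

too_fast = pd.to_numeric(df["Duration (in seconds)"], errors="coerce") < 1200
print(f"{too_fast.sum()} of {len(df)} responses fall under the 20-minute cutoff")
print(df.loc[too_fast, ["ResponseId", "Duration (in seconds)"]])
```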
2. Time spent on separate pages.

First, make sure you exclude people who fail to spend enough time doing the most important activity in your study design. It might be reading the instructions, or doing a “trial” task, if you expect them to need more time to read fully the first time. It might be completing a questionnaire that you are developing. Perhaps you’ve asked them to spend 1-2 minutes on a free response; you could exclude those who spend less than one minute. If you ask them to watch a video, make sure you time that page and exclude everyone who spends less time on it than the video takes to play.

Ideally, you will have a series of criteria for exclusions based on time spent on various pages. You could do this for ALL your pages, but if it’s a long survey, you will get tired of setting up survey flow.

Note: Be fair. Assume that you will have some fast readers, so set the time fairly low. Remember that if workers are completing the same task many times, they will get faster at it.

Once you have decided on your exclusion criteria, you must set up Survey Flow to redirect participants to a non-paying end-of-survey option if they do not meet those criteria. You will need a separate branch for each timing criterion.

Above, participants who did not spend at least 25 seconds on the two pages I was timing were excluded automatically.

Remember that if you are randomly assigning to conditions that are part of your criteria, you will have to use branch logic of the form “if (condition X) is displayed AND time (Page Submit) is less than (Y), then…”

In this survey, participants were randomly assigned to “cats” or “dogs,” so I had to exclude people who failed to spend 60 seconds on the page they were randomly assigned to. You can do so by combining an “is displayed” condition with a timing condition using “and.”
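The same “displayed AND too fast” logic can be re-checked in the exported data. In the sketch below, the Timing-question column names (“cats_timer_Page Submit”, “dogs_timer_Page Submit”) and the 60-second minimum are placeholders; a timed page that was never displayed simply has a blank Page Submit value.

```python
# Minimal sketch: flag respondents who saw a timed page but submitted it too
# quickly. Column names are hypothetical; adjust to your own Timing questions.
import pandas as pd

df = pd.read_csv("survey_export.csv", skiprows=[1, 2])

def too_fast(page_submit_col, minimum_seconds):
    """True where the page was displayed but submitted in under the minimum."""
    times = pd.to_numeric(df[page_submit_col], errors="coerce")
    return times.notna() & (times < minimum_seconds)

df["exclude_timing"] = (
    too_fast("cats_timer_Page Submit", 60) | too_fast("dogs_timer_Page Submit", 60)
)
print(df.loc[df["exclude_timing"], "ResponseId"])
```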

Purposeful negligence

What about workers who simply refuse to do the task? Last fall I asked participants to spend 2-3 minutes writing a description of a person. Four copy-pasted from a June 2016 Atlantic Monthly article. Two copy-pasted from Bloomberg. Nearly a dozen copy-pasted from the prompt they were given.

The only reason I caught them (and subsequently rejected their HITs… although I did miss a few) was that I personally inspected the data. This is why I do not use more automated solutions, such as TurkPrime. I want to use my own eyes to detect inconsistencies, at least in most projects. Clearly there are datasets that are too big to “look at” literally.

You could do manual searches or set up code to find responses that match part or all of your prompt. I suppose it would also be possible to use a plagiarism checker (like Turnitin) to catch copied text.
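As a rough illustration, a few lines of Python can flag responses that overlap heavily with the prompt. The prompt text, the “description” column, and the 0.6 similarity threshold below are all placeholders; flagged cases still need to be read by a human.

```python
# Minimal sketch: flag free responses that largely overlap with the prompt.
# The prompt text, "description" column, and 0.6 threshold are hypothetical.
from difflib import SequenceMatcher

import pandas as pd

PROMPT = "Please spend 2-3 minutes describing this person in as much detail as you can."

df = pd.read_csv("survey_export.csv", skiprows=[1, 2])

def overlap(text):
    """Similarity (0-1) between a response and the prompt, ignoring case."""
    return SequenceMatcher(None, str(text).lower(), PROMPT.lower()).ratio()

df["prompt_overlap"] = df["description"].apply(overlap)
print(df.loc[df["prompt_overlap"] > 0.6, ["ResponseId", "description"]])
```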

One way or the other, if you use open-ended responses, you need to spend more effort in data cleaning. I try to avoid them just for this reason (and because they require coding), but open-ended responses also provide rich data and good manipulation checks.

What if a worker doesn’t input the code?

(See my first blog on how to use MTurk with Qualtrics for how to assign unique, randomly generated codes.)

Sometimes workers will not input the code. Some input their worker ID (WID). Some leave it blank. Others make up a code. I look at their data and make a judgment call. If it was an honest mistake (WID) or they contact me about it (saying they didn’t see a code or that the survey timed out), I always pay. If they make up a code, chances are it’s bogus and I don’t pay.

I don’t want to fail to pay anyone who deserves it, so in case of doubt I always pay. That said, I really don’t want to pay cheaters, so I set up multiple checks. On the final page (if people pass the criteria), they will get their unique code and have to input their WID. I also collect WIDs in embedded data (because sometimes people make typos). Ideally, all three will be “correct.” But things happen.

Now that I bounce people before the last page if they do not meet criteria, I can tell if they haven’t been presented (as they shouldn’t have been) with the code. First, Qualtrics tells me that the page was not presented to the participant. Second, they didn’t input their WID (which I specify in MTurk is a requirement for payment, as is the unique code).
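If you want to automate part of that cross-check, you can merge the MTurk batch file with the Qualtrics export. In the sketch below, “Answer.surveycode” matches MTurk’s survey-link HIT template, and “WID” and “code” stand in for however you stored the worker ID and the random completion code in Qualtrics; adjust the names to your own setup.

```python
# Minimal sketch: cross-check the MTurk batch results against the Qualtrics
# export. Column names other than "WorkerId" are assumptions about your setup.
import pandas as pd

qualtrics = pd.read_csv("survey_export.csv", skiprows=[1, 2])
batch = pd.read_csv("mturk_batch_results.csv")

merged = batch.merge(
    qualtrics[["WID", "code"]], left_on="WorkerId", right_on="WID", how="left"
)

# Flag workers whose submitted code does not match the code Qualtrics generated
# for them (including workers who were bounced and never saw a code).
merged["code_mismatch"] = merged["Answer.surveycode"].astype(str) != merged["code"].astype(str)
print(merged.loc[merged["code_mismatch"], ["WorkerId", "Answer.surveycode", "code"]])
```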

You will get some workers claiming not to have seen the page when in fact they were bounced (at which point they got the “You didn’t spend enough time on this survey to have read… blah blah” end-of-survey message). They are just testing to see if they can get paid anyway. Say no. You will get some who should have seen the page but didn’t. Pay them. There won’t be many (maybe 0.5%).

That’s all I can think of for now! If you have a question, leave it in the comments and I will try to answer!
