Note: This blog post is for marketing research professionals and those who work with quantitative data regularly.
Research is only as good as the data behind it. In the process of drafting a research plan, building a questionnaire, reviewing documents with teammates and clients, programming the survey, analyzing data, and putting together the report, one piece that doesn’t get enough attention is data cleaning. I believe this is the case on both the supplier side and the client side of the marketing research industry.
Clients will seldom notice – there may not even be a way for them to know – if many of the responses are junk. A researcher who goes the extra mile to ensure only quality data is used is not rewarded; in fact, their efforts are almost never even noticed. As a result, it is easy to deprioritize data cleaning. It is easier, cheaper, faster, and in some regards better for your career to focus your energy elsewhere. Nevertheless, integrity calls us to do right by the decision makers using our data and give cleaning the energy and attention it deserves.
Cleaning data varies in difficulty depending on the questionnaire and the sample size. In the B2B world of research where I work, sample sizes are typically under 500, which makes thorough data cleaning quite feasible. There are four areas I examine to find “speeders and cheaters.” The first two are obvious.
Looking at the speed at which respondents complete the survey is the most obvious. I generally calculate the median time to complete the survey and flag everyone who finished in less than half that time. That’s strike one. Straight-liners are also easy to find. They tend to skew data positively, as they often select the top answer choice – usually the most positive one – for all or most questions to get through the questionnaire quickly. Straight-lining correlates with speed, but not perfectly.
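As a rough sketch of those first two checks, here is how the half-the-median speed rule and an exact straight-line test could look in plain Python. The function names, data shapes, and the 0.5 cutoff are my own illustrative choices, not a standard implementation:

```python
from statistics import median

def flag_speeders(durations, threshold=0.5):
    """Flag respondents who finished in less than half the median time.

    durations: dict of respondent_id -> completion time in seconds.
    The 0.5 multiplier mirrors the rule described above; tune it per study.
    """
    cutoff = median(durations.values()) * threshold
    return {rid for rid, t in durations.items() if t < cutoff}

def flag_straightliners(grid_answers):
    """Flag respondents who gave the identical answer to every grid item.

    grid_answers: dict of respondent_id -> list of answers to a rating grid.
    Only flags exact straight-lines; near-straight-lining needs judgment.
    """
    return {rid for rid, answers in grid_answers.items()
            if len(answers) > 1 and len(set(answers)) == 1}
```

In practice these flags are inputs to a judgment call, not automatic removals; a genuinely enthusiastic respondent can legitimately pick the top box more than once.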
The other two steps are less obvious, but extremely helpful in doing a proper job cleaning data: trap questions and open-ends. Trap questions are the easiest and least time-consuming for researchers. Just add a question such as “select orange” and list several colors, or “what comes after Wednesday” and list days of the week. Have your survey automatically terminate respondents (or bots) speeding through too quickly to read the question and answer it correctly. There’s no data cleaning for the researcher on the back-end, as they’re removed from your data in real time.
My last suggestion is the one I don’t feel researchers do enough because it is time-consuming and thankless. Include open-ends…and actually read them. In B2B research, one of the most effective ways to screen with open-ends is to ask job title as an open-ended question. You’d be amazed at how many “CEOs” of hospitals take surveys! Even in B2C surveys, questions like “what’s the primary reason you say that?” after an appeal or purchase intent question will often yield nonsensical answers that prompt you to remove a bad respondent.
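No keyword filter replaces actually reading the open-ends, but a small helper can surface the titles that most deserve a skeptical look. This is a hypothetical sketch; the keyword list and thresholds are illustrative assumptions, and anything it flags still goes to a human reviewer:

```python
def surface_suspect_titles(job_titles,
                           suspect_keywords=("ceo", "owner", "president")):
    """Surface open-ended job titles that warrant a closer manual read.

    job_titles: dict of respondent_id -> free-text job title.
    Flags very short/empty answers (often junk) and titles containing
    keywords that are implausibly senior for the sampled population.
    Returns a dict of flagged respondent_id -> original title for review.
    """
    flagged = {}
    for rid, title in job_titles.items():
        text = title.strip().lower()
        if len(text) < 2 or any(kw in text for kw in suspect_keywords):
            flagged[rid] = title
    return flagged
```

A title like “CEO” isn’t proof of a bad respondent on its own; the point of the helper is only to prioritize which verbatims to read first.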
What I’ve found most effective is using all of these checks in combination. It is as much art as science. A respondent could have been confused by a single open-end. They could have taken the survey seriously, but quickly, or they could have genuinely chosen the first answer on every question without straight-lining. Looking for two areas where they went wrong is key. Two strikes and you’re out!
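The two-strikes rule itself is easy to mechanize once each check has produced its set of flagged respondents. A minimal sketch, assuming each check yields a set of respondent IDs:

```python
from collections import Counter

def respondents_to_remove(*flag_sets, strikes=2):
    """Apply the two-strikes rule: drop anyone flagged by >= `strikes` checks.

    Each positional argument is a set of respondent IDs flagged by one
    check (speeding, straight-lining, a failed trap, a bad open-end).
    """
    counts = Counter(rid for flags in flag_sets for rid in flags)
    return {rid for rid, n in counts.items() if n >= strikes}
```

For example, a respondent flagged only as a speeder survives, while one flagged as both a speeder and a straight-liner is removed.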
None of this is rocket science, but it is an easy corner to cut. There seems to be little reward for taking the time to get something like this right, but for the sake of our clients and our industry I hope it is a corner you choose not to cut.