Stop Letting Garbage Data Get in the Way of Good Survey Results

As the saying goes, “garbage in, garbage out.” The phrase comes from computer science, but it applies to surveys just as well. Here are the best practices you can employ today to prevent garbage data from diluting your results, so you can be confident in your data and in the decisions you base on it.


Modern tools and methods make it easy to collect the data you need from the people who have it. Unfortunately, not all data is created equal. How can you trust that the data you’ve collected is good data?

While you can’t block every piece of bad data from entering your data set, a few best practices will keep out the majority of it. I know what you’re thinking: “I can’t tolerate any bad data in my data set.” Good news there too: there are proven ways to identify poor-quality data points within your data set and remove them.

Therein lies the structure of how to think about improving data quality. There’s the proactive approach of preventing poor-quality data from getting into the data set, and the reactive approach of dealing with poor-quality data once it’s already in your data set.

The following best practices are framed with that delineation in mind.

How do you prevent bad data in the first place?

Smart research design can systematically prevent poor-quality data from entering your data set. Once you integrate these best practices into your questionnaire, you can sit back and let the survey do all the heavy lifting.

  • Red Herrings — A red herring is a fake answer choice amid an otherwise legitimate question. It is intended to catch those who are trying to “guess the right answer” to gain access to the survey. For example, you might include a made-up product such as “WorkChat” in a list of real tools; respondents who select it are terminated from the survey.

  • Knowledge Checks — These questions test respondents’ knowledge of a topic they’ll be asked about within the survey. The question should have an objectively correct answer, but not one that can be quickly googled. For example, don’t simply ask respondents to spell out an acronym, since anyone can look it up and answer correctly in a matter of seconds.

  • Attention Checks — This one is self-explanatory; it is designed to make sure the respondent is paying attention, and in the case of large-scale consumer surveys, catch bots. The question will have one obvious correct answer.
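To illustrate, here is a minimal sketch of how a survey platform might score these three checks programmatically. All question IDs, answer choices, and the screening logic are hypothetical assumptions (only the made-up “WorkChat” product comes from the discussion above):

```python
# Sketch: screening one respondent's answers against in-survey quality checks.
# Question IDs, choices, and correct answers below are illustrative only.

# Red herring: "WorkChat" is a made-up product planted among real options.
RED_HERRING = {"question": "tools_used", "trap_choice": "WorkChat"}
# Knowledge check: one objectively correct answer, not easily googled.
KNOWLEDGE_CHECK = {"question": "industry_knowledge", "correct": "choice_b"}
# Attention check: one obvious correct answer.
ATTENTION_CHECK = {"question": "select_agree", "correct": "Agree"}


def passes_screening(answers: dict) -> bool:
    """Return True only if the respondent passes all three checks."""
    # Selecting the fake product suggests guessing; terminate the respondent.
    if RED_HERRING["trap_choice"] in answers.get(RED_HERRING["question"], []):
        return False
    # Missing the knowledge check suggests the respondent lacks the
    # expertise the survey assumes.
    if answers.get(KNOWLEDGE_CHECK["question"]) != KNOWLEDGE_CHECK["correct"]:
        return False
    # Missing the attention check suggests inattention or a bot.
    if answers.get(ATTENTION_CHECK["question"]) != ATTENTION_CHECK["correct"]:
        return False
    return True
```

In practice these checks would be configured inside your survey tool’s termination logic rather than in code, but the decision rule is the same: fail any one check and the respondent never enters the data set.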

How do you identify bad data once it’s already there?

While preventative measures can do the heavy lifting, you will still likely need to do some fine-tuning. These best practices will highlight potential issues and turn what could otherwise be a several-hour manual process into an efficient, brief exercise in data cleaning.

  • Straight-Lining — This calls out respondents who select the same answer down a column or across a row. It’s important to note that this requires a judgment call: someone may give the same answer to every item in a grid and still be responding honestly. But if, say, a respondent reports the highest level of satisfaction for every single factor, that pattern is highly unlikely to be genuine.

  • Speeders — This category flags respondents who completed the survey “too quickly.” This is also a judgment call. That said, if a respondent finished the survey in half the time of everyone else, there’s strong reason to suspect they weren’t paying full attention.

 Caveat: Make sure to factor in any logic in your survey that shows different respondents different numbers of questions based on their answers. Completion times can differ drastically for perfectly valid reasons.

  • Custom Flags — If your questionnaire includes questions that could yield contradictory answers, you can set up custom flags that trip when a logical inconsistency occurs. The flag shows only on the back end, but it lets you quickly identify flagged responses in the data, review them, and decide whether to remove them from the data set. It’s a quick way to quality-check the data and make informed decisions about how to handle potentially poor data.
  • Open-Ended Questions — In addition to providing qualitative data, open-ended questions allow you to assess the quality of respondents. Are they answering the question being asked? Are they providing coherent and cogent responses? If not, you can remove them from the data set. If it’s less clear — perhaps the individual just misunderstood the question — then you can review the rest of that respondent’s data and make a more informed decision about how to handle it.
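To make the reactive checks concrete, here is a rough sketch of how the first three might be automated during data cleaning. The field names, the half-the-median speeder threshold, and the example contradiction are all illustrative assumptions, not rules from this article:

```python
# Sketch: automated data-cleaning flags for survey responses.
# Thresholds and field names are illustrative assumptions.
from statistics import median


def flag_straight_lining(grid_answers: list) -> bool:
    """Flag a respondent who gave the identical answer to every
    item in a multi-item grid (requires a judgment call on review)."""
    return len(grid_answers) > 1 and len(set(grid_answers)) == 1


def flag_speeder(duration_sec: float, all_durations: list,
                 ratio: float = 0.5) -> bool:
    """Flag a respondent who finished in under half the median
    completion time (the 0.5 ratio is an assumed cutoff)."""
    return duration_sec < ratio * median(all_durations)


def flag_contradiction(answers: dict) -> bool:
    """Custom flag that trips on a logical inconsistency.
    Hypothetical example: a firm reports zero employees but a
    positive payroll."""
    return (answers.get("employee_count") == 0
            and answers.get("payroll_usd", 0) > 0)
```

Flagged respondents aren’t removed automatically; the flags simply narrow a several-hour manual review down to the handful of responses worth a closer look.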

The Takeaway

There are a lot of ways to improve the quality of your data set. Some tactics can be used to prevent bad data from getting into your data set. Some tactics can be used to run a quality check on the data you’ve already collected.

You can use all these tactics or select a few applicable to your specific project. Either way, stop letting garbage data get in the way of good research and your ability to make confident decisions.


Will Mellor leads a team of accomplished project managers who serve financial service firms across North America. His team manages end-to-end survey delivery from first draft to final deliverable. Will is an expert on GLG’s internal membership and consumer populations, as well as survey design and research. Before coming to GLG, he was the VP of an economic consulting group, where he was responsible for designing economic impact models for clients in both the public sector and the private sector. Will has bachelor’s degrees in international business and finance and a master’s degree in applied economics.