Sunday, May 19, 2019

A Little Matter of Data Quality

A quote from Andrew Gelman:

So it’s good to be reminded: “Data” are just numbers. You need to know where the data came from before you can learn anything from them.

If you have followed my blog over the last few months, you have an idea of what I've been going on about, yeah? Numbers mean squat if I cannot trust their source. Think about that the next time someone gives you an improbable claim, offers up some very complex-looking tables, figures, and test statistics, and then hopes you don't notice that the tables are a bit odd, the marginal means and cell means don't quite mesh the way they should, or that there were serious decision errors. Beware especially of work coming from researchers who are unusually prolific at publishing findings with methods that would take a heroic team effort to publish at that rate, never mind a single individual working alone. Garbage data give us garbage findings more often than not. Seems like a safe enough bet.
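For anyone wondering what "mesh" means in practice, here is a minimal sketch of one such check, under my own assumptions: the paper reports cell means and cell sizes, the marginal mean is a plain pooled mean over those cells, and everything is rounded to two decimals. The function names and the tolerance are made up for illustration, not taken from any particular tool.

```python
def implied_marginal(cell_means, cell_ns):
    """Marginal mean implied by the reported cell means and cell sizes.

    Assumes the marginal is the pooled (n-weighted) mean of the cells.
    """
    total_n = sum(cell_ns)
    return sum(m * n for m, n in zip(cell_means, cell_ns)) / total_n


def meshes(reported_marginal, cell_means, cell_ns, tol=0.01):
    """True if the reported marginal is within `tol` of the implied one.

    tol=0.01 is an assumption: it allows for each reported value being
    rounded to two decimal places.
    """
    implied = implied_marginal(cell_means, cell_ns)
    return abs(implied - reported_marginal) <= tol


# Hypothetical numbers: two cells with means 4.0 and 5.0, n = 30 and 10.
ok = meshes(4.25, [4.0, 5.0], [30, 10])    # implied marginal is 4.25
odd = meshes(4.80, [4.0, 5.0], [30, 10])   # 4.80 cannot come from these cells
```

A failed check is not proof of anything shady. It could be a typo or an unreported exclusion, but it is a good reason to ask to see the data.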

I go on about this because there is plenty of dodgy work in my field. There is reason to be concerned about some of the zombies (i.e., phenomena that should have been debunked that continue to be taught and treated as part of our popular lore) in my field. Stopping the proliferation of these zombies at this point is a multifaceted effort. Part of that effort is making sure we can actually examine the data from which findings are derived. In the meantime, remember rule #2 for surviving a zombie apocalypse (including zombie concepts): the double tap.
