ASA President meets OCCAM data

Just leaving this quote from ASA President Jessica Utts here (Source: Amstat News Dec 2016):

A few days ago, I was in Vietnam and took a four-hour bus ride from Ha Long Bay to Hanoi. When I arrived, my fitness tracker had given me credit for taking 9,124 steps and climbing 81 flights of stairs during those four hours, even though I only left my seat once during a short rest stop.

In the opposite extreme, I once walked the full length of the Atlanta airport with my hand on my four-wheeled suitcase and got no credit for any steps. I've noticed a similar lack of credit when wheeling a grocery cart, and pushing a baby stroller allegedly has the same effect.

Great example of why (seemingly) complete data completely fail the analyst. Imagine the poor data analysts and "scientific" researchers mining and squeezing every ounce of information out of the data with their algorithmic bags of tricks.

And this is not just fun and games, either.

The health plan where [her friend] works sets rates based on data acquired from employees' personal fitness devices!

This is the trouble with using "adapted" data – data collected for other purposes. The requirements of quality and integrity vary by application, and business managers who blindly buy such "innovation" may end up losing a lot of money.

Worse still, once people know that the health plan sets rates based on the number of steps taken, they can easily hang the device off their dog, or devise any number of tactics to fool the machine.

Fitbit-type data are a great example of OCCAM data: Observational, no Controls, seemingly Complete, Adapted, and Merged. Such datasets are the norm in the Big Data age, and they should not be analyzed without a lot of thinking!

More on OCCAM data here and here.


Big Data, Plainly Spoken (aka Numbers Rule Your World)