Constantin Gurdgiev: True Economics

16/10/18: Data analytics. It really is messier than you thought

An interesting study (H/T to @stephenkinsella) highlights a problem with the empirical determinism that underpins our (human) evolving trust in 'Big Data' and 'analytics': statistics is far from deterministic when it comes to social, business, finance, and similar data.

Here is the problem: researchers put together 29 independent teams, with 61 analysts in total. They gave these teams the same data set on football referees' decisions to give red cards to players. They asked the teams to evaluate the same hypothesis: whether football "referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players".

Because the teams used a variety of analytic models, their estimates produced a range of answers: the estimated effect of a player's skin color on red card issuance ran from 0.89 at the lower end of the range to 2.93 at the higher end (in odds-ratio units, where 1 means no effect). The median effect was 1.31. Per the authors, "twenty teams (69%) found a statistically significant positive effect [meaning that they found skin color having an effect on referees' decisions], and 9 teams (31%) did not observe a significant relationship" [meaning no statistically significant effect of the players' skin color was found].
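To make these figures concrete, here is a minimal Python sketch. It assumes the reported effects are odds ratios, which is how the paper expresses them, and the helper name `pct_change_in_odds` is made up for this illustration. It does nothing more than re-derive the team shares and translate each odds ratio into a percentage change in the odds of a red card.

```python
# Hypothetical helper, not from the paper: plain arithmetic on the reported figures.
def pct_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in odds implied by an odds ratio (1.0 means no effect)."""
    return round((odds_ratio - 1.0) * 100.0, 1)


teams_total = 29
teams_significant = 20
teams_not_significant = teams_total - teams_significant

# Shares of teams, rounded to whole percentages as in the quoted passage.
share_significant = round(100 * teams_significant / teams_total)
share_not_significant = round(100 * teams_not_significant / teams_total)

# Lowest, median and highest effects as reported in the post.
reported = {"lowest": 0.89, "median": 1.31, "highest": 2.93}
changes = {label: pct_change_in_odds(value) for label, value in reported.items()}

print(f"significant: {share_significant}%, not significant: {share_not_significant}%")
for label, change in changes.items():
    print(f"{label}: odds ratio {reported[label]} -> {change:+.1f}% change in odds")
```

Read this way, the same data set yielded anything from 11% lower odds of a red card for dark-skin-toned players to odds almost three times as high, with the median team finding odds about 31% higher.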

To eliminate the possibility that analysts’ prior beliefs could have influenced their findings, the researchers controlled for such beliefs. In the end, prior beliefs did not explain these differences in findings. Worse, "peer ratings of the quality of the analyses also did not account for the variability." Put differently, the vast difference in the results cannot be explained by quality of analysis or priors.

The authors conclude that even absent biases and personal prejudices of the researchers, "significant variation in the results of analyses of complex data may be difficult to avoid... Crowdsourcing data analysis, a strategy in which numerous research teams are recruited to simultaneously investigate the same research question, makes transparent how defensible, yet subjective, analytic choices influence research results."
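How can one defensible choice move the answer? The sketch below uses hypothetical counts, invented purely for illustration and not taken from the study. The assumption is that one analyst pools all observations, while another adjusts for a single covariate (here, a made-up split by playing position) using the Mantel-Haenszel odds ratio. The numbers are deliberately extreme to make the mechanism visible; they say nothing about the actual data.

```python
from typing import Iterable, Tuple

# One 2x2 table per stratum:
# (red_dark, no_red_dark, red_light, no_red_light)
Table = Tuple[int, int, int, int]

# HYPOTHETICAL counts, invented for illustration only; NOT the study's data.
strata = {
    "defender": (40, 960, 20, 480),
    "forward": (5, 495, 20, 1980),
}


def odds_ratio(table: Table) -> float:
    """Odds of a red card for dark-toned players over odds for light-toned players."""
    a, b, c, d = table
    return (a * d) / (b * c)


def pooled(tables: Iterable[Table]) -> Table:
    """Collapse all strata into one table, i.e. ignore the covariate."""
    a, b, c, d = (sum(col) for col in zip(*tables))
    return (a, b, c, d)


def mantel_haenszel(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel odds ratio, i.e. adjust for the stratifying covariate."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


crude_or = odds_ratio(pooled(strata.values()))
adjusted_or = mantel_haenszel(strata.values())

print(f"Analyst A (no adjustment):        OR = {crude_or:.2f}")
print(f"Analyst B (adjusts for position): OR = {adjusted_or:.2f}")
```

With these made-up counts, the pooled analysis gives an odds ratio of about 1.90, while the position-adjusted analysis gives 1.00. Neither analyst made an arithmetic error: they made different modelling choices on identical data.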

Good luck putting much trust in social data analytics.

Full paper is available here: http://journals.sagepub.com/doi/pdf/10.1177/2515245917747646.
