Critical thoughts about big data analysis

Completed on 02-Jul-2016 (57 days)

Accuracy redefinition

Accuracy is a very broad concept with many different implications, all the more so when dealing with the complexity of (big) data. That is why the critique of this part was, strictly speaking, divided into two different sections: Accuracy unconcerned and Shortsighted goals. All the conclusions of this two-part analysis are included in the current section.

The basic idea is that (big-data) model accuracy needs to be redefined because it has lost virtually all its meaning. Such a generic statement is better understood when set against the following common misconceptions:
  • Big-data modelling is easy and anyone can get worthwhile results. Numerical modelling is a complex subfield where a solid, heterogeneous background (e.g., mathematics, programming, specific model-development expertise) is an almost irreplaceable minimum requirement. Additionally, even under ideal conditions, the generated outputs are just probably-correct guesses. There is no intuitive, easy-for-everyone way to quickly build reliable models that account for random situations.
  • Generally applicable absolute truths are possible, or a properly built model can work forever. Although mathematics (unlike science, understood as a descriptor of causal phenomena) can deliver absolute, always-valid truths, numerical models describe variable realities where such an outcome isn't possible. Additionally, models rarely possess a truly deep understanding of the given reality.
  • The more data, the better the model. Unlike the previous misconceptions, this statement might hold under very specific conditions (which, on the other hand, are rarely present). No good model can be built on bad-quality data, and the larger the amount of data, the more difficult it is to keep its quality high. Note that improving data quality involves a set of complex actions that are difficult to define precisely; that is, ensuring the (high) quality of a given dataset notably increases uncertainty.
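
The last misconception can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the original analysis: the "true" relation y = 2x + 1, the noise level, the 30% share of records carrying a constant +5 recording error, and the helper names fit_line, sample and mean_abs_error. It fits a straight line by ordinary least squares to a small clean dataset and to a dataset 100 times larger but partly corrupted, then measures how far each fitted line is from the true one.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sample(rng, n, corrupt_fraction=0.0):
    """Draw n points from the hypothetical truth y = 2x + 1 plus noise.

    A fraction of the records gets a constant +5 offset, standing in for
    a systematic recording error (an assumption, not real data).
    """
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + 1 + rng.gauss(0, 0.5)
        if rng.random() < corrupt_fraction:
            y += 5
        points.append((x, y))
    return points


def mean_abs_error(slope, intercept):
    """Average gap between the fitted line and the true one at x = 0..10."""
    grid = range(11)
    gaps = [abs((slope * x + intercept) - (2 * x + 1)) for x in grid]
    return sum(gaps) / len(gaps)


rng = random.Random(0)  # fixed seed so the run is repeatable
small_clean = fit_line(sample(rng, 50))
large_dirty = fit_line(sample(rng, 5000, corrupt_fraction=0.3))

clean_error = mean_abs_error(*small_clean)
dirty_error = mean_abs_error(*large_dirty)
print(f"50 clean records:   error {clean_error:.2f}")
print(f"5000 dirty records: error {dirty_error:.2f}")
```

Under these assumptions, the additional records do not average the problem away: the corruption is systematic, so the larger dataset converges to the wrong line. This is only a toy case; real quality problems are rarely this easy to describe, which is precisely the point made above.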