Critical thoughts about big data analysis

Completed on 02-Jul-2016 (57 days)

Data hoarding

A growing number of businesses are realising the benefits of properly understanding their clients' data. Additionally, the internet and the huge amount of valuable information associated with it have played an important role in the wide adoption of data-analysis techniques. In fact, so much interest, together with the availability of free and seemingly easy resources, has led to situations where data models are heavily misused. This section deals with one of the main consequences of that reality.

By data hoarding, I mean an attitude, widespread among online businesses, of collecting as much information as possible without properly analysing it. This malpractice has been encouraged by the emergence of numerous big-data tools, which are commonly misunderstood as easy ways for people from any background to intuitively reach worthwhile conclusions.

The aforementioned misconception has a negative impact on different fronts:
  • The underlying assumptions about data analysis (i.e., it is easy, anyone can do it, generally applicable and absolute answers can be expected, etc.) encourage negligent behaviour and prevent the information from being fully exploited. The most likely consequence is the allocation of disproportionately limited resources; for example: inexperienced analysts, overly tight budget/time constraints or unrealistic expectations.
  • The more information, the more difficult it is to create a reliable model. In fact, notable increases in the amount of information being considered usually bring (even beyond-acceptable) increases in noise, which makes it very difficult to create a model.
    Thus, more data is only better if the following two conditions are met: the quality of the additional information is high enough (or, at least, the noise increase is kept under control); and the given model is properly adapted (i.e., tuned, extended or even rebuilt) to account for all the additional information. Logically, this level of care is incompatible with the essentially careless behaviours described above.
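
The two conditions in the last point can be illustrated with a deliberately simple simulation. This is only a sketch under assumptions of my own, not a description of any real dataset: the quantity being estimated (`TRUE_VALUE`), the sample sizes and the noise levels are all hypothetical, and a plain average stands in for "the model". A small set of clean measurements is compared against the same set plus a much larger batch of noisy ones, first pooled carelessly and then with the model adapted (inverse-variance weighting) to account for the lower quality of the extra data.

```python
import math
import random

TRUE_VALUE = 10.0  # hypothetical quantity that every sample tries to measure


def draw(rng, n, sigma):
    """Draw n unbiased measurements of TRUE_VALUE with Gaussian noise sigma."""
    return [rng.gauss(TRUE_VALUE, sigma) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def weighted_mean(groups):
    """Inverse-variance weighted mean; groups is a list of (values, sigma)."""
    numerator = sum(sum(values) / sigma ** 2 for values, sigma in groups)
    denominator = sum(len(values) / sigma ** 2 for values, sigma in groups)
    return numerator / denominator


def simulate(trials=400, seed=0, n_clean=200, n_noisy=2000,
             sigma_clean=1.0, sigma_noisy=5.0):
    """Return the root-mean-square error of three estimators over many trials.

    All sizes and noise levels are illustrative assumptions.
    """
    rng = random.Random(seed)
    squared_error = {"clean_only": 0.0, "naive_pooled": 0.0, "weighted": 0.0}
    for _ in range(trials):
        clean = draw(rng, n_clean, sigma_clean)
        noisy = draw(rng, n_noisy, sigma_noisy)
        estimates = {
            # Small, high-quality dataset on its own.
            "clean_only": mean(clean),
            # Hoarding: everything thrown together, model left untouched.
            "naive_pooled": mean(clean + noisy),
            # Same data, but the model accounts for the noisier source.
            "weighted": weighted_mean([(clean, sigma_clean),
                                       (noisy, sigma_noisy)]),
        }
        for name, estimate in estimates.items():
            squared_error[name] += (estimate - TRUE_VALUE) ** 2
    return {name: math.sqrt(total / trials)
            for name, total in squared_error.items()}


if __name__ == "__main__":
    for name, rmse in simulate().items():
        print(f"{name:>12}: RMSE = {rmse:.3f}")
```

With these assumed settings, the carelessly pooled estimate should come out less accurate than the one based on the clean data alone, even though it uses eleven times more measurements, whereas the weighted estimate should be slightly more accurate than both. The extra information only helps once the model has been adapted to it.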