Critical thoughts about big data analysis
Completed on 02-Jul-2016 (57 days)
The immediate consequence of the ideas in the previous section is that (big-)data modelling isn't cheap; at least, not data modelling that is reliable enough.
Main issues to bear in mind on the budget front:
The more effort is put into developing the model, the better. Top quality doesn't just imply building an accurate algorithm, but also making it as scalable and adaptable as possible, collecting (filtering, correcting, grouping, etc.) the best information, developing all the required complementary applications (e.g., model assessment or automation of intermediate actions), writing descriptive documentation about each part, etc.
Arbitrarily constrained resources have a negative impact on the model. Examples: setting unrealistic targets or expecting to be lied to (i.e., preferring conclusions that look nice over conclusions that are reliable).
Developing a data model is rarely a one-time event. Even the most comprehensive and adaptable model needs to be tuned when its associated conditions change. Even under more or less stable conditions, models are formed by complex algorithms based upon multiple assumptions; correcting possible errors, further extending their applicability or upgrading their functionality are fairly common requirements. In fact, numerical models usually reach their top performance as the result of an evolution, rather than right after having been created.
The main reason to build a (big-)data model is precisely to come up with a cost-efficient solution to a given problem. Saving money on the development of the very tool meant to help save money doesn't make much sense; even less so when arbitrary restrictions might cause a (perhaps uncorrectable or, even worse, undetectable) reduction of its money-saving effects.