
© 2015-2018 Alvaro Carballo Garcia


Project 10 is expected to be the last formal project of varocarbas.com. I will continue using this site as my main self-promotional, R&D-focused online resource, but will rely on other, more suitable formats such as domain ranking.
Note that the last versions of all the successfully completed projects (5 to 10) will always be available.

Big data peculiarities

Critical thoughts about big data analysis
Completed on 02-Jul-2016 (57 days)


As already explained, I have recently been working on various big-data-related developments (the appendix of this project includes my detailed impressions about one of them). These experiences have given me relevant insights into big data forecasting, which differs noticeably from the more restricted models I am used to from my background.

The most relevant differences that I observed when facing these big-data problems are summarised in the following points:
  • Building comprehensive models (i.e., ones adequately accounting for virtually any sub-situation) is very difficult and, in most cases, not even advisable. The next point explains this issue in more detail: big-data expectations and/or assessment methodologies tend to favour outputs that are not too bad for most cases, which penalises more insightful approaches that occasionally mispredict by a slightly larger margin.
  • Generic assessment methodologies. A descriptive example to illustrate this point: under an average-based methodology, if the modelled behaviour is defined by (input=>output) 1=>2, 2=>3 and 3=>1, the predictions 1=>2, 2=>2 and 3=>2 would be considered perfect, because their average (2) matches the average of the actual outputs ((2 + 3 + 1) / 3 = 2). Such an approach significantly penalises attempts aiming for high accuracy: in the very unlikely scenario of delivering an actually perfect answer, such an attempt would get the same score as the simplistic average-value result; in virtually any other case, it would score worse, regardless of how well it actually understands the underlying behaviour.
  • As a consequence of the two previous points, adapting to the peculiarities of this format seems to be an unavoidable requirement. Even ideas that are a priori easy and intuitive (e.g., keeping things as simple as possible) cannot be applied straight away, particularly by someone coming from a different background. The big-data character (i.e., huge training data sets, together with the conditions and expectations usually associated with these problems) certainly has a big influence on how the given model is developed.
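The toy example in the second point can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the exact "average-based methodology" is not defined here, so it is modelled as the absolute difference between the mean prediction and the mean observed output. The point-by-point mean absolute error and the third, "insightful" candidate (right on two of the three inputs) are hypothetical additions included only for contrast.

```python
# Hypothetical scoring functions illustrating the toy example above.
# ASSUMPTION: "average-based" is modelled as the absolute difference
# between the mean prediction and the mean observed output.


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def average_based_error(predicted, actual):
    """Compare only the averages; lower is better, 0 looks 'perfect'."""
    return abs(mean(predicted) - mean(actual))


def mean_absolute_error(predicted, actual):
    """Point-by-point comparison, shown for contrast; lower is better."""
    return mean([abs(p - a) for p, a in zip(predicted, actual)])


# Modelled behaviour (input => output): 1=>2, 2=>3, 3=>1.
actual = [2, 3, 1]

candidates = {
    "constant (2, 2, 2)": [2, 2, 2],    # simplistic average-value answer
    "perfect (2, 3, 1)": [2, 3, 1],     # actually perfect answer
    "insightful (2, 3, 2)": [2, 3, 2],  # hypothetical: right on 2 of 3 inputs
}

for name, predicted in candidates.items():
    print(f"{name}: average-based={average_based_error(predicted, actual):.3f}, "
          f"MAE={mean_absolute_error(predicted, actual):.3f}")
```

Running it prints `constant (2, 2, 2): average-based=0.000, MAE=0.667`, `perfect (2, 3, 1): average-based=0.000, MAE=0.000` and `insightful (2, 3, 2): average-based=0.333, MAE=0.333`. Under the average-only score, the constant answer ties with the perfect one and beats the candidate that gets two of three inputs right, whereas the point-by-point score ranks those two the other way round.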