
NO NEW PROJECTS:
Project 10 is expected to be the last formal project of varocarbas.com. I will continue using this site as my main self-promotional, R&D-focused online resource, although relying on other, more adequate formats like domain ranking.
Note that the last versions of all the successfully completed projects (5 to 10) will always be available.
PROJECT 9
Critical thoughts about big data analysis
Completed on 02-Jul-2016 (57 days)


After confirming that there wasn't enough time left to get the most out of my approach, I took a quick look at some of the public (Python) code. Note that this was one of my first Kaggle challenges and I wasn't too sure about the exact role of these public contributions; apparently, they were created by Kaggle's staff to give solvers some help.

This public code (note that all the versions I saw were slight modifications of the same algorithm) was performing notably better than my best attempt up to that point. Its basic structure was quite similar to that of my approach; it even accounted for the best combinations of variables as per my tests. On the other hand, it also had the following important differences with respect to my model (the sketch after this list illustrates the general structure):
  • It accounted for the date/time variables in quite a complex way. During my tests, I made some (much more simplistic) attempts to bring this information into the picture, but none of them produced a relevant improvement.
  • It filtered the cases on account of the is_booking variable. On the one hand, the problem description clearly stated that this variable was considered in all the test cases; on the other hand, all my tests and submissions on this front led to the conclusion that accounting for it wasn't beneficial. In fact, my last-moment tests, as described in the final paragraph of this section, seemed to support that conclusion.
  • It gave some relevance to a certain variable (distance) which my approach was ignoring. As per most of my tests, this variable had quite a low influence.
  • There were various filters which looked quite arbitrary. I'm not sure about their exact motivation; perhaps a mistake, or perhaps a new quite-complex-but-surprisingly-well-performing bit.
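
To make that structure more concrete, here is a minimal, hypothetical sketch of the general counting-and-ranking idea described above (deriving date/time features, weighting by is_booking, grouping by combinations of variables). It is an illustration under assumptions, not the actual public script: every column name (date_time, is_booking, target, dest_id) and every weight is made up for the example.

# Hypothetical sketch of the counting-and-ranking structure described
# above; all column names and weights are illustrative assumptions.
import pandas as pd

def build_ranking(train_path, group_cols, top_n=5):
    df = pd.read_csv(train_path, parse_dates=["date_time"])

    # Date/time handling (a much simpler take than what the public
    # code was reportedly doing): derive coarse features from the
    # raw timestamp.
    df["year"] = df["date_time"].dt.year
    df["month"] = df["date_time"].dt.month

    # is_booking weighting: booking events count more than mere
    # clicks (the 4:1 ratio is an arbitrary choice).
    df["weight"] = df["is_booking"] * 4 + 1

    # Weighted counts per (group, target) pair, then the top-N
    # targets for each combination of grouping variables.
    counts = (
        df.groupby(group_cols + ["target"])["weight"]
          .sum()
          .reset_index()
          .sort_values("weight", ascending=False)
    )
    return (
        counts.groupby(group_cols)["target"]
              .apply(lambda s: s.head(top_n).tolist())
    )

# Example usage (dest_id being an assumed identifier column):
# ranking = build_ranking("train.csv", ["dest_id", "month"])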
There wasn't much time remaining, this was my first contact ever with Python (and its spacing peculiarities), and that specific code was quite memory inefficient (at least, inefficient enough not to run on my computer). Despite all these problems, I was able to put together a reasonably good benchmark which seemed to work perfectly; at least, until right before the deadline (2 hours to go), when the memory problems came back. My tests indicated that removing the reliance on is_booking would have allowed this model to score notably higher; unfortunately, the last-moment memory problem didn't let me confirm that assumption. It was definitely a curious end to a curious challenge.
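
For completeness, here is a minimal sketch of one standard way of working around that kind of memory inefficiency (a generic technique, not the actual fix I attempted back then): reading the file in chunks so that only the running aggregates, never the full dataset, stay in memory. The column names are the same illustrative assumptions used in the previous sketch.

# Sketch of a standard chunked-aggregation pattern for a CSV file
# too large to load at once; column names are assumptions.
from collections import defaultdict

import pandas as pd

def chunked_counts(train_path, chunk_size=1_000_000):
    # (group_key, target) -> accumulated weight; only this dictionary
    # lives in memory, never the full dataset.
    totals = defaultdict(float)
    for chunk in pd.read_csv(train_path, chunksize=chunk_size):
        weights = chunk["is_booking"] * 4 + 1
        for key, target, w in zip(chunk["dest_id"], chunk["target"], weights):
            totals[(key, target)] += w
    return totals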