Kaggle Competition Past Winner Solutions

We learn more from code, and from great code. Not necessarily always the 1st ranking solution, because we also learn what makes a stellar and just a good solution. I will post solutions I came upon so we can all learn to become better!

I collected the following source code and interesting discussions from the Kaggle held competitions for learning purposes. Not all competitions are listed because I am only manually collecting them, also some competitions are not listed due to no one sharing. I will add more as time goes by. Thank you.


Prudential Life Insurance Assessment [Mon 23 Nov 2015– Mon 15 Feb 2016]
Rossmann Store Sales [Wed 30 Sep 2015– Mon 14 Dec 2015]
Airbnb New User Bookings [Wed 25 Nov 2015– Thu 11 Feb 2016]


Walmart Recruiting: Trip Type Classification [Mon 26 Oct 2015– Sun 27 Dec 2015 ]


Algorithmic Trading Challenge

Allstate Purchase Prediction Challenge

Amazon.com – Employee Access Challenge

AMS 2013-2014 Solar Energy Prediction Contest

Belkin Energy Disaggregation Competition

Challenges in Representation Learning: Facial Expression Recognition Challenge

Challenges in Representation Learning: The Black Box Learning Challenge

Challenges in Representation Learning: Multi-modal Learning

Detecting Insults in Social Commentary

EMI Music Data Science Hackathon

Galaxy Zoo – The Galaxy Challenge

Global Energy Forecasting Competition 2012 – Wind Forecasting

KDD Cup 2013 – Author-Paper Identification Challenge (Track 1)

KDD Cup 2013 – Author Disambiguation Challenge (Track 2)

Large Scale Hierarchical Text Classification

Loan Default Prediction – Imperial College London

Merck Molecular Activity Challenge

MLSP 2013 Bird Classification Challenge

Observing the Dark World

PAKDD 2014 – ASUS Malfunctional Components Prediction

Personalize Expedia Hotel Searches – ICDM 2013

Predicting a Biological Response

Predicting Closed Questions on Stack Overflow

See Click Predict Fix

See Click Predict Fix – Hackathon

StumbleUpon Evergreen Classification Challenge

The Analytics Edge (15.071x)

The Marinexplore and Cornell University Whale Detection Challenge

Walmart Recruiting – Store Sales Forecasting

Source: http://www.chioka.in/kaggle-competition-solutions/


Read More


A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Forecasting is the use of a model to predict future based on past informations. This problem is a bit different to what most known as the pure cross-sectional problem.

In this post, I take the recent Kaggle challenge as example, sharing the finding and tricks I used. The competition – Rossmann Store Sales – attracted 3,738 data scientists, making it the second most popular competition by participants ever. Rossmann is a drug store giant operates over 3,000 stores in European, who challenged Kagglers to forecast 6 weeks of daily sales for 1,115 stores. The data is mainly comprised of store index, store type, competitor store information, holiday event, promotion event, whether store open, customers, and the sales which is what we’re tasked to predict.

Doing time series forecasting, a few things specific to time-series you need to know about are

  1. time-dependent features.
  2. validating by time.

I will walk thorough them latter. (more…)

Read More

Walmart Recruiting II: Sales in Stormy Weather

Predict how sales of weather-sensitive products are affected by snow and rain

Walmart operates 11,450 stores in 27 countries, managing inventory across varying climates and cultures. Extreme weather events, like hurricanes, blizzards, and floods, can have a huge impact on sales at the store and product level.

In their second Kaggle recruiting competition, Walmart challenges participants to accurately predict the sales of 111 potentially weather-sensitive products (like umbrellas, bread, and milk) around the time of major weather events at 45 of their retail locations.

Intuitively, we may expect an uptick in the sales of umbrellas before a big thunderstorm, but it’s difficult for replenishment managers to correctly predict the level of inventory needed to avoid being out-of-stock or overstock during and after that storm. Walmart relies on a variety of vendor tools to predict sales around extreme weather events, but it’s an ad-hoc and time-consuming process that lacks a systematic measure of effectiveness.

Helping Walmart better predict sales of weather-sensitive products will keep valued customers out of the rain. It could also earn you a position at one of the most data-driven retailers in the world!

You have been provided with sales data for 111 products whose sales may be affected by the weather (such as milk, bread, umbrellas, etc.). These 111 products are sold in stores at 45 different Walmart locations. Some of the products may be a similar item (such as milk) but have a different id in different stores/regions/suppliers. The 45 locations are covered by 20 weather stations (i.e. some of the stores are nearby and share a weather station).

The competition task is to predict the amount of each product sold around the time of major weather events. For the purposes of this competition, we have defined a weather event as any day in which more than an inch of rain or two inches of snow was observed. You are asked to predict the units sold for a window of ±3 days surrounding each storm.


Read More