DATA MINING PROBLEMS IN RETAIL

Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discounts, recommendations, and stock levels that can be solved using data analysis methods. The rise of omni-channel retail that integrates marketing, customer relationship management, and inventory management across all online and offline channels has produced a plethora of correlated data which increases both the importance and capabilities of data-driven decisions.

Although there are many books on data mining in general and its applications to marketing and customer relationship management in particular [BE11, AS14, PR13 etc.], most of them are structured as data scientist manuals focusing on algorithms and methodologies and assume that human decisions play a central role in transforming analytical findings into business actions. In this article we are trying to take a more rigorous approach and provide a systematic view of econometric models and objective functions that can leverage data analysis to make more automated decisions. With this paper, we want to describe a hypothetical revenue management platform that consumes a retailer’s data and controls different aspects of the retailer’s strategy such as pricing, marketing, and inventory:

leading-diagram

(more…)

Read More

Xgboost example 1

The purpose of this Vignette is to show you how to use Xgboost to discover and understand your own dataset better.

This Vignette is not about predicting anything (see Xgboost presentation). We will explain how to use Xgboost to highlight the link between the features of your data and the outcome.

Pacakge loading:

require(xgboost)
require(Matrix)
require(data.table)
if (!require('vcd')) install.packages('vcd') 

VCD package is used for one of its embedded dataset only.

Preparation of the dataset

Numeric VS categorical variables

Xgboost manages only numeric vectors.

What to do when you have categorical data?

A categorical variable has a fixed number of different values. For instance, if a variable called Colour can have only one of these three values, red, blue or green, then Colour is a categorical variable.

In R, a categorical variable is called factor.

Type ?factor in the console for more information.

To answer the question above we will convert categorical variables to numeric one.

(more…)

Read More