4th place solution to the Datafactory challenge by Intermarché.

Last update: Mar 19, 2022


The objective of the challenge is to predict Intermarché's sales for the first quarter of 2019. The data from the previous year (2018) is available to train a model on those sales.

Data 💿

We have sales records for a set of (store, item) pairs, one per day of 2018 on which the pair had at least one sale. The data are structured as:

date	store	item	quantity
2018-01-01	1	12	1
2018-01-01	1	17	2
2018-01-01	1	22	3

Additional tables are available, such as:

Product characteristics.
Store characteristics.
Product prices by store and by quarter.

Solution 🤖

The main difficulty of the challenge is identifying the days on which a store recorded no sales for a given product, because Intermarché does not provide records where the target variable (quantity) equals 0. I found that adding up to 5 zeros after a sale for a given (store, item) pair maximized the performance of my model and limited the overfitting of my aggregates.
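As an illustration, the zero-padding idea could be sketched with pandas as follows. This is a minimal sketch, not the code from the repository: it assumes that "up to 5 zeros" means one zero-quantity row per day for the 5 days following each sale, skipping days that already have a recorded sale. The function name `add_trailing_zeros` and the parameter `max_zeros` are hypothetical.

```python
import pandas as pd


def add_trailing_zeros(sales: pd.DataFrame, max_zeros: int = 5) -> pd.DataFrame:
    """Append up to `max_zeros` zero-quantity days after each recorded sale.

    Assumption: `sales` has the columns date, store, item, quantity.
    """
    sales = sales.assign(date=pd.to_datetime(sales["date"]))
    last_day = sales["date"].max()

    # Candidate rows: every sale shifted forward by 1..max_zeros days, quantity 0.
    candidates = pd.concat(
        [
            sales.assign(date=sales["date"] + pd.Timedelta(days=k), quantity=0)
            for k in range(1, max_zeros + 1)
        ],
        ignore_index=True,
    )
    # Do not create rows beyond the last observed day.
    candidates = candidates[candidates["date"] <= last_day]

    # Real sales come first, so keep="first" prefers them over candidate zeros.
    padded = pd.concat([sales, candidates], ignore_index=True)
    padded = padded.drop_duplicates(subset=["date", "store", "item"], keep="first")
    return padded.sort_values(["store", "item", "date"]).reset_index(drop=True)
```

Padding only a few days after each sale keeps the number of artificial zeros bounded, rather than filling every missing day of the year for every pair.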

Features:

Aggregates by item / store (mean + std).
Aggregates on prices (mean).
Aggregates on the characteristics of the stores (mean).
Aggregates on product characteristics (mean).
Rolling medians over the last 9 weeks.
Date features (weekend / holidays / day of the week).
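A few of these features can be sketched with pandas. This is an illustrative sketch under assumptions, not the repository's feature scripts: the column names (`pair_mean`, `pair_std`, `median_9w`, and so on) are hypothetical, "9 weeks" is read as a 63-day window that excludes the current day, and the holiday flag is left out because it needs an external calendar.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add pair aggregates, date features and a 9-week rolling median.

    Assumption: `df` has the columns date, store, item, quantity.
    """
    df = df.assign(date=pd.to_datetime(df["date"]))
    df = df.sort_values(["store", "item", "date"]).reset_index(drop=True)

    # Aggregates by (store, item) pair: mean and standard deviation.
    by_pair = df.groupby(["store", "item"])["quantity"]
    df = df.assign(
        pair_mean=by_pair.transform("mean"),
        pair_std=by_pair.transform("std"),
    )

    # Date features: day of the week (Monday = 0) and a weekend flag.
    df["day_of_week"] = df["date"].dt.dayofweek
    df["is_weekend"] = df["day_of_week"] >= 5

    # Rolling median over the previous 63 days, computed per pair.
    # closed="left" excludes the current day so the feature does not leak the target.
    parts = []
    for _, group in df.groupby(["store", "item"], sort=False):
        series = group.set_index("date")["quantity"]
        median = series.rolling("63D", closed="left").median()
        parts.append(group.assign(median_9w=median.to_numpy()))
    return pd.concat(parts, ignore_index=True)
```

The explicit loop over pairs is easy to read but slow on a large number of pairs; a real pipeline would likely vectorize it.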

I used LightGBM with 3-fold cross-validation and bagging to make my predictions. I trained the model on a transformed target, log(1 + quantity). Poisson loss helps a bit. I did not tune the model's hyperparameters.
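The training loop can be sketched as below. This sketch reads "bagging" as averaging the test predictions of the fold models, which is an assumption; it may instead refer to LightGBM's own row-subsampling options. The function `cv_bagged_predict` and its parameters are hypothetical, and the model is passed in as a factory so that any estimator with `fit` and `predict` can be used.

```python
import numpy as np


def cv_bagged_predict(make_model, X, y, X_test, n_splits=3, seed=0):
    """Train one model per fold on log(1 + y) and average the test predictions."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    X_test = np.asarray(X_test)

    # Shuffle the row indices and split them into n_splits held-out folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)

    preds = np.zeros(len(X_test))
    for held_out in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), held_out)
        model = make_model()
        model.fit(X[train_idx], np.log1p(y[train_idx]))
        # Undo the log(1 + quantity) transform before averaging.
        preds += np.expm1(model.predict(X_test)) / n_splits

    # Quantities cannot be negative.
    return np.clip(preds, 0, None)
```

With LightGBM installed, the factory could be something like `lambda: lightgbm.LGBMRegressor(objective="poisson")`. The held-out fold is unused here; it could serve for validation or early stopping.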

Finally, I set all the predictions for February and March to the predictions for the second and third weeks of January.
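One possible reading of this step is sketched below. The mapping from February and March days to January days is not spelled out above, so the sketch makes two loud assumptions: the "second and third weeks of January" are taken as January 8 to 21, and each later prediction is replaced by the mean January prediction for the same (store, item, day of the week). The function name `extend_january_pattern` is hypothetical.

```python
import pandas as pd


def extend_january_pattern(pred: pd.DataFrame) -> pd.DataFrame:
    """Overwrite Feb/Mar predictions with the mean prediction of Jan 8-21.

    Assumption: `pred` has the columns date, store, item, quantity.
    """
    pred = pred.assign(
        date=pd.to_datetime(pred["date"]),
        quantity=pred["quantity"].astype(float),
    )
    pred["dow"] = pred["date"].dt.dayofweek
    key = ["store", "item", "dow"]

    # Reference pattern: mean prediction per (store, item, day of week) in Jan 8-21.
    in_ref = (pred["date"].dt.month == 1) & pred["date"].dt.day.between(8, 21)
    pattern = (
        pred[in_ref]
        .groupby(key, as_index=False)["quantity"]
        .mean()
        .rename(columns={"quantity": "jan_quantity"})
    )

    out = pred.merge(pattern, on=key, how="left")
    # Only overwrite Feb/Mar rows that have a matching January pattern.
    late = out["date"].dt.month.isin([2, 3]) & out["jan_quantity"].notna()
    out.loc[late, "quantity"] = out.loc[late, "jan_quantity"]
    return out.drop(columns=["dow", "jan_quantity"])
```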

I also set to 0 all predictions for (store / item / day of the week) triplets that have too few records in the training set.
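This filter could look like the following sketch. The threshold is not stated above, so `min_records=3` is a placeholder, and the function name `zero_rare_triplets` is hypothetical.

```python
import pandas as pd


def zero_rare_triplets(
    pred: pd.DataFrame, train: pd.DataFrame, min_records: int = 3
) -> pd.DataFrame:
    """Set predictions to 0 for triplets seen fewer than `min_records` times.

    Assumption: both frames have the columns date, store, item, quantity.
    """
    key = ["store", "item", "dow"]

    # Count training records per (store, item, day of week) triplet.
    train = train.assign(dow=pd.to_datetime(train["date"]).dt.dayofweek)
    counts = train.groupby(key).size().rename("n_records").reset_index()

    out = pred.assign(dow=pd.to_datetime(pred["date"]).dt.dayofweek)
    out = out.merge(counts, on=key, how="left")

    # Triplets absent from the training set count as 0 records.
    rare = out["n_records"].fillna(0) < min_records
    out.loc[rare, "quantity"] = 0
    return out.drop(columns=["dow", "n_records"])
```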

Run ♻️

To reproduce my results, download the data into the folder data/raw, then run the following scripts in order:

python scripts/prepare_raw_data.py
python scripts/features/aggs_items.py
python scripts/features/aggs_prices.py
python scripts/features/aggs_stores.py
python scripts/features/aggs.py 
python scripts/features/lags.py
python scripts/features/cal.py 
python scripts/make_train_test.py
python scripts/learn.py
python scripts/polish_sub.py
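The same sequence can be driven from a small Python runner that stops at the first failing step. This is a convenience sketch rather than part of the repository; `run_pipeline` and its `runner` parameter are hypothetical names, and the script paths are the ones listed above.

```python
import subprocess
import sys

# Scripts in the order given above; each step depends on the previous ones.
PIPELINE = [
    "scripts/prepare_raw_data.py",
    "scripts/features/aggs_items.py",
    "scripts/features/aggs_prices.py",
    "scripts/features/aggs_stores.py",
    "scripts/features/aggs.py",
    "scripts/features/lags.py",
    "scripts/features/cal.py",
    "scripts/make_train_test.py",
    "scripts/learn.py",
    "scripts/polish_sub.py",
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, failing fast on errors."""
    for script in steps:
        print(f"Running {script}")
        # check=True raises CalledProcessError if the script exits with an error.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root would execute the ten steps in sequence.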

License

This project is free and open-source software licensed under the MIT license.

Owner

Raphael Sourty
