4th place solution to datafactory challenge by Intermarché.

Overview

Solution to Datafactory challenge by Intermarché.

4th place solution to datafactory challenge by Intermarché. The objective of the challenge is to predict the sales made by intermarche in the first quarter of 2019. We have the data of the past year (2018) to train our model to fit the sales.

Data 💿

We have the record of sales for a set of pairs (store, item) and for each day of 2018 (if there was at least one sale). The data are structured as:

date store item quantity
2018-01-01 1 12 1
2018-01-01 1 17 2
2018-01-01 1 22 3

We have additional tables available such as:

  • Product characteristics.
  • Store characteristics.
  • Product prices by store and by quarter.

Solution 🤖

The main difficulty of the challenge is to find the days for which a store has recorded no sales for a given product. Indeed, Intermarché does not provide records for which the target variable (quantity) is equal to 0. I found that adding up to 5 zeros after a sale for a given pair (store / item) maximized the performance of my model and limited the overfitting of my aggregates.

Features:

  • Aggregates by item / store (mean + std)
  • Aggregates on prices. (mean)
  • Aggregates on the characteristics of the stores. (mean)
  • Aggregates on product characteristics. (mean)
  • Rolling medians over the last 9 weeks.
  • Features on dates. (weekend / holidays / day of the week)

I used LightGBM and performed a 3-fold cross-validation with bagging to make my prediction. I transformed the target variable to train my model using quantity = log(1 + quantity). Poisson loss helps a bit. I didn't look for the hyperparameters of the model.

Finally I set all predictions of February and March as the predictions of the second and third week of January.

Also I set to 0 the set of predictions associated to triplets (store / item / day of the week) for which we have not enough records in the training set.

Run ♻️

To reproduce my results, you must download the data in the folder data/raw.

python scripts/prepare_raw_data.py
python scripts/features/aggs_items.py
python scripts/features/aggs_prices.py
python scripts/features/aggs_stores.py
python scripts/features/aggs.py 
python scripts/features/lags.py
python scripts/features/cal.py 
python scripts/make_train_test.py
python scripts/learn.py
python scripts/polish_sub.py

License

This project is free and open-source software licensed under the MIT license.

Owner
Raphael Sourty
PhD Student @ IRIT and Renault
Raphael Sourty
The original implementation of TNDM used in the NeurIPS 2021 paper (no longer being updated)

TNDM - Targeted Neural Dynamical Modeling Note: This code is no longer being updated. The official re-implementation can be found at: https://github.c

1 Jul 21, 2022
Animate molecular orbital transitions using Psi4 and Blender

Molecular Orbital Transitions (MOT) Animate molecular orbital transitions using Psi4 and Blender Author: Maximilian Paradiz Dominguez, University of A

3 Feb 01, 2022
Multi-Task Deep Neural Networks for Natural Language Understanding

New Release We released Adversarial training for both LM pre-training/finetuning and f-divergence. Large-scale Adversarial training for LMs: ALUM code

Xiaodong 2.1k Dec 30, 2022
High-Resolution Image Synthesis with Latent Diffusion Models

Latent Diffusion Models arXiv | BibTeX High-Resolution Image Synthesis with Latent Diffusion Models Robin Rombach*, Andreas Blattmann*, Dominik Lorenz

CompVis Heidelberg 5.6k Dec 30, 2022
DetCo: Unsupervised Contrastive Learning for Object Detection

DetCo: Unsupervised Contrastive Learning for Object Detection arxiv link News Sparse RCNN+DetCo improves from 45.0 AP to 46.5 AP(+1.5) with 3x+ms trai

Enze Xie 234 Dec 18, 2022
Tutorial repo for an end-to-end Data Science project

End-to-end Data Science project This is the repo with the notebooks, code, and additional material used in the ITI's workshop. The goal of the session

Deena Gergis 127 Dec 30, 2022
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
Code for our paper 'Generalized Category Discovery'

Generalized Category Discovery This repo is a placeholder for code for our paper: Generalized Category Discovery Abstract: In this paper, we consider

107 Dec 28, 2022
U-Time: A Fully Convolutional Network for Time Series Segmentation

U-Time & U-Sleep Official implementation of The U-Time [1] model for general-purpose time-series segmentation. The U-Sleep [2] model for resilient hig

Mathias Perslev 176 Dec 19, 2022
A compendium of useful, interesting, inspirational usage of pandas functions, each example will be an ipynb file

Pandas_by_examples A compendium of useful/interesting/inspirational usage of pandas functions, each example will be an ipynb file What is this reposit

Guangyuan(Frank) Li 32 Nov 20, 2022
Cleaned test data list of DukeMTMC-reID, ICCV2021

Cleaned DukeMTMC-reID Cleaned data list of DukeMTMC-reID released with our paper accepted by ICCV 2021: Learning Instance-level Spatial-Temporal Patte

14 Feb 19, 2022
A light and fast one class detection framework for edge devices. We provide face detector, head detector, pedestrian detector, vehicle detector......

A Light and Fast Face Detector for Edge Devices Big News: LFD, which is a big update of LFFD, now is released (2021.03.09). It is strongly recommended

YonghaoHe 1.3k Dec 25, 2022
WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Alignment Attention Fusion framework for Few-Shot Object Detection

AAF framework Framework generalities This repository contains the code of the AAF framework proposed in this paper. The main idea behind this work is

Pierre Le Jeune 20 Dec 16, 2022
Official Implementation of VAT

Semantic correspondence Few-shot segmentation Cost Aggregation Is All You Need for Few-Shot Segmentation For more information, check out project [Proj

Hamacojr 114 Dec 27, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 29 Jan 08, 2023
A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning

Officile code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

Mathieu Godbout 1 Nov 19, 2021
Agile SVG maker for python

Agile SVG Maker Need to draw hundreds of frames for a GIF? Need to change the style of all pictures in a PPT? Need to draw similar images with differe

SemiWaker 4 Sep 25, 2022
ByteTrack: Multi-Object Tracking by Associating Every Detection Box

ByteTrack ByteTrack is a simple, fast and strong multi-object tracker. ByteTrack: Multi-Object Tracking by Associating Every Detection Box Yifu Zhang,

Yifu Zhang 2.9k Jan 04, 2023