SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

Overview

License: MIT Python GitHub code size in bytes Downloads GitHub Workflow Status PyPI version GitHub issues GitHub commit activity GitHub last commit arXiv

[arXiv]

The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, which was actually in operation for a decade. In addition, the SHIFT15M dataset has several types of dataset shifts, allowing us to evaluate the robustness of the model to different types of shifts (e.g., covariate shift and target shift).

We provide the Datasheet for SHIFT15M. This datasheet is based on the Datasheets for Datasets [1] template.

System Python 3.6 Python 3.7 Python 3.8
Linux CPU
Linux GPU
Windows CPU / GPU Status Currently Unavailable Status Currently Unavailable Status Currently Unavailable
Mac OS CPU

SHIFT15M is a large-scale dataset based on approximately 15 million items accumulated by the fashion search service IQON.

Installation

From PyPi

$ pip install shift15m

From source

$ git clone https://github.com/st-tech/zozo-shift15m.git
$ cd zozo-shift15m
$ poetry build
$ pip install dist/shift15m-xxxx-py3-none-any.whl

Download SHIFT15M dataset

Use Dataset class

You can download SHIFT15M dataset as follows:

from shift15.datasets import NumLikesRegression

dataset = NumLikesRegression(root="./data", download=True)

Download directly by using download scripts

Please download the dataset as follows:

$ bash scripts/download_all.sh

To avoid downloading the test dataset for set matching (80GB), which is not required in training, you can use the following script.

$ bash scripts/download_all_wo_set_testdata.sh

Tasks

The following tasks are now available:

Tasks Task type Shift type # of input dim # of output dim
NumLikesRegression regression target shift (N, 25) (N, 1)
SumPricesRegression regression covariate shift, target shift (N, 1) (N, 1)
ItemPriceRegression regression target shift (N, 4096) (N, 1)
ItemCategoryClassification classification target shift (N, 4096) (N, 7)
Set2SetMatching set-to-set matching covariate shift (N, 4096)x(M, 4096) (1)

Benchmarks

As templates for numerical experiments on the SHIFT15M dataset, we have published experimental results for each task with several models.

Original Dataset Structure

The original dataset is maintained in json format, and a row consists of the following:

{
  "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
  "like_num":"xx",
  "set_id":"xxx",
  "items":[
    {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
    ...
  ],
  "publish_date":"yyyy-mm-dd"
}

Contributing

To learn more about making a contribution to SHIFT15M, please see the following materials:

License

The dataset itself is provided under a CC BY-NC 4.0 license. On the other hand, the software in this repository is provided under the MIT license.

Dataset metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name SHIFT15M Dataset
alternateName SHIFT15M
alternateName shift15m-dataset
url
sameAs https://github.com/st-tech/zozo-shift15m
description SHIFT15M is a multi-objective, multi-domain dataset which includes multiple dataset shifts.
provider
property value
name ZOZO Research
sameAs https://ja.wikipedia.org/wiki/ZOZO
license
property value
name CC BY-NC 4.0
url

Citation

@misc{Kimura_SHIFT15M_Multiobjective_LargeScale_2021,
author = {Kimura, Masanari and Nakamura, Takuma and Saito, Yuki},
month = {8},
title = {SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts},
year = {2021}
}

Errata

No errata are currently available.

References

  • [1] Gebru, Timnit, et al. "Datasheets for datasets." arXiv preprint arXiv:1803.09010 (2018).
Comments
Releases(v0.2.0)
  • v0.2.0(Sep 20, 2022)

    • add tags info as follows:
    {
      "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
      "like_num":"xx",
      "set_id":"xxx",
      "items":[
        {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
        ...
      ],
      "publish_date":"yyyy-mm-dd",
      "tags": "tag_a, tag_b, tag_c, ..."
    }
    
    • add superset matching benchmark
    • fix a label creation bug on set matching with multiple splits
    Source code(tar.gz)
    Source code(zip)
  • v.0.1.2(Nov 24, 2021)

Owner
ZOZO, Inc.
ZOZO, Inc.
Fair Recommendation in Two-Sided Platforms

Fair Recommendation in Two-Sided Platforms

gourabgggg 1 Nov 10, 2021
[ICLR 2021] Is Attention Better Than Matrix Decomposition?

Enjoy-Hamburger 🍔 Official implementation of Hamburger, Is Attention Better Than Matrix Decomposition? (ICLR 2021) Under construction. Introduction T

Gsunshine 271 Dec 29, 2022
Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti

12 Jul 23, 2022
PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

TUPE PyTorch implementation of Rethinking Positional Encoding in Language Pre-training. Quickstart Clone this repository. git clone https://github.com

Jake Tae 5 Jan 27, 2022
RRxIO - Robust Radar Visual/Thermal Inertial Odometry: Robust and accurate state estimation even in challenging visual conditions.

RRxIO - Robust Radar Visual/Thermal Inertial Odometry RRxIO offers robust and accurate state estimation even in challenging visual conditions. RRxIO c

Christopher Doer 64 Dec 29, 2022
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Period-alternatives-of-Softmax Experimental Demo for our paper 'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechani

slwang9353 0 Sep 06, 2021
Evolving neural network parameters in JAX.

Evolving Neural Networks in JAX This repository holds code displaying techniques for applying evolutionary network training strategies in JAX. Each sc

Trevor Thackston 6 Feb 12, 2022
A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up/down.

HandTrackingBrightnessControl A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up

Teemu Laurila 19 Feb 12, 2022
Hide screen when boss is approaching.

BossSensor Hide your screen when your boss is approaching. Demo The boss stands up. He is approaching. When he is approaching, the program fetches fac

Hiroki Nakayama 6.2k Jan 07, 2023
A project for developing transformer-based models for clinical relation extraction

Clinical Relation Extration with Transformers Aim This package is developed for researchers easily to use state-of-the-art transformers models for ext

uf-hobi-informatics-lab 101 Dec 19, 2022
The official implementation of ICCV paper "Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds".

Box-Aware Tracker (BAT) Pytorch-Lightning implementation of the Box-Aware Tracker. Box-Aware Feature Enhancement for Single Object Tracking on Point C

Kangel Zenn 5 Mar 26, 2022
Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks

Introduction This repository contains the modified caffe library and network architectures for our paper "Automated Melanoma Recognition in Dermoscopy

Lequan Yu 47 Nov 24, 2022
NeurIPS 2021, "Fine Samples for Learning with Noisy Labels"

[Official] FINE Samples for Learning with Noisy Labels This repository is the official implementation of "FINE Samples for Learning with Noisy Labels"

mythbuster 27 Dec 23, 2022
SenseNet is a sensorimotor and touch simulator for deep reinforcement learning research

SenseNet is a sensorimotor and touch simulator for deep reinforcement learning research

59 Feb 25, 2022
This repo is to be freely used by ML devs to check the GAN performances without coding from scratch.

GANs for Fun Created because I can! GOAL The goal of this repo is to be freely used by ML devs to check the GAN performances without coding from scrat

Sagnik Roy 13 Jan 26, 2022
LBK 20 Dec 02, 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization Official PyTorch implementation for our URST (Ultra-Resolution Sty

czczup 148 Dec 27, 2022
CIFAR-10_train-test - training and testing codes for dataset CIFAR-10

CIFAR-10_train-test - training and testing codes for dataset CIFAR-10

Frederick Wang 3 Apr 26, 2022
Code for weakly supervised segmentation of a single class

SingleClassRL Implementation of weak single object segmentation from paper "Regularized Loss for Weakly Supervised Single Class Semantic Segmentation"

16 Nov 14, 2022
AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models

AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models Description

Angel de Paula 0 Jun 08, 2022