[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

Last update: Dec 08, 2022

Overview

This repository contains the implementation of many popular sampling strategies, along with various explicit/implicit/sequential feedback recommendation algorithms. The code accompanies the paper "On Sampling Collaborative Filtering Datasets" [ACM] [Public PDF] where we compare the utility of different sampling strategies for preserving the performance of various recommendation algorithms.

We also provide code for Data-Genie, which can automatically predict how well any sampling strategy will perform on a given collaborative filtering dataset. We refer the reader to the full paper for more details. Kindly send me an email if you're interested in obtaining access to the pre-trained weights of Data-Genie.

If you find any module of this repository helpful for your own research, please consider citing the WSDM '22 paper below. Thanks!

@inproceedings{sampling_cf,
  author = {Noveen Sachdeva and Carole-Jean Wu and Julian McAuley},
  title = {On Sampling Collaborative Filtering Datasets},
  url = {https://doi.org/10.1145/3488560.3498439},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  series = {WSDM '22},
  year = {2022}
}

Code Author: Noveen Sachdeva ([email protected])

Setup

Environment Setup

$ pip install -r requirements.txt

Data Setup

Once you've set up the Python environment, run the command below. It creates the required data/experiment directories and downloads and preprocesses the Amazon magazine and MovieLens-100K datasets. Feel free to download more datasets from http://jmcauley.ucsd.edu/data/amazon/ and adjust the setup.sh and preprocess.py files accordingly.

$ ./setup.sh
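
For orientation, the snippet below is a minimal sketch of the kind of parsing a preprocessing step has to do for MovieLens-100K, whose u.data file stores one tab-separated "user item rating timestamp" row per line. The Interaction type and the parse_ml100k function are hypothetical names used for illustration only; they are not taken from preprocess.py, which may work differently.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    """One user-item feedback event (hypothetical container, not from the repo)."""
    user: int
    item: int
    rating: float
    timestamp: int


def parse_ml100k(lines: Iterable[str]) -> List[Interaction]:
    """Parse tab-separated `user item rating timestamp` rows (the u.data layout)."""
    interactions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, ts = line.split("\t")
        interactions.append(Interaction(int(user), int(item), float(rating), int(ts)))
    return interactions


# Two rows in the u.data layout, used here as a toy input.
sample_rows = ["196\t242\t3\t881250949", "186\t302\t3\t891717742"]
parsed = parse_ml100k(sample_rows)
```

With a real download, the same function could be fed an open file handle instead of the toy list.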

How to train a model on a sampled/complete CF-dataset?

Edit the hyper_params.py file, which lists all config parameters, including what type of model to run. Currently supported sampling strategies:

Sampling Strategy	What is sampled?	Paper Link
Random	Interactions
Stratified	Interactions
Temporal	Interactions
SVP-CF w/ MF	Interactions	LINK & LINK
SVP-CF w/ Bias-only	Interactions	LINK & LINK
SVP-CF-Prop w/ MF	Interactions	LINK & LINK
SVP-CF-Prop w/ Bias-only	Interactions	LINK & LINK
Random	Users
Head	Users
SVP-CF w/ MF	Users	LINK & LINK
SVP-CF w/ Bias-only	Users	LINK & LINK
SVP-CF-Prop w/ MF	Users	LINK & LINK
SVP-CF-Prop w/ Bias-only	Users	LINK & LINK
Centrality	Graph	LINK
Random-Walk	Graph	LINK
Forest-Fire	Graph	LINK
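
To make the "What is sampled?" column concrete, here is a minimal sketch of three of the simpler strategies from the table: random interaction sampling, random user sampling, and head user sampling. It assumes the dataset is a plain list of (user, item) pairs, that head users are the users with the most interactions, and that a sampled user keeps all of their interactions. The function names are hypothetical and this is not the repository's implementation.

```python
import random
from collections import defaultdict


def sample_interactions_random(interactions, frac, seed=0):
    """Keep a uniformly random fraction of all (user, item) interactions."""
    rng = random.Random(seed)
    k = int(round(frac * len(interactions)))
    return rng.sample(interactions, k)


def sample_users_random(interactions, frac, seed=0):
    """Keep a uniformly random fraction of users, with all their interactions."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in interactions})
    k = int(round(frac * len(users)))
    kept = set(rng.sample(users, k))
    return [(u, i) for u, i in interactions if u in kept]


def sample_users_head(interactions, frac):
    """Keep the most active fraction of users (assumed meaning of 'Head')."""
    counts = defaultdict(int)
    for u, _ in interactions:
        counts[u] += 1
    # Most interactions first; ties broken by user id for determinism.
    ranked = sorted(counts, key=lambda u: (-counts[u], u))
    k = int(round(frac * len(ranked)))
    kept = set(ranked[:k])
    return [(u, i) for u, i in interactions if u in kept]


# Toy dataset: user 0 has 3 interactions, users 2 and 3 have 2, user 1 has 1.
toy = [(0, 10), (0, 11), (0, 12), (1, 10), (2, 11), (2, 12), (3, 13), (3, 10)]
half_interactions = sample_interactions_random(toy, 0.5)
half_users = sample_users_random(toy, 0.5)
head_users = sample_users_head(toy, 0.5)
```

Interaction sampling can leave some users with very short histories, while user sampling preserves each kept user's full history; that difference is what the second column of the table distinguishes.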

Finally, type the following command to run:

$ CUDA_VISIBLE_DEVICES=<SOME_GPU_ID> python main.py

Alternatively, to train several recommendation algorithms on several CF datasets/subsets, please edit the configuration in grid_search.py and then run:

$ python grid_search.py
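
As a rough illustration of what a grid over datasets and samplers amounts to, the sketch below expands a dictionary of options into one configuration per combination. The keys and values shown (dataset, sampling, sampling_percent) are hypothetical placeholders and are not the parameter names used in grid_search.py or hyper_params.py.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the option lists in `grid`."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Hypothetical option names and values, for illustration only.
grid = {
    "dataset": ["ml-100k", "magazine"],
    "sampling": ["random-interactions", "head-users"],
    "sampling_percent": [20, 40],
}
configs = expand_grid(grid)  # 2 * 2 * 2 = 8 configurations
```

Each resulting dict could then be handed to a single training run, which is the general pattern a grid-search script follows.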

How to train Data-Genie?

Edit the data_genie/data_genie_config.py file, which lists all config parameters, including which datasets, CF-scenarios, samplers, etc. to train Data-Genie on.
Then use the following command to train Data-Genie:

$ python data_genie.py

License

MIT
