This is the repository of our article published on MDPI Entropy "Feature Selection for Recommender Systems with Quantum Computing".

Related tags

Deep LearningCQFS
Overview

Collaborative-driven Quantum Feature Selection

This repository was developed by Riccardo Nembrini, PhD student at Politecnico di Milano. See the websites of our quantum computing group and of our recommender systems group for more information on our teams and works. This repository contains the source code for the article "Feature Selection for Recommender Systems with Quantum Computing".

Here we explain how to install dependencies, setup the connection to D-Wave Leap quantum cloud services and how to run experiments included in this repository.

Installation

NOTE: This repository requires Python 3.7

It is suggested to install all the required packages into a new Python environment. So, after repository checkout, enter the repository folder and run the following commands to create a new environment:

If you're using virtualenv:

virtualenv -p python3 cqfs
source cqfs/bin/activate

If you're using conda:

conda create -n cqfs python=3.7 anaconda
conda activate cqfs

Remember to add this project in the PYTHONPATH environmental variable if you plan to run the experiments on the terminal:

export PYTHONPATH=$PYTHONPATH:/path/to/project/folder

Then, make sure you correctly activated the environment and install all the required packages through pip:

pip install -r requirements.txt

After installing the dependencies, it is suggested to compile Cython code in the repository.

In order to compile you must first have installed: gcc and python3 dev. Under Linux those can be installed with the following commands:

sudo apt install gcc 
sudo apt-get install python3-dev

If you are using Windows as operating system, the installation procedure is a bit more complex. You may refer to THIS guide.

Now you can compile all Cython algorithms by running the following command. The script will compile within the current active environment. The code has been developed for Linux and Windows platforms. During the compilation you may see some warnings.

python run_compile_all_cython.py

D-Wave Setup

In order to make use of D-Wave cloud services you must first sign-up to D-Wave Leap and get your API token.

Then, you need to run the following command in the newly created Python environment:

dwave setup

This is a guided setup for D-Wave Ocean SDK. When asked to select non-open-source packages to install you should answer y and install at least D-Wave Drivers (the D-Wave Problem Inspector package is not required, but could be useful to analyse problem solutions, if solving problems with the QPU only).

Then, continue the configuration by setting custom properties (or keeping the default ones, as we suggest), apart from the Authentication token field, where you should paste your API token obtained on the D-Wave Leap dashboard.

You should now be able to connect to D-Wave cloud services. In order to verify the connection, you can use the following command, which will send a test problem to D-Wave's QPU:

dwave ping

Running CQFS Experiments

First of all, you need to prepare the original files for the datasets.

For The Movies Dataset you need to download The Movies Dataset from Kaggle and place the compressed files in the directory recsys/Data_manager_offline_datasets/TheMoviesDataset/, making sure the file is called the-movies-dataset.zip.

For CiteULike_a you need to download the following .zip file and place it in the directory recsys/Data_manager_offline_datasets/CiteULike/, making sure the file is called CiteULike_a_t.zip.

We cannot provide data for Xing Challenge 2017, but if you have the dataset available, place the compressed file containing the dataset's original files in the directory recsys/Data_manager_offline_datasets/XingChallenge2017/, making sure the file is called xing_challenge_data_2017.zip.

After preparing the datasets, you should run the following command under the data directory:

python split_NameOfTheDataset.py

This python script will generate the data splits used in the experiments. Moreover, it will preprocess the dataset and check for any error in the preprocessing phase. The resulting splits are saved in the recsys/Data_manager_split_datasets directory.

After splitting the dataset, you can actually run the experiments. All the experiment scripts are in the experiments directory, so enter this folder first. Each dataset has separated experiment scripts that you can find in the corresponding directories. From now on, we will assume that you are running the following commands in the dataset-specific folders, thus running the scripts contained there.

Collaborative models

First of all, we need to optimize the chosen collaborative models to use with CQFS. To do so, run the following command:

python CollaborativeFiltering.py

The resulting models will be saved into the results directory.

CQFS

Then, you can run the CQFS procedure. We divided the procedure into a selection phase and a recommendation phase. To perform the selection through CQFS run the following command:

python CQFS.py

This script will solve the CQFS problem on the corresponding dataset and save all the selected features in appropriate subdirectories under the results directory.

After solving the feature selection problem, you should run the following command:

python CQFSTrainer.py

This script will optimize an ItemKNN content-based recommender system for each selection corresponding to the given hyperparameters (and previously obtained through CQFS), using only the selected features. Again, all the results are saved in the corresponding subdirectories under the results directory.

NOTE: Each selection with D-Wave Leap hybrid service on these problems is performed in around 8 seconds for The Movies Dataset and around 30 for CiteULike_a. Therefore, running the script as it is would result in consuming all the free time given with the developer plan on D-Wave Leap and may result in errors or invalid selections when there's no free time remaining.

We suggest to reduce the number of hyperparameters passed when running the experiments or, even better, chose a collaborative model and perform all the experiments on it.

This is not the case when running experiments with Simulated Annealing, since it is executed locally.

For Xing Challenge 2017 experiments run directly on the D-Wave QPU. Leaving all the hyperparameters unchanged, all the experiments should not exceed the free time of the developer plan. Pay attention when increasing the number of reads from the sampler or the annealing time.

Baselines

In order to obtain the baseline evaluations you can run the corresponding scripts with the following commands:

# ItemKNN content-based with all the features
python baseline_CBF.py

# ItemKNN content-based with features selected through TF-IDF
python baseline_TFIDF.py

# CFeCBF feature weighting baseline
python baseline_CFW.py

Acknowledgements

Software produced by Riccardo Nembrini. Recommender systems library by Maurizio Ferrari Dacrema.

Article authors: Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi

Owner
Quantum Computing Lab @ Politecnico di Milano
Quantum Machine Learning group
Quantum Computing Lab @ Politecnico di Milano
Galileo library for large scale graph training by JD

近年来,图计算在搜索、推荐和风控等场景中获得显著的效果,但也面临超大规模异构图训练,与现有的深度学习框架Tensorflow和PyTorch结合等难题。 Galileo(伽利略)是一个图深度学习框架,具备超大规模、易使用、易扩展、高性能、双后端等优点,旨在解决超大规模图算法在工业级场景的落地难题,提

JD Galileo Team 128 Nov 29, 2022
A compendium of useful, interesting, inspirational usage of pandas functions, each example will be an ipynb file

Pandas_by_examples A compendium of useful/interesting/inspirational usage of pandas functions, each example will be an ipynb file What is this reposit

Guangyuan(Frank) Li 32 Nov 20, 2022
TSIT: A Simple and Versatile Framework for Image-to-Image Translation

TSIT: A Simple and Versatile Framework for Image-to-Image Translation This repository provides the official PyTorch implementation for the following p

Liming Jiang 255 Nov 23, 2022
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

40 Dec 29, 2022
PyTea: PyTorch Tensor shape error analyzer

PyTea: PyTorch Tensor Shape Error Analyzer paper project page Requirements node.js = 12.x python = 3.8 z3-solver = 4.8 How to install and use # ins

ROPAS Lab. 240 Jan 02, 2023
PyTorch implementation of adversarial patch

adversarial-patch PyTorch implementation of adversarial patch This is an implementation of the Adversarial Patch paper. Not official and likely to hav

Jamie Hayes 172 Nov 29, 2022
Fast and Easy Infinite Neural Networks in Python

Neural Tangents ICLR 2020 Video | Paper | Quickstart | Install guide | Reference docs | Release notes Overview Neural Tangents is a high-level neural

Google 1.9k Jan 09, 2023
TensorFlow for Raspberry Pi

TensorFlow on Raspberry Pi It's officially supported! As of TensorFlow 1.9, Python wheels for TensorFlow are being officially supported. As such, this

Sam Abrahams 2.2k Dec 16, 2022
UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

Unified Multi-modal Transformers This repository maintains the official implementation of the paper UMT: Unified Multi-modal Transformers for Joint Vi

Applied Research Center (ARC), Tencent PCG 84 Jan 04, 2023
General-purpose program synthesiser

DeepSynth General-purpose program synthesiser. This is the repository for the code of the paper "Scaling Neural Program Synthesis with Distribution-ba

Nathanaël Fijalkow 24 Oct 23, 2022
A Flow-based Generative Network for Speech Synthesis

WaveGlow: a Flow-based Generative Network for Speech Synthesis Ryan Prenger, Rafael Valle, and Bryan Catanzaro In our recent paper, we propose WaveGlo

NVIDIA Corporation 2k Dec 26, 2022
Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features"

EDM-subgenre-classifier This repository contains the code for "Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Fea

11 Dec 20, 2022
An Unsupervised Graph-based Toolbox for Fraud Detection

An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s

SafeGraph 99 Dec 11, 2022
MediaPipeのPythonパッケージのサンプルです。2020/12/11時点でPython実装のある4機能(Hands、Pose、Face Mesh、Holistic)について用意しています。

mediapipe-python-sample MediaPipeのPythonパッケージのサンプルです。 2020/12/11時点でPython実装のある以下4機能について用意しています。 Hands Pose Face Mesh Holistic Requirement mediapipe 0.

KazuhitoTakahashi 217 Dec 12, 2022
Code for "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans" CVPR 2021 best paper candidate

News 05/17/2021 To make the comparison on ZJU-MoCap easier, we save quantitative and qualitative results of other methods at here, including Neural Vo

ZJU3DV 748 Jan 07, 2023
Fast and customizable reconnaissance workflow tool based on simple YAML based DSL.

Fast and customizable reconnaissance workflow tool based on simple YAML based DSL, with support of notifications and distributed workload of that work

Américo Júnior 3 Mar 11, 2022
Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

CSE-Autoloss Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models

Peidong Liu(刘沛东) 54 Dec 17, 2022
Urban mobility simulations with Python3, RLlib (Deep Reinforcement Learning) and Mesa (Agent-based modeling)

Deep Reinforcement Learning for Smart Cities Documentation RLlib: https://docs.ray.io/en/master/rllib.html Mesa: https://mesa.readthedocs.io/en/stable

1 May 15, 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

Qiaole Dong 190 Dec 27, 2022
Matthew Colbrook 1 Apr 08, 2022