A library of metrics for evaluating recommender systems

Overview

recmetrics

A Python library of evaluation metrics and diagnostic tools for recommender systems.

**This library is actively maintained. My goal is to continue to develop this as the main source of recommender metrics in Python. Please submit issues, bug reports, or feature requests, or contribute directly through a pull request. If I do not respond, you can ping me directly at [email protected]**

| Description | Command |
|---|---|
| Installation | pip install recmetrics |
| Notebook Demo | make run_demo |
| Test | make test |

Full documentation coming soon. In the interim, the Python notebook in this repo, example.ipynb, contains examples of these plots and metrics in action using the MovieLens 20M Dataset. You can also view my Medium Article.

This library is an open source project. The goal is to create a go-to source for metrics related to recommender systems. I have begun by adding metrics and plots I found useful during my career as a Data Scientist at a retail company, and encourage the community to contribute. If you would like to see a new metric in this package, or find a bug, or have suggestions for improvement, please contribute!

Long Tail Plot

recmetrics.long_tail_plot()

The Long Tail plot is used to explore popularity patterns in user-item interaction data. Typically, a small number of items will make up most of the volume of interactions, and this is referred to as the "head". The "long tail" typically consists of most products, but makes up a small percentage of interaction volume.

Long Tail Plot

The items in the "long tail" usually do not have enough interactions to be recommended accurately by user-based recommender systems such as collaborative filtering, because of data sparsity and the popularity bias inherent in these models. Many recommender systems require a certain level of sparsity to train. A good recommender must balance sparsity requirements with popularity bias.
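The head/tail split described above can be illustrated with a small sketch. This is not the code behind recmetrics.long_tail_plot(); the function name `split_head_tail` and the `head_share` cut-off (50% of interaction volume) are assumptions chosen for illustration.

```python
from collections import Counter


def split_head_tail(interactions, head_share=0.5):
    """Split items into a popular "head" and a "long tail".

    `interactions` is a flat list of item ids, one entry per interaction.
    `head_share` is a hypothetical cut-off: the head is the smallest set of
    most popular items that covers at least this share of all interactions.
    """
    counts = Counter(interactions)
    total = sum(counts.values())
    head, tail, covered = [], [], 0
    # most_common() yields (item, count) pairs from most to least popular.
    for item, count in counts.most_common():
        if covered < head_share * total:
            head.append(item)
            covered += count
        else:
            tail.append(item)
    return head, tail


# Toy data: item "a" alone accounts for 6 of the 10 interactions.
interactions = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
head, tail = split_head_tail(interactions)
```

In this toy data set one item forms the head, while the remaining three items together make up only 40% of the interaction volume.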

MAR@K and MAP@K

recmetrics.mark()

recmetrics.mark_plot()

recmetrics.mapk_plot()

Mean Average Recall at K (MAR@K) measures recall over the top k recommendations. MAR@K considers the order of recommendations, and penalizes correct recommendations that appear lower in the list. MAP@K and MAR@K are ideal for evaluating an ordered list of recommendations. There is a fantastic implementation of Mean Average Precision at K (MAP@K) available here, so I have not included it in this repo.

Mar@k

MAP@K and MAR@K metrics suffer from popularity bias. If a model works well on popular items, the majority of recommendations will be correct, and MAR@K and MAP@K can appear to be high while the model may not be making useful or personalized recommendations.
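To make the order sensitivity concrete, here is a minimal sketch of one possible rank-aware recall score. It is an assumption for illustration, not necessarily what recmetrics.mark() computes: each hit contributes (hits so far / rank), and the total is divided by the number of relevant items. The function names are hypothetical.

```python
def average_recall_at_k(actual, predicted, k=10):
    """Rank-aware recall for one user (illustrative formulation).

    `actual` holds the relevant items, `predicted` the ordered recommendations.
    """
    if not actual:
        return 0.0
    relevant = set(actual)
    seen = set()
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        # Count each relevant item only once, even if it is recommended twice.
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank
        seen.add(item)
    return score / len(relevant)


def mean_average_recall_at_k(actual_lists, predicted_lists, k=10):
    """Average the per-user score over all users."""
    scores = [
        average_recall_at_k(a, p, k) for a, p in zip(actual_lists, predicted_lists)
    ]
    return sum(scores) / len(scores)


# The same two hits score higher when they appear earlier in the list.
early = average_recall_at_k(["a", "b", "c"], ["a", "x", "b"], k=3)
late = average_recall_at_k(["a", "b", "c"], ["x", "a", "b"], k=3)
```

Both lists contain the same two relevant items, but `early` is larger than `late` because the first hit sits at rank 1 rather than rank 2.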

Coverage

recmetrics.prediction_coverage()

recmetrics.catalog_coverage()

recmetrics.coverage_plot()

Coverage is the percentage of items that the recommender is able to recommend. It is referred to as prediction coverage and is given by the following formula.

$$\text{Coverage} = \frac{I}{N} \times 100\%$$

Where 'I' is the number of unique items the model recommends in the test data, and 'N' is the total number of unique items in the training data. Catalog coverage is the rate of distinct items recommended to users over a period of time. For this purpose, the catalog coverage function also takes a parameter 'k', the number of observed recommendation lists. In essence, both metrics quantify the proportion of items that the system is able to work with.
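The prediction coverage definition above (unique recommended items divided by unique catalog items, as a percentage) can be sketched in a few lines. This is an illustrative stand-in, not the body of recmetrics.prediction_coverage(); the function name and the rounding to two decimals are assumptions.

```python
def prediction_coverage_sketch(recommended_lists, catalog):
    """Percentage of catalog items that appear in at least one recommendation list."""
    # I: unique items recommended across all users.
    recommended = {item for rec in recommended_lists for item in rec}
    # N: unique items in the catalog (for example, the training data).
    unique_catalog = set(catalog)
    return round(len(recommended) / len(unique_catalog) * 100, 2)


# Three of the four catalog items are recommended to at least one user.
coverage = prediction_coverage_sketch(
    [["A", "B"], ["A", "C"]], catalog=["A", "B", "C", "D"]
)
```

Here `coverage` is 75.0, because item "D" is never recommended.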

Coverage Plot

Novelty

recmetrics.novelty()

Novelty measures the capacity of a recommender system to propose novel and unexpected items that a user is unlikely to already know about. It uses the self-information of each recommended item: it calculates the mean self-information per top-N recommended list and averages these values over all users.

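$$\text{Novelty} = \frac{1}{|U|}\sum_{u \in U}\frac{1}{N}\sum_{i \in L_u} -\log_2\left(\frac{\text{count}(i)}{|U|}\right)$$

Where |U| is the number of users, count(i) is the number of users who consumed item i, L_u is the top-N list recommended to user u, and N is the length of the recommended list.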
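The description above (mean self-information per recommended list, averaged over users) can be sketched as follows. This is a hedged illustration rather than the library's own code: the function and parameter names are assumptions, and self-information is taken as -log2(count(i) / number of users).

```python
import math


def novelty_sketch(recommended_lists, item_counts, num_users, list_length):
    """Return (novelty, per-list mean self-information).

    `item_counts` maps each item to the number of users who consumed it.
    """
    per_list = []
    for rec in recommended_lists:
        # Rare items have a low consumption probability and high self-information.
        self_information = sum(
            -math.log2(item_counts[item] / num_users) for item in rec
        )
        per_list.append(self_information / list_length)
    return sum(per_list) / len(per_list), per_list


# "popular" was consumed by all 4 users, "rare" by only 1 of them.
counts = {"popular": 4, "rare": 1}
novelty, per_list = novelty_sketch(
    [["popular"], ["rare"]], counts, num_users=4, list_length=1
)
```

The list containing only the universally consumed item contributes 0 bits, the list with the rare item contributes 2 bits, so the overall novelty is 1.0.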

Personalization

recmetrics.personalization()

Personalization is the dissimilarity between users' lists of recommendations. A high score indicates that users' recommendations are different. A low personalization score indicates that users' recommendations are very similar.

For example, if two users have recommendation lists [A,B,C,D] and [A,B,C,Y], the personalization can be calculated as:

Coverage Plot
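The two-user example above can be worked through with a small sketch. It assumes personalization is 1 minus the mean pairwise cosine similarity between users' binary item vectors; for binary vectors, that cosine reduces to the overlap size divided by the square root of the product of the list sizes. The function name is hypothetical and the code is not taken from recmetrics.personalization().

```python
import math
from itertools import combinations


def personalization_sketch(recommended_lists):
    """1 - mean pairwise cosine similarity between users' recommendation sets."""
    sets = [set(rec) for rec in recommended_lists]
    similarities = [
        # Cosine similarity of two binary indicator vectors.
        len(a & b) / math.sqrt(len(a) * len(b))
        for a, b in combinations(sets, 2)
    ]
    return 1.0 - sum(similarities) / len(similarities)


# The lists share 3 of 4 items, so the cosine similarity is 3 / sqrt(4 * 4) = 0.75.
score = personalization_sketch([["A", "B", "C", "D"], ["A", "B", "C", "Y"]])
```

Under these assumptions the score for the two lists is 0.25: the lists are mostly the same, so personalization is low.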

Intra-list Similarity

recmetrics.intra_list_similarity()

Intra-list similarity uses a feature matrix to calculate the cosine similarity between the items in a list of recommendations. The feature matrix is indexed by the item id and includes one-hot-encoded features. If a recommender system is recommending lists of very similar items, the intra-list similarity will be high.

Coverage Plot

Coverage Plot
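As a rough illustration of the idea, the sketch below averages the pairwise cosine similarity of item feature vectors within each list, then averages over lists. The function names and the plain-dict feature "matrix" are assumptions; recmetrics.intra_list_similarity() itself works on a feature matrix indexed by item id.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_list_similarity_sketch(recommended_lists, features):
    """Mean pairwise item similarity per list, averaged over all lists.

    Every list is assumed to contain at least two items.
    """
    per_list = []
    for rec in recommended_lists:
        pairs = list(combinations(rec, 2))
        total = sum(cosine(features[a], features[b]) for a, b in pairs)
        per_list.append(total / len(pairs))
    return sum(per_list) / len(per_list)


# One-hot genre features: m1 and m2 share a genre, m3 does not.
features = {"m1": [1, 0, 0], "m2": [1, 0, 0], "m3": [0, 1, 0]}
similar = intra_list_similarity_sketch([["m1", "m2"]], features)
mixed = intra_list_similarity_sketch([["m1", "m3"]], features)
```

A list of same-genre items scores 1.0, while a list of items with no shared features scores 0.0.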

MSE and RMSE

recmetrics.mse()
recmetrics.rmse()

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are used to evaluate the accuracy of predicted values yhat, such as ratings, compared to the true value, y. These can also be used to evaluate the reconstruction of a ratings matrix.

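$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Here n is the number of predictions.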
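Both error measures follow the standard definitions and can be sketched in plain Python. The function names below are illustrative and carry a `_sketch` suffix to make clear they are not the library functions.

```python
import math


def mse_sketch(y, yhat):
    """Mean of the squared differences between true and predicted values."""
    return sum((true - pred) ** 2 for true, pred in zip(y, yhat)) / len(y)


def rmse_sketch(y, yhat):
    """Square root of the MSE, expressed in the same units as the ratings."""
    return math.sqrt(mse_sketch(y, yhat))


# True ratings vs. predicted ratings: the errors are 1, 0 and -2.
y = [3, 4, 5]
yhat = [2, 4, 7]
mse_value = mse_sketch(y, yhat)    # (1 + 0 + 4) / 3
rmse_value = rmse_sketch(y, yhat)
```

Because the errors are squared before averaging, the single error of 2 stars contributes four times as much to the MSE as the error of 1 star.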

Predicted Class Probability Distribution Plots

recmetrics.class_separation_plot()

This is a plot of the distribution of the predicted class probabilities from a classification model. The plot is typically used to visualize how well a model is able to distinguish between two classes, and can assist a Data Scientist in picking the optimal decision threshold to classify observations to class 1 (0.5 is usually the default threshold for this method). The colors of the distribution plots represent true class 0 and 1, and everything to the right of the decision threshold is classified as class 1.

binary class probs

This plot can also be used to visualize the recommendation scores in two ways.

In this example, an item is considered class 1 if it is rated more than 3 stars, and class 0 if it is not. This example shows the performance of a model that recommends an item when the predicted 5-star rating is greater than 3 (plotted as a vertical decision threshold line). This plot shows that the recommender model will perform better if items with a predicted rating of 3.5 stars or greater are recommended.

ratings scores

The raw predicted 5-star rating for all recommended movies could be visualized with this plot to find the optimal predicted rating at which to threshold a recommendation of that movie. This plot also visualizes how well the model is able to distinguish between each rating value.

ratings distributions

ROC and AUC

recmetrics.roc_plot()

The Receiver Operating Characteristic (ROC) plot is used to visualize the trade-off between true positives and false positives for binary classification. The Area Under the Curve (AUC) is sometimes used as an evaluation metric.

ROC
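The trade-off behind the ROC plot can be sketched without any plotting: each decision threshold yields one (true positive rate, false positive rate) point, and the AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. The function names are hypothetical, and the pairwise AUC computation is a simple stand-in for integrating the curve.

```python
def tpr_fpr(y_true, scores, threshold):
    """True positive rate and false positive rate at one decision threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


def auc_sketch(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point = tpr_fpr(y_true, scores, threshold=0.3)  # one point on the ROC curve
auc_value = auc_sketch(y_true, scores)
```

At a threshold of 0.3 both positives are caught at the cost of one false positive, and three of the four positive/negative pairs are ordered correctly.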

Recommender Precision and Recall

recmetrics.recommender_precision()
recmetrics.recommender_recall()

Recommender precision and recall use all recommended items over all users to calculate traditional precision and recall. A recommended item that was actually interacted with in the test data is considered an accurate prediction, and a recommended item that is not interacted with, or received a poor interaction value, can be considered an inaccurate recommendation. The user can assign these values based on their judgment.
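A minimal sketch of this idea pools hits across all users before dividing. That pooling is an assumption made for illustration (the library may instead average per-user values), and the function name is hypothetical.

```python
def pooled_precision_recall(predicted_lists, actual_lists):
    """Precision and recall computed over all recommended items of all users."""
    hits = 0
    num_recommended = 0
    num_relevant = 0
    for predicted, actual in zip(predicted_lists, actual_lists):
        relevant = set(actual)
        # A recommendation is accurate if the user interacted with the item.
        hits += sum(1 for item in set(predicted) if item in relevant)
        num_recommended += len(set(predicted))
        num_relevant += len(relevant)
    precision = hits / num_recommended if num_recommended else 0.0
    recall = hits / num_relevant if num_relevant else 0.0
    return precision, recall


predicted = [["a", "b"], ["c", "d"]]
actual = [["a"], ["c", "d", "e", "f"]]
precision, recall = pooled_precision_recall(predicted, actual)
```

Three of the four recommendations are hits (precision 0.75), and three of the five relevant items were recommended (recall 0.6).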

Precision and Recall Curve

recmetrics.precision_recall_plot()

The Precision and Recall plot is used to visualize the trade-off between precision and recall for one class in a classification.

PandRcurve
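The points of such a curve come from sweeping the decision threshold. The sketch below computes precision and recall for class 1 at a given threshold; the function name and the convention of returning precision 1.0 when nothing is predicted positive are assumptions.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall for class 1 when scores >= threshold are predicted as 1."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Sweeping the threshold traces out the curve one point at a time.
curve = [precision_recall_at(y_true, scores, t) for t in (0.3, 0.5)]
```

Raising the threshold from 0.3 to 0.5 increases precision from 2/3 to 1.0 while recall drops from 1.0 to 0.5, which is the trade-off the plot shows.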

Confusion Matrix

recmetrics.make_confusion_matrix()

A traditional confusion matrix, used to evaluate false positive and false negative trade-offs.

PandRcurve
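For reference, the four cells of a binary confusion matrix can be counted directly. This sketch uses a hypothetical function name and returns the layout [[TN, FP], [FN, TP]], which is an assumption and not necessarily the layout drawn by recmetrics.make_confusion_matrix().

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    return [[tn, fp], [fn, tp]]


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
matrix = confusion_counts(y_true, y_pred)
```

For these labels the matrix is [[1, 1], [1, 2]]: one false positive and one false negative out of five observations.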

Comments
  • Unable to import recmetrics


    I am working on a recommendation engine using collaborative filtering and wanted to try the metrics provided by recmetrics. Here is the error I get when trying to import the package (version 0.0.12).

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-309-301854677c00> in <module>
    ----> 1 import recmetrics
          2 
          3 recmetrics.long_tail_plot()
    
    ~/.virtualenvs/py3/lib/python3.6/site-packages/recmetrics/__init__.py in <module>
    ----> 1 from .plots import long_tail_plot, mark_plot, mapk_plot, coverage_plot, class_separation_plot, roc_plot, precision_recall_plot
          2 from .metrics import mark, coverage, personalization, intra_list_similarity, rmse, mse, make_confusion_matrix, recommender_precision, recommender_recall
    
    ~/.virtualenvs/py3/lib/python3.6/site-packages/recmetrics/plots.py in <module>
          5 from matplotlib.lines import Line2D
          6 from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score
    ----> 7 from sklearn.utils.fixes import signature
          8 
          9 
    
    ImportError: cannot import name 'signature'
    
    bug 
    opened by kleekaai 3
  • Unused Requirement


    Surprise is listed as a module dependency but is not used in metrics or plots. Might be worth removing the dependency, especially since it requires additional build tools (Visual C++) and thus may throw unnecessary errors.

    opened by VedantVarshney 2
  • module 'recmetrics' has no attribute 'prediction_coverage'


    Hi there, I am trying to run the example notebook, but I am getting "module 'recmetrics' has no attribute 'prediction_coverage'" and "attribute error: module 'recmetrics' has no attribute 'catalog_coverage'".

    Any pointers or suggestions?

    Thanks in advance

    opened by rhkaz 2
  • TypeError on class_separation_plot of example notebook


    I attached the error below

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-30-05160122655c> in <module>
    ----> 1 recmetrics.class_separation_plot(pred_df, n_bins=45, class0_label="True class 0", class1_label="True class 1")
    
    TypeError: class_separation_plot() got an unexpected keyword argument 'class0_label'
    
    opened by itsoum 2
  • License


    This is missing a license. You can use https://tldrlegal.com/ for an overview. The top-3 are MIT, BSD and GPL (see my analysis).

    The simplest way to add it is in the setup.py as license='MIT' or similar.

    opened by MartinThoma 2
  • Is surprise really required?


    First of all: this package looks great! It's exactly what I need for some small projects, so thanks for putting it out there!

    I'm looking at the setup.py, and it lists surprise as a requirement. I don't see it imported anywhere in the package though, so I'm wondering if it can be removed? I get that it's useful for the example notebook, but that wouldn't be included in the pip install anyway. (I might suggest making surprise an extras_require if you want to keep it in there for demo purposes.)

    If you're open to some packaging changes along these lines, I'd be happy to send a PR your way.

    help wanted good first issue 
    opened by bmcfee 2
  • Fix 35/optimize personalization calculation


    This relates to #35. As the cosine similarity metric is symmetric, we don't need the upper triangle indices to calculate the mean of the matrix. Just subtract the diagonal (all ones) and divide by the number of distances (without the diagonal). This way the performance is increased and is noticeable on matrices over 50k x 50k. All tests passed. Performance before and after the modification (skipping make_rec_matrix): performance

    opened by ibuda 1
  • Personalization metric calculation optimization


    Hi @statisticianinstilettos,

    kudos for a great tool! I would like to propose an optimization for calculating Personalization Metric here:

    #get indices for upper right triangle w/o diagonal
    upper_right = np.triu_indices(similarity.shape[0], k=1)
    
    #calculate average similarity
    personalization = np.mean(similarity[upper_right])
    return 1-personalization
    

    There is no need to get the upper triangle indices, as the cosine similarity is a symmetric distance. I will follow up with a pull request for this.

    opened by ibuda 1
  • RecMetrics Revisions


    Description

    This pull request is designed to introduce reproducibility and maintainability across the RecMetrics library.

    • Test coverage for metrics and plots scripts.
    • Create Docker images for RecMetrics development and notebook demo
    • Additional Makefile commands
      • build - Create RecMetrics Docker image (Development)
      • build_demo - Build RecMetrics Docker image (Demo)
      • clean - Remove files from repo
      • download_movielens - Download MovieLens data to repo
      • run_demo - Run RecMetrics Docker image (Demo)
      • test - Test RecMetrics Docker image
    • Type hinting for all functions
    • Support for Poetry
      • Used within both Dockerfiles
    • Fix metrics_plot function
      • Now displays output in Jupyter notebook

    Fixes # (issue)

    • #2 - Is Surprise required?
    • #14 - TypeError on class_separation_plot of example notebook
    • #23 - ImportError: cannot import name 'signature'
    • #25 - module 'recmetrics' has no attribute 'prediction_coverage'

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [x] This change requires a documentation update

    How Has This Been Tested?

    All tests now run within a Docker container. Current test coverage is in the following scripts:

    • test_metrics.py
    • test_plots.py

    Notes

    • Building a new package for PyPI has not been tested.
    • Surprise is included in the Poetry files, and is installed for both Docker images.
    • Tests within test_plots.py assume visualizations are correct; per the references, visualizations can be difficult to test.
    • In the future, there's potential to automate tests with GitHub Actions.

    References

    opened by gregwchase 1
  • fix setup requires error


    add plotly in setup.py as install_requires

    • error
    pip3 install git+https://github.com/statisticianinstilettos/recmetrics.git
    
    python3
    >>> import recmetrics
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/uni/Library/Python/3.7/lib/python/site-packages/recmetrics/__init__.py", line 1, in <module>
        from .plots import long_tail_plot, mark_plot, mapk_plot, coverage_plot, class_separation_plot, roc_plot, precision_recall_plot
      File "/Users/uni/Library/Python/3.7/lib/python/site-packages/recmetrics/plots.py", line 7, in <module>
        import plotly.graph_objects as go
    ModuleNotFoundError: No module named 'plotly'
    
    opened by uni-3 1
  • Minor fixes on example jupyter notebook


    1. I added brackets to the Python print calls.
    2. I changed the first parameter in the coverage function calls because it was wrong.
    3. If the prediction and catalog coverages are accepted, we should also change the function name in the calls.
    opened by itsoum 1
  • Implement MAP@k


    The MAP@k implementation linked in the documentation (https://github.com/benhamner/Metrics) has not been updated for 7 years and has bugs in its MAP@k implementation (e.g. https://github.com/benhamner/Metrics/issues/51, https://github.com/benhamner/Metrics/issues/57). It would be really useful to have a MAP@k implementation in recmetrics. Would it be possible to implement it? It would be almost identical to the existing mark() function.

    opened by j-adamczyk 1
  • Integration with Deep Learning Based Frameworks


    Is there any way to integrate this with recommender system frameworks that involve more deep learning-based algorithms, such as PyTorch? Scikit-learn with Surprise doesn't really support such algorithms.

    opened by agb2k 0
  • Coverage over 100%


    In the example below, the measured coverage exceeds 100%, which does not make sense.

    This happens when items that are not listed on the catalog are recommended.

    > from recmetrics import prediction_coverage
    > prediction_coverage([['x', 'y'], ['w', 'z']], catalog=['w', 'x', 'y'])
    133.33
    
    opened by vascosmota 2
  • personalization() has explosive memory requirements due to pairwise comparison


    On my system (16 GB RAM), a list of 10k recommendations will run. A list of 50k will crash out. I'd like to understand the personalization score across my entire hypothetical customer base of 250k+.

    Is there a way to chunk the scipy.sparse.csr_matrix and iteratively calculate the cosine similarity to avoid holding the whole thing in memory?

    opened by ahgraber 0
  • Installation issues


    Hi! I have been trying to install recmetrics with "pip install recmetrics", and keep getting the error "ERROR: Could not build wheels for scikit-learn, which is required to install pyproject.toml-based projects". I'm using Windows, Python version 3.9.7, with pip fully upgraded. pip freeze shows that scikit-learn is actually already installed: "scikit-learn==0.24.2". I've also tried installing with pip from git, with the same result. Any ideas what I could still try?

    opened by Erin59 5
Releases(v0.1.5)
Owner
Claire Longo
Full Stack Data Scientist/Machine Learning Engineer
RetaGNN: Relational Temporal Attentive Graph Neural Networks for Holistic Sequential Recommendation

RetaGNN: Relational Temporal Attentive Graph Neural Networks for Holistic Sequential Recommendation PyTorch-based implementation of Relational Temporal

28 Dec 28, 2022
Deep recommender models using PyTorch.

Spotlight uses PyTorch to build both deep and shallow recommender models. By providing both a slew of building blocks for loss functions (various poin

Maciej Kula 2.8k Dec 29, 2022
Codes for AAAI'21 paper 'Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation'

DHCN Codes for AAAI 2021 paper 'Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation'. Please note that the default link

Xin Xia 124 Dec 14, 2022
Collaborative variational bandwidth auto-encoder (VBAE) for recommender systems.

Collaborative Variational Bandwidth Auto-encoder The codes are associated with the following paper: Collaborative Variational Bandwidth Auto-encoder f

Yaochen Zhu 14 Dec 11, 2022
Elliot is a comprehensive recommendation framework that analyzes the recommendation problem from the researcher's perspective.

Comprehensive and Rigorous Framework for Reproducible Recommender Systems Evaluation

Information Systems Lab @ Polytechnic University of Bari 215 Nov 29, 2022
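Large-scale recommendation algorithm library, including classic and state-of-the-art recommender system algorithms: LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, DeepWalk, SSR, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, ListWise, etc.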

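(Chinese documentation | Simplified Chinese | English) What is a recommender system? In an era of explosive growth of information on the internet, recommender systems are the key to helping users efficiently find the information they are interested in; recommender systems are also a silver bullet for helping products attract users, retain users, increase user stickiness, and improve user conversion rates as much as possible. Countless excellent products have built a good reputation by relying on recommender systems that users can perceive, and there are also countless companies that rely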

3.6k Dec 30, 2022
Recommendation System to recommend top books from the dataset

recommendersystem Recommendation System to recommend top books from the dataset Introduction The recom.py is the main program code. The dataset is als

Vishal karur 1 Nov 15, 2021
Movie Recommender System

Movie-Recommender-System Movie-Recommender-System is a web application using which a user can select his/her watched movie from list and system will r

1 Jul 14, 2022
Spark-movie-lens - An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

A scalable on-line movie recommender using Spark and Flask This Apache Spark tutorial will guide you step-by-step into how to use the MovieLens datase

Jose A Dianes 794 Dec 23, 2022
[ICDMW 2020] Code and dataset for "DGTN: Dual-channel Graph Transition Network for Session-based Recommendation"

DGTN: Dual-channel Graph Transition Network for Session-based Recommendation This repository contains PyTorch Implementation of ICDMW 2020 (NeuRec @ I

Yujia 25 Nov 17, 2022
A recommendation system for suggesting new books given similar books.

Book Recommendation System A recommendation system for suggesting new books given similar books. Datasets Dataset Kaggle Dataset Notebooks goodreads-E

Sam Partee 2 Jan 06, 2022
A TensorFlow recommendation algorithm and framework in Python.

TensorRec A TensorFlow recommendation algorithm and framework in Python. NOTE: TensorRec is not under active development TensorRec will not be receivi

James Kirk 1.2k Jan 04, 2023
Pytorch domain library for recommendation systems

TorchRec (Experimental Release) TorchRec is a PyTorch domain library built to provide common sparsity & parallelism primitives needed for large-scale

Meta Research 1.3k Jan 05, 2023
A Python scikit for building and analyzing recommender systems

Overview Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data. Surprise was designed with th

Nicolas Hug 5.7k Jan 01, 2023
Code for MB-GMN, SIGIR 2021

MB-GMN Code for MB-GMN, SIGIR 2021 For Beibei data, run python .\labcode.py For Tmall data, run python .\labcode.py --data tmall --rank 2 For IJCAI

32 Dec 04, 2022
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Recommendation engines are one of the most well known, widely used and highest value use cases for applying machine learning. Despite this, while there are many resources available for the basics of

International Business Machines 793 Dec 18, 2022
A Library for Field-aware Factorization Machines

Table of Contents ================= - What is LIBFFM - Overfitting and Early Stopping - Installation - Data Format - Command Line Usage - Examples -

1.6k Dec 05, 2022
reXmeX is recommender system evaluation metric library.

A general purpose recommender metrics library for fair evaluation.

AstraZeneca 258 Dec 22, 2022
Hierarchical Fashion Graph Network for Personalized Outfit Recommendation, SIGIR 2020

hierarchical_fashion_graph_network This is our Tensorflow implementation for the paper: Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and

LI Xingchen 70 Dec 05, 2022
Spotify API Recommender System

This project will access your last listened songs on Spotify using its API, then it will request the user to select 5 favorite songs in that list, on which the API will proceed to make 50 recommendat

Kevin Luke 1 Dec 14, 2021