Ranking-based evaluation measures (especially for recommender systems and multi-label learning)
2022-07-26 09:18:00 【Min fan】
Abstract: In recommender systems and multi-label learning, some learners output real-valued predictions. For example, the predicted score of user $i$ for item $j$ may be $4.2$, or the predicted probability that label $j$ of sample $i$ is positive may be $0.46$. How should the quality of such predictions be evaluated? This post describes the motivation and intuitive meaning of several ranking-based evaluation measures.
1. Non-ranking-based evaluation measures
This section describes several non-ranking-based evaluation measures and points out their drawbacks.
1.1 Mean absolute error (MAE)
Let the actual score be $r_{ij}$, the predicted score be $\hat{r}_{ij}$, and the set of user-item pairs whose scores are unknown (i.e., need to be predicted) be $\Omega$. Then
$$MAE = \sum_{(i, j) \in \Omega} \vert r_{ij} - \hat{r}_{ij} \vert / |\Omega| \tag{1}$$
It is the mean absolute difference between the predicted and the actual scores.
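Eq. (1) translates directly into code. This is a minimal sketch, assuming the known and predicted scores are stored in dicts keyed by `(i, j)`, with the keys of the actual-score dict playing the role of $\Omega$; the function name `mae` and the toy data are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error over Omega, following eq. (1).

    actual, predicted: dicts mapping (user i, item j) -> score.
    The keys of `actual` are taken as the set Omega (an assumed data layout).
    """
    omega = actual.keys()
    total = sum(abs(actual[ij] - predicted[ij]) for ij in omega)
    return total / len(omega)


# Hypothetical toy data: user 0 rated items 1 and 2, user 1 rated item 1.
actual = {(0, 1): 5.0, (0, 2): 3.0, (1, 1): 4.0}
predicted = {(0, 1): 4.2, (0, 2): 3.5, (1, 1): 4.0}
print(mae(actual, predicted))
```

Here the absolute errors are 0.8, 0.5 and 0, so the result is about 0.433.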
Advantage: simple and direct.
Drawback: suppose each user is always recommended 10 items. Counterexamples are easy to construct. The user's 10 favorite items may all be ranked at the top (a perfect recommendation) while the error is still large: their true score is 5, but their predicted scores are only 3.6 to 3.9 (all other items are predicted below 3.6). One can even construct the opposite case, where MAE is fairly small but the recommendation list is poor: the user's favorite item, rated 5, is predicted as 4.4, while the user's second-favorite item, rated 4, is predicted as 4.5, so the less-liked item is ranked first.
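Both counterexamples can be checked numerically. The sketch below is a scaled-down version with hypothetical data: three favorite items instead of ten, plus two other items whose true score of 2 is invented for illustration. `mae` and `ranking` are helper names chosen here, not a library API.

```python
def mae(actual, predicted):
    # Mean absolute error over paired lists of true and predicted scores.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def ranking(predicted):
    # Item indices sorted by predicted score, highest first.
    return sorted(range(len(predicted)), key=lambda k: predicted[k], reverse=True)


# Case 1: perfect ranking but large MAE.
# Items 0..2 are the favorites (true score 5) and are predicted 3.6 to 3.9;
# items 3..4 (hypothetical true score 2) are predicted below 3.6.
true1 = [5, 5, 5, 2, 2]
pred1 = [3.9, 3.8, 3.6, 3.0, 2.5]

# Case 2: smaller MAE, but the second favorite is ranked above the favorite.
true2 = [5, 4]
pred2 = [4.4, 4.5]

print(mae(true1, pred1), ranking(pred1))  # favorites occupy the top positions
print(mae(true2, pred2), ranking(pred2))  # item 1 is ranked above item 0
```

Case 1 has an MAE of about 1.04 with a perfect order, while Case 2 has an MAE of about 0.55 with the top two items swapped.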
1.2 Root mean square error (RMSE)
Similar to MAE, with the same drawbacks.
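For reference, the standard definition (not spelled out in the original) is $RMSE = \sqrt{\sum_{(i, j) \in \Omega} (r_{ij} - \hat{r}_{ij})^2 / |\Omega|}$. The minimal sketch below, with the hypothetical helper name `rmse`, applies it to the two-item example from Section 1.1 (true scores 5 and 4, predicted 4.4 and 4.5). The error stays small even though the order of the two items is wrong, so RMSE shares MAE's blind spot for ranking quality.

```python
import math


def rmse(actual, predicted):
    # Root mean square error: square root of the mean squared difference.
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Favorite item (true 5) predicted 4.4; second favorite (true 4) predicted 4.5.
print(rmse([5, 4], [4.4, 4.5]))
```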
1.3 Accuracy
We illustrate with multi-label learning (an extension of binary classification).
Let the actual label be $y_{ij} \in \{0, 1\}$, the predicted label be $\hat{y}_{ij}$, the number of test samples be $n$, and the number of labels be $q$. Then the accuracy is
$$Acc = \frac{nq - \sum_{i, j} |y_{ij} - \hat{y}_{ij}|}{nq}$$
Advantage: simple and direct; it is the proportion of correct predictions.
Drawback 1: the raw prediction is a real value (such as the 0.46 above), so a threshold is needed to convert it into a 0/1 value. Simply using a fixed threshold of 0.5 often works poorly.
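To see how much the threshold matters, here is a minimal sketch with hypothetical scores in which the two positive labels do receive the highest scores, but both fall below 0.5. A fixed 0.5 threshold predicts everything as negative, while a lower threshold recovers a perfect result. The threshold 0.35 is an arbitrary illustrative choice.

```python
def binarize(scores, threshold):
    # Convert real-valued predictions into 0/1 labels.
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(y_true, y_pred):
    # Proportion of labels predicted correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 0, 0, 0]
scores = [0.46, 0.40, 0.30, 0.20, 0.10]  # hypothetical predictions

for threshold in (0.5, 0.35):
    print(threshold, accuracy(y_true, binarize(scores, threshold)))
```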
Drawback 2: because of class imbalance, negative labels (actual value $0$) far outnumber positive labels (actual value $1$). In some extreme multi-label datasets, negative labels account for more than 99% of all labels. Predicting every label as negative then yields a very high accuracy, which is clearly meaningless in practice.
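The effect of imbalance is easy to reproduce. This sketch assumes a hypothetical dataset of 100 samples and 100 labels with exactly one positive label per sample (1% positives); a predictor that outputs all zeros reaches 99% accuracy without learning anything.

```python
# Hypothetical extreme multi-label setting: n samples x q labels,
# with exactly one positive label per sample (1% positives).
n, q = 100, 100
y_true = [[1 if j == i % q else 0 for j in range(q)] for i in range(n)]
y_all_negative = [[0] * q for _ in range(n)]

# Accuracy as defined above: (nq - number of wrong labels) / (nq).
errors = sum(
    abs(t - p)
    for row_t, row_p in zip(y_true, y_all_negative)
    for t, p in zip(row_t, row_p)
)
acc = (n * q - errors) / (n * q)
print(acc)
```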
1.4 F1
The F1-score mainly addresses Drawback 2 of Accuracy. See Misclassification cost and class imbalance data, as well as F-measure and cost-sensitive evaluation indexes.
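Continuing the imbalance example, the all-negative predictor scores an F1 of 0 even though its accuracy is 99%. The sketch below computes F1 over all sample-label pairs flattened into one list (a micro-averaged view, assumed here). It returns 0 when there are no true positives, a common convention also assumed here. The function name and data are hypothetical.

```python
def f1(y_true, y_pred):
    """F1 over flattened 0/1 sample-label pairs.

    Returns 0.0 when there are no true positives (assumed convention,
    which also avoids dividing by zero).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1] + [0] * 99          # 1% positive labels
all_negative = [0] * 100         # 99% accuracy
one_hit = [1, 1] + [0] * 98      # finds the positive, plus one false alarm

print(f1(y_true, all_negative))
print(f1(y_true, one_hit))
```

The second predictor has precision 0.5 and recall 1, giving an F1 of about 0.667.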
2. Ranking-based evaluation measures
This section describes several ranking-based evaluation measures.
2.1 Peak-F1
Sort all sample-label pairs by their predicted values (real numbers between 0 and 1) in descending order. At step $k$, treat the first $k$ sample-label pairs as positive and compute F1. Plotting these values gives an F1 curve, and the maximum value on the curve is called Peak-F1.
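The procedure above can be sketched in a few lines. With $P$ actual positives and $tp$ true positives among the top $k$, precision is $tp/k$ and recall is $tp/P$, so F1 simplifies to $2\,tp/(k + P)$. This sketch assumes flattened lists of scores and 0/1 labels and ignores ties in the scores; `peak_f1` is a hypothetical name.

```python
def peak_f1(scores, labels):
    """Peak-F1: for every k, predict the k highest-scored pairs as positive,
    compute F1, and return the maximum along the curve (ties ignored).

    scores: predicted values of all sample-label pairs (flattened).
    labels: their actual 0/1 values.
    """
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    total_pos = sum(labels)
    tp = 0
    best = 0.0
    for k, idx in enumerate(order, start=1):
        tp += labels[idx]  # true positives among the top k
        # Harmonic mean of precision tp/k and recall tp/total_pos.
        best = max(best, 2 * tp / (k + total_pos))
    return best


print(peak_f1([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0]))
```

In this toy example the F1 curve is 0.667, 0.5, 0.8, 0.667, 0.571, so Peak-F1 is 0.8, reached at $k = 3$.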
Advantage: it addresses Drawback 1 of Accuracy in Section 1.3. There is no need to choose a threshold, since every cut-off is tried (only children have to choose).
Drawback: it only records the peak. The top of the ranking may be of very high quality while the rest is poor, and Peak-F1 does not reflect this. This is its weakness.
2.2 ROC curve and AUC
Sort all sample-label pairs by their predicted values (real numbers between 0 and 1) in descending order. Start from the point (0, 0) and walk through the sorted list: if the current pair is actually positive, move up one step; otherwise, move right one step. An upward step has length $1/P$ and a rightward step has length $1/N$, where $P$ ($N$) is the total number of actually positive (negative) labels. The resulting curve is called the ROC curve; see Receiver operating characteristic curve.
AUC (Area Under Curve) is the area under this curve, a value between 0 and 1 (AUC = 1 means a perfect ranking).
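The walk and the area can be sketched directly. Because every step is either vertical or horizontal, the area is simply the sum, over the rightward steps, of the current height times $1/N$. This is a minimal sketch that ignores ties in the scores; `roc_curve` and `auc` are hypothetical names here (scikit-learn has functions with similar names but different signatures).

```python
def roc_curve(scores, labels):
    """ROC curve built by the up/right walk described above (ties ignored).

    Returns the list of visited points, starting at (0, 0).
    """
    P = sum(labels)            # number of actually positive labels
    N = len(labels) - P        # number of actually negative labels
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    x, y = 0.0, 0.0
    points = [(x, y)]
    for idx in order:
        if labels[idx] == 1:
            y += 1 / P         # actual positive: step up by 1/P
        else:
            x += 1 / N         # actual negative: step right by 1/N
        points.append((x, y))
    return points


def auc(points):
    # Upward steps add no area; each rightward step adds width * current height.
    return sum((x1 - x0) * y0 for (x0, y0), (x1, _) in zip(points, points[1:]))


points = roc_curve([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0])
print(auc(points))
```

With $P = 2$ and $N = 3$, the walk goes up, right, up, right, right, and the area is $1/6 + 1/3 + 1/3 = 5/6$.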
Advantage 1: same as Peak-F1 (no threshold is needed).
Characteristic 1: it measures the ranking as a whole. If you care about overall performance, this is an advantage over Peak-F1. If you only care about the top few items (as in recommender systems), it may become a disadvantage.
2.3 nDCG
I'll be lazy here; see https://zhuanlan.zhihu.com/p/371432647.
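As a quick illustration of nDCG, here is a minimal sketch of one common definition: $DCG@k = \sum_{r=1}^{k} rel_r / \log_2(r + 1)$, where $rel_r$ is the true relevance of the item at rank $r$, normalized by the DCG of the ideal ordering. Some variants use the gain $2^{rel_r} - 1$ instead of $rel_r$; the linear gain is an assumption here, and the function names are hypothetical. The example reuses the two-item case from Section 1.1, treating the true scores 5 and 4 as relevances.

```python
import math


def dcg(relevances, k):
    # Discounted cumulative gain of the first k items: rel_r / log2(r + 1).
    return sum(rel / math.log2(r + 1) for r, rel in enumerate(relevances[:k], start=1))


def ndcg(scores, relevances, k):
    """nDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    ranked = [relevances[idx] for idx in order]
    best = dcg(sorted(relevances, reverse=True), k)
    return dcg(ranked, k) / best if best > 0 else 0.0


# Favorite (true 5) predicted 4.4, second favorite (true 4) predicted 4.5.
print(ndcg([4.4, 4.5], [5, 4], k=2))
```

Unlike MAE, this measure penalizes the swapped order (the result is about 0.951 rather than 1), and the penalty is larger the closer the mistake is to the top of the list.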
2.4 [email protected], [email protected], [email protected]
Still being lazy; see http://manikvarma.org/downloads/XC/XMLRepository.html.
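The @k family focuses on the top of each sample's label ranking, which suits extreme multi-label learning, where only a handful of labels per sample matter. As one example, here is a minimal sketch of precision at k (P@k) under its usual definition: for each sample, the fraction of the $k$ highest-scored labels that are actually positive, averaged over samples. Propensity-scored variants also used in extreme classification are not covered. The function names and toy data are hypothetical.

```python
def precision_at_k(scores, labels, k):
    """P@k for one sample: fraction of the k highest-scored labels that are positive."""
    top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sum(labels[j] for j in top_k) / k


def mean_precision_at_k(score_rows, label_rows, k):
    # Average P@k over all test samples.
    per_sample = [precision_at_k(s, y, k) for s, y in zip(score_rows, label_rows)]
    return sum(per_sample) / len(per_sample)


score_rows = [[0.9, 0.1, 0.8, 0.3], [0.2, 0.7, 0.6, 0.1]]
label_rows = [[1, 0, 0, 1], [0, 1, 1, 0]]
print(mean_precision_at_k(score_rows, label_rows, k=2))
```

Sample 0 has one positive among its top two labels (0.5) and sample 1 has two (1.0), so the mean P@2 is 0.75.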