当前位置:网站首页>Order based evaluation index (especially for recommendation system and multi label learning)
Order based evaluation index (especially for recommendation system and multi label learning)
2022-07-26 09:18:00 【Min fan】
Abstract : Some learners output real prediction for recommendation system or multi label learning . Such as , Forecast No i i i Users to j j j The score of items is 4.2 4.2 4.2, Or predict the number i i i Of the sample j j j The probability of positive labels is 0.46 0.46 0.46. How to evaluate the effectiveness of prediction ? This paper describes several evaluation indexes based on order (Ranking-based evaluation measures) Motivation and physical meaning of .
1. Non order based evaluation index
This section describes several non order based evaluation indicators , And point out its defects .
1.1 Mean absolute error (MAE)
Let the actual score be r i j r_{ij} rij, The predicted score is r ^ i j \hat{r}_{ij} r^ij, Unknown score ( What needs to be predicted ) user - Item set is Ω \Omega Ω, be
M A E = ∑ ( i , j ) ∈ Ω ∣ r i j − r ^ i j ∣ / ∣ Ω ∣ (1) MAE = \sum_{(i, j) \in \Omega} \vert r_{ij} - \hat{r}_{ij}\vert / |\Omega|\tag{1} MAE=(i,j)∈Ω∑∣rij−r^ij∣/∣Ω∣(1)
It represents the absolute difference between the predicted score and the actual score .
advantage : Simple and direct .
defects : Suppose you recommend it regularly for each user 10 10 10 A project . It is easy to cite such counter examples : Put the user's favorite 10 10 10 Projects are at the top ( The recommended effect is perfect ), But the error is great , Such as : The real score is 5, But the prediction score is only 3.6–3.9 ( The prediction scores of other projects are less than 3.6). Such a counterexample can even be cited : MAE Not bad , But the recommended list is not good ( Put users' favorite , The score is 5 The project forecast of is 4.4 branch ; But users like it for the first time , The score is 4 The project forecast of is 4.5 branch ).
1.2 Root squre mean error (RSME)
And MAE Empathy .
1.3 Accuracy
Here with multiple labels ( It is equivalent to the expansion of two categories ) Take an example to illustrate .
Make the actual label y i j ∈ { 0 , 1 } y_{ij} \in \{0, 1\} yij∈{ 0,1}, The prediction label is y ^ i j \hat{y}_{ij} y^ij, The number of test data is n n n, The number of tags is q q q, Then the accuracy
A c c = n q − ∑ i , j ∣ y i j − y ^ i j ∣ n q Acc = \frac{nq - \sum_{i, j} |y_{ij} - \hat{y}_{ij}|}{nq} Acc=nqnq−∑i,j∣yij−y^ij∣
advantage : Simple and direct , Calculate the correct proportion of the forecast .
shortcoming 1: Because the initial prediction value is the initial value ( As before 0.42), You need a threshold to convert it into a distribution value 0 / 1 0/1 0/1. If you use thresholds simply and brutally 0.5, The effect is not good .
shortcoming 2: Due to category imbalance , Negative label ( The actual value of the label is 1 1 1) Positive label ( The actual value of the label is 0 0 0) A lot more . In some extreme multi label datasets , The proportion of negative labels is 99% above , At this time, you only need to judge that all labels are negative or very high Accuray, But it obviously has no practical significance .
1.4 F1
F1-score The main response is Accuracy The shortcomings of 2. See Misclassification cost and class imbalance data , as well as F-measure And cost sensitive evaluation index .
2. Evaluation index based on order
This section describes several order based evaluation indicators .
2.1 Peak-F1
Take all the samples - The tag pair is based on the predicted value ( A pure decimal ) In reverse order . The first k k k Before the second thought k k k individual sample - Label alignment is positive . Draw F1 curve , Finally, take the maximum value in the curve , be called Peak-F1.
advantage : Answer 1.3 In the festival Accuracy The shortcomings of 1, There is no need to select the threshold ( Children make choices ).
shortcoming : Only the highlight moment is recorded , Maybe the quality of the front row is very high , But the quality of the back is not good . It's just a weakness .
2.2 ROC curve And AUC
Take all the samples - The tag pair is based on the predicted value ( A pure decimal ) In reverse order . From two-dimensional coordinates (0, 0) set out , The first 1 One is positive , Just walk up 1 Step , Otherwise, go right 1 Step . Go up 1 The distance of steps is 1 / P 1/P 1/P, turn right 1 The distance of steps is 1 / N 1/N 1/N, among P P P ( N N N) Is actually positive ( negative ) Total number of labels . The curve thus obtained is called ROC, See Receiver operating characteristic curve.
AUC (Area Under Curve) Is the area under the curve , Usually a pure decimal (AUC = 1 It's too much ).
advantage 1: Same as Peak-F1.
characteristic 1: Measure as a whole . If you care about the overall performance , It is the index relative to Peak-F1 The advantages of . If you only care about the first few ( Recommendation system ), It may become a disadvantage .
2.3 nDCG
Be lazy , See https://zhuanlan.zhihu.com/p/371432647.
2.4 [email protected], [email protected], [email protected]
Continue to be lazy , See http://manikvarma.org/downloads/XC/XMLRepository.html.
边栏推荐
- Announcement | FISCO bcos v3.0-rc4 is released, and the new Max version can support massive transactions on the chain
- 李沐d2l(四)---Softmax回归
- Study notes of canal
- pycharm 打开多个项目的两种小技巧
- 布隆过滤器
- Qtcreator reports an error: you need to set an executable in the custom run configuration
- What is the difference between NFT and digital collections?
- Bloom filter
- 839. 模拟堆
- C# Serialport的发送和接收
猜你喜欢

Clean the label folder

Advanced mathematics | Takeshi's "classic series" daily question train of thought and summary of error prone points

布隆过滤器

What is the difference between NFT and digital collections?

谷粒学院的全部学习源码

Original root and NTT 5000 word explanation

jvm命令归纳

Go intelligent robot alpha dog, alpha dog robot go

2022流动式起重机司机考试题模拟考试题库模拟考试平台操作

Selection and practice of distributed tracking system
随机推荐
谷粒学院的全部学习源码
2B和2C
Datax的学习笔记
PAT 甲级 A1034 Head of a Gang
力扣刷题,三数之和
Zipkin安装和使用
李沐d2l(四)---Softmax回归
十大蓝筹NFT近半年数据横向对比
CF1481C Fence Painting
Ext4 file system opens dir_ After nlink feature, link_ Use link after count exceeds 65000_ Count=1 means the quantity is unknown
Sending and receiving of C serialport
"Could not build the server_names_hash, you should increase server_names_hash_bucket_size: 32"
838. Heap sorting
【final关键字的使用】
JS closure: binding of functions to their lexical environment
CSDN Top1 "how does a Virgo procedural ape" become a blogger with millions of fans through writing?
MySQL strengthen knowledge points
760. String length
高数 | 武爷『经典系列』每日一题思路及易错点总结
力扣——二叉树剪枝