Ranking-based evaluation measures (especially for recommender systems and multi-label learning)
2022-07-26 09:18:00 【Min fan】
Abstract: Some learners output real-valued predictions for recommender systems or multi-label learning. For example, they predict that user $i$ rates item $j$ at $4.2$, or that label $j$ of sample $i$ is positive with probability $0.46$. How should the quality of such predictions be evaluated? This post describes the motivation and physical meaning of several ranking-based evaluation measures.
1. Non-ranking-based evaluation measures
This section describes several evaluation measures that are not based on ranking, and points out their defects.
1.1 Mean absolute error (MAE)
Let the actual score be $r_{ij}$, the predicted score be $\hat{r}_{ij}$, and the set of user-item pairs with unknown scores (those that need to be predicted) be $\Omega$. Then
$$MAE = \sum_{(i, j) \in \Omega} \vert r_{ij} - \hat{r}_{ij}\vert / |\Omega| \tag{1}$$
It is the mean absolute difference between the predicted scores and the actual scores.
Advantage: simple and direct.
Defect: suppose we always recommend $10$ items to each user. A counterexample is easy to construct: the user's $10$ favorite items are ranked at the top (the recommendation is perfect), yet the error is large, e.g., the real scores are 5 but the predicted scores are only 3.6–3.9 (and the predicted scores of all other items are below 3.6). The opposite counterexample can also be constructed: the MAE is good but the recommendation list is poor (the user's favorite item, with real score 5, is predicted as 4.4, while the second favorite, with real score 4, is predicted as 4.5).
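The two counterexamples can be checked numerically. The following is a minimal sketch; the four-item score lists and the helper names `mae` and `ranking` are made up for illustration, with predicted values borrowed from the text above.

```python
def mae(actual, predicted):
    """Mean absolute error, Eq. (1), over paired actual/predicted scores."""
    return sum(abs(r - r_hat) for r, r_hat in zip(actual, predicted)) / len(actual)


def ranking(scores):
    """Item indices sorted by score, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical actual scores of four items for one user.
actual = [5, 4, 2, 1]

# Predictor A: large errors, but the order of the items is exactly right.
pred_a = [3.9, 3.6, 1.0, 0.2]
# Predictor B: small errors, but the two best items are swapped.
pred_b = [4.4, 4.5, 2.0, 1.0]

print("A:", mae(actual, pred_a), ranking(pred_a))
print("B:", mae(actual, pred_b), ranking(pred_b))
```

Predictor A has the larger MAE yet reproduces the true order, while predictor B has the smaller MAE and puts the wrong item first.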
1.2 Root mean square error (RMSE)
The same advantage and defect apply as for MAE.
1.3 Accuracy
Multi-label learning (which can be seen as an extension of binary classification) is used here as the example.
Let the actual label be $y_{ij} \in \{0, 1\}$, the predicted label be $\hat{y}_{ij}$, the number of test samples be $n$, and the number of labels be $q$. Then the accuracy is
$$Acc = \frac{nq - \sum_{i, j} |y_{ij} - \hat{y}_{ij}|}{nq}$$
Advantage: simple and direct; it is the proportion of correct predictions.
Shortcoming 1: the raw prediction is a real value (such as the 0.46 above), so a threshold is needed to convert it into a discrete value $0/1$. Simply and brutally using the threshold 0.5 does not work well.
Shortcoming 2: because of class imbalance, negative labels (actual value $0$) far outnumber positive labels (actual value $1$). In some extreme multi-label datasets, more than 99% of the labels are negative. In that case, predicting every label as negative already gives a very high accuracy, which obviously has no practical significance.
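Shortcoming 2 is easy to reproduce. This is a minimal sketch with a made-up $10 \times 10$ label matrix containing a single positive entry; the function name `accuracy` is an illustrative choice.

```python
def accuracy(actual, predicted):
    """Acc = (nq - sum |y - y_hat|) / nq for two n-by-q matrices of 0/1 labels."""
    n, q = len(actual), len(actual[0])
    errors = sum(abs(y - y_hat)
                 for row, row_hat in zip(actual, predicted)
                 for y, y_hat in zip(row, row_hat))
    return (n * q - errors) / (n * q)


# Hypothetical data: 10 samples, 10 labels, exactly one positive entry.
n, q = 10, 10
actual = [[0] * q for _ in range(n)]
actual[0][0] = 1

# A useless predictor that declares every label negative.
all_negative = [[0] * q for _ in range(n)]

acc = accuracy(actual, all_negative)
print(acc)
```

The all-negative predictor misses the only positive label and still gets 99 of the 100 entries right.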
1.4 F1
The F1-score mainly addresses shortcoming 2 of accuracy. See the posts "Misclassification cost and class imbalance data" and "F-measure and cost sensitive evaluation index".
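To illustrate why F1 helps with imbalance, here is a small sketch using the standard definition $F1 = 2TP / (2TP + FP + FN)$ on made-up data with 1 positive among 100 labels. Returning 0 when the denominator is 0 is a convention assumed here, not something the post specifies.

```python
def f1_score(actual, predicted):
    """F1 = 2TP / (2TP + FP + FN) for two flat lists of 0/1 labels."""
    tp = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: one positive label among 100.
actual = [1] + [0] * 99

all_negative = [0] * 100        # 99% accurate, but finds nothing
one_hit = [1, 1] + [0] * 98     # finds the positive, with one false alarm

print(f1_score(actual, all_negative), f1_score(actual, one_hit))
```

True negatives do not appear in the formula, so the all-negative predictor scores 0 instead of being rewarded for the many easy negatives.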
2. Ranking-based evaluation measures
This section describes several ranking-based evaluation measures.
2.1 Peak-F1
Sort all sample-label pairs in descending order of predicted value (a decimal between 0 and 1). At step $k$, regard the first $k$ sample-label pairs as positive and compute F1. Plot the resulting F1 curve and take its maximum value, which is called Peak-F1.
Advantage: it addresses shortcoming 1 of accuracy in Section 1.3, since no threshold needs to be selected (only kids make choices).
Shortcoming: only the peak moment is recorded. The front of the ranking may be of high quality while the rest is poor. This is only a minor weakness.
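The procedure can be sketched as follows. The function name `peak_f1` and the five example pairs are made up for illustration, and ties between predicted values are broken arbitrarily by the sort, which is an assumption of this sketch.

```python
def peak_f1(labels, scores):
    """Maximum F1 over all cut-offs k of the list sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    best, tp = 0.0, 0
    for k, idx in enumerate(order, start=1):
        tp += labels[idx]
        # With the top k predicted positive: FP = k - tp, FN = total_pos - tp,
        # so F1 = 2*tp / (2*tp + FP + FN) = 2*tp / (k + total_pos).
        best = max(best, 2 * tp / (k + total_pos))
    return best


# Hypothetical flattened sample-label pairs: actual labels and predicted values.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.46, 0.3, 0.1]

print(peak_f1(labels, scores))
```

Here the F1 curve peaks at $k = 3$, where both positives are covered at the cost of one false positive.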
2.2 ROC curve And AUC
Sort all sample-label pairs in descending order of predicted value (a decimal between 0 and 1). Start from $(0, 0)$ in the two-dimensional plane and process the pairs in order: if the current pair is actually positive, move up one step; otherwise move right one step. One step up has length $1/P$ and one step right has length $1/N$, where $P$ ($N$) is the total number of actually positive (negative) labels. The resulting curve is called the ROC; see the receiver operating characteristic curve.
AUC (Area Under Curve) is the area under this curve, usually a value strictly between 0 and 1 (an AUC of exactly 1 would be too much to ask).
Advantage 1: the same as Peak-F1.
Characteristic 1: it measures the ranking as a whole. If overall performance matters, this is an advantage over Peak-F1. If only the first few positions matter (as in recommender systems), it may become a disadvantage.
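The walk described above translates almost directly into code. This is a minimal sketch: the name `roc_auc` and the example data are illustrative, and it assumes at least one positive and one negative label and no ties between predicted values.

```python
def roc_auc(labels, scores):
    """Trace the ROC curve step by step and accumulate the area under it."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    p = sum(labels)              # number of actually positive labels
    n = len(labels) - p          # number of actually negative labels
    x = y = area = 0.0
    points = [(x, y)]
    for idx in order:
        if labels[idx] == 1:
            y += 1 / p           # one step up
        else:
            x += 1 / n           # one step right
            area += y / n        # strip of width 1/N and height y
        points.append((x, y))
    return points, area


# Hypothetical flattened sample-label pairs: actual labels and predicted values.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.46, 0.3, 0.1]

points, auc = roc_auc(labels, scores)
print(points)
print(auc)
```

In this example 5 of the 6 positive-negative pairs are ordered correctly, which matches the accumulated area of $5/6$.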
2.3 nDCG
I will be lazy here; see https://zhuanlan.zhihu.com/p/371432647.
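For readers who want something concrete without following the link, here is a sketch of one common nDCG variant, with linear gain and a $\log_2$ position discount. The linked article may use a different gain function, so this is an assumption rather than its exact definition. The example reuses the scores 5 and 4 predicted as 4.4 and 4.5 from Section 1.1; the remaining values are made up.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain of the first k entries (linear gain, assumed)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(actual, predicted, k):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ranked = [actual[i] for i in order]
    ideal_dcg = dcg(sorted(actual, reverse=True), k)
    return dcg(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical actual scores of four items for one user.
actual = [5, 4, 2, 1]
pred_a = [3.9, 3.6, 1.0, 0.2]   # large errors, correct order
pred_b = [4.4, 4.5, 2.0, 1.0]   # small errors, top two items swapped

print(ndcg(actual, pred_a, k=4), ndcg(actual, pred_b, k=4))
```

Unlike MAE, this measure gives the correctly ordered predictor the full score and penalizes the swap at the top of the list.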
2.4 [email protected], [email protected], [email protected]
Still being lazy; see http://manikvarma.org/downloads/XC/XMLRepository.html.