当前位置:网站首页>R language -- principle of Cox model calibration curve (I) data source
R language -- principle of Cox model calibration curve (I) data source
2022-07-19 12:47:00 【Eat two bites at a time】
List of articles
Preface
About cox The calibration curve of the model is drawn , You can search in the browser , Most of the parameter settings have been made very clear ( There is not much to say here ), But about the principle of calibration curve , About Calibrate each point on the graph 、 The meaning of the error line and how to calculate it , I believe most people don't quite understand . By reading the source code , Record what I learned here , It can also make some people better understand their own data and models .
One 、 What is the calibration curve ?
The calibration curve is a scatter plot of the actual occurrence rate and the predicted occurrence rate .
This article will introduce the points in the figure 、 Data behind the line . Let's take a look at an ordinary calibration curve ( Parameters not changed ).
Two 、 Calibration curve
1. How to draw ?
( The sending assistant said there was too little content , I use the parameter water point length )
Mainly drawing parameters :
- x: adopt calibrate Function to get object
- xlab:x Axis title
- ylab:y Axis title
- subtitles = TRUE: Subheading , Here refers to the notes at the bottom of the figure ( Small text ), The default is TRUE, You may need to set it when posting an article FALSE To remove .
- conf.int = TRUE: Whether it is necessary to draw error lines , The default is TRUE, draw
- cex.subtitles = 0.75: The text size of the sub title
- riskdist = TRUE: Whether to draw the shaft whisker diagram above the diagram , It's the thing like long hair
- add = FALSE, Whether to create a new graph or add it to another graph .
- scat1d.opts: This parameter is used to set the axis whisker graph , Don't pay too much attention to , Details refer to scat1d The parameters of the function .
- par.corrected: Parameter type is list, Parameters for setting error points . The default is NULL, The function will be automatically assigned as
list(col = "blue", lty = 1, lwd = 1, pch = 4), Your new parameter will overwrite the duplicate parameter . For example, editors may have strange requirements , Not that calibrated little one x spot , You can setpar.corrected=list(col='white).
dd=datadist(data)
options(datadist="dd")
f<-cph(Surv(DMFS new ,DMFS Outcome type )~ Histological subdivision +T Sub classification +S100,time.inc=60,data = data,x=TRUE,y=TRUE,surv=TRUE)
cal <- calibrate(f, cmethod="KM", method="boot", u=60, m= round(nrow(data)/3), B=1000)
cal
plot(cal,xlim=c(0,1),ylim=c(0,1))

2. How to understand ?
From the calibration curve, we can find , There is 6 individual points,3 A solid 3 individual x shape points, also 3 Error bars . combination cal The calculation result shows .
Solid points: Abscissa : yes calibrate function ( hereinafter referred to as cal) In the calculation results KM Column , The ordinate is :cal Medium mean.predicted Column .
x points: The ordinate of the forked point is cal Medium KM.corrected Column .
Error bar : The error is based on cal Medium std.err The specific calculation function calculated by the column is : Top :ifelse(KM == 0, 0, pmin(1, KM * exp(1.959964 * std.err))). lower end :ifelse(KM == 0, 0, KM * exp(1.959964 * (-std.err))).
Notes in the lower left and right corners : You can find some annotation information below the figure , The meanings are respectively :
- n=72 On behalf of 72 data
- d=29 The number of events representing the outcome is 29( That is, in the data 29 Example sutus=1)
- p=6 representative cox The coefficients in the model are 6 individual ( Be careful : The number of coefficients here is not the same as the number of variables modeled , Because the classification variable will have multiple coefficients )
- 24 subjects per group Indicates data grouping ( I will introduce it later when calculating ) when , Each group 24 Data
- Gray: ideal The gray line is the ideal line
- B=1000 Here again, it means that 1000 Secondary resampling calculation .
- Based on observed-predicted Express cal In the data index.orig How is this column calculated , It has no effect on drawing ( Dig a hole , I'll talk about it later ).
3. verification
Then run the following code according to the above to see whether the figure is the same .
## Prepare error bar chart data
errl <- ifelse(cal[,"KM"] == 0, 0, cal[,"KM"] * exp(1.959964 * (-cal[,"std.err"])))
errh <- ifelse(cal[,"KM"] == 0, 0, pmin(1, cal[,"KM"] * exp(1.959964 * cal[,"std.err"])))
## Draw an error line
errbar(x = cal[,"mean.predicted"],y = cal[,"KM"],yminus = errl,yplus = errh,
pch=16,cex=1.2,xlim = c(0,1),ylim = c(0,1),asp=1,xaxs='i',
yaxs='i',
xlab = 'Fraction surviving 60 Day',
ylab = 'Predicted 60 Days Survival')
## Add gray reference diagonal
abline(a = 0,b = 1,col='grey')
## Add the attachment
lines(x = cal[,"mean.predicted"],y = cal[,"KM"])
## Add the calibrated point
points(x = cal[,"mean.predicted"],y = cal[,"KM.corrected"],pch=4)
## Draw a whisker diagram
scat1d(x = attr(cal,"predicted"),nhistSpike = 200)

Can't say very similar , I can only say it's the same ( To show that I really don't just change the original drawing parameters , It is obvious that the gray reference line layer in the above figure is under the implementation ). Yes, of course , Due to the problem of coordinate proportion in the original function , The appearance may be different , But the meaning of the picture is really the same .
summary
In order to better understand the calibration curve , I went to see the source code of related functions , It can be regarded as one's own learning record .
This article is just the beginning , It's far from over , It will continue to be more .
My little white , If there is a mistake , Please criticize and correct me
边栏推荐
- Yunxi and Tencent cloud have reached a strategic cooperation to accelerate the expansion of the global live broadcast market
- My favorite 10 machine learning official account
- 市场“不确定性”中的投资逻辑 2020-03-18
- Acwing786. 第k个数
- Day 1 Experiment
- Binary tree 2-symmetry recursion problem
- mysql中对于数据库的基本操作
- 数组去重 数组排序 最大值 求和 类数组转化 过滤数组
- 若依excel合并单元格导出ExcelUtils
- Talk about the redis cache penetration scenario and the corresponding solutions
猜你喜欢

Li Hongyi machine learning 1 Introduction of this course

Installation and use of MySQL under Linux

Notes on the fifth day

R语言--Cox模型校准曲线原理(一)数据来源

ros(26):ros::Time::now(),ros::Duration,toSec(),toNSec(); Calculate program execution time

关于“抄底”心态,让你筋疲力尽 2020-03-15

C # from introduction to mastery Part 1: C # overview and introduction

What is the relationship between softmax and cross enterprise?

Go unit test

When will the moon appear
随机推荐
MIHA tour 2023 autumn recruitment officially begins ~ early approval has the opportunity to avoid written examination!
Day 1 Experiment
关于“抄底”心态,让你筋疲力尽 2020-03-15
Useeffect summary
ros(26):ros::Time::now(),ros::Duration,toSec(),toNSec(); Calculate program execution time
WAV和PCM的关系和区别
若依excel合并单元格导出ExcelUtils
Dynamic memory planning
C # from introduction to mastery Part II: C # basic grammar
Ultrasonic sensor (chx01) learning notes Ⅲ - I2C reading and writing operation
Investment logic in market "uncertainty" 2020-03-18
Basic database operations in MySQL
mysql数据库表增添字段,删除字段、修改字段的排列等操作,还不快来
超声波传感器系列文章汇总
Return to risk ratio: the most important indicator of investment opportunities 2020-03-14
String correlation function (II)
35岁以上的测试/开发程序员职业生涯走向,是职场转折点吗?
Is the career direction of test / development programmers over 35 a turning point in the workplace?
Ah Qu's thinking
ASP.NET协同OA办公服务管理平台源码