[Machine Learning] Random Forest
2022-07-18 13:19:00 【Cabbage that wants to become powerful】
Table of contents
- I. What is a random forest?
- II. Random Forest (RF)
- III. The 4 steps to construct a random forest
- IV. Advantages and disadvantages of random forest
- V. Comparison test of 4 random forest implementations
- VI. 4 application directions of random forest
- Reference link
I. What is a random forest?
Random forest is an ensemble algorithm composed of decision trees, and it performs well in many situations.
To understand this sentence in depth, please read my other two articles:
- [Machine Learning] Decision Tree
- [Machine Learning] Ensemble Learning
1. Random forest is an ensemble learning algorithm
Random forest belongs to the Bagging (short for Bootstrap Aggregating) family of ensemble learning methods. The relationship between them is shown below:

2. The base learner of a random forest is the decision tree
Decision tree:
A decision tree is a very simple supervised learning algorithm based on if-then-else rules. The figure above gives an intuitive picture of the logic of a decision tree.
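To make the if-then-else idea concrete, here is a minimal sketch of a hand-written decision tree. The features (`color`, `diameter_cm`), the thresholds, and the fruit labels are invented purely for illustration; a real decision tree learns its rules from training data instead of having them typed in.

```python
def classify_fruit(color: str, diameter_cm: float) -> str:
    """A hand-written decision tree: every branch is one if-then-else rule."""
    if color == "red":
        # Left subtree: red fruit, split again on size.
        if diameter_cm < 3.0:
            return "cherry"
        return "apple"
    else:
        # Right subtree: non-red fruit, split again on size.
        if diameter_cm < 3.0:
            return "grape"
        return "lemon"


print(classify_fruit("red", 7.5))     # apple
print(classify_fruit("green", 1.5))   # grape
```

Each path from the first `if` down to a `return` corresponds to one root-to-leaf path in the tree.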
II. Random Forest (RF)
Random forest:

A random forest is made up of many decision trees. Each tree is trained independently of the others, on its own random sample of the data and attributes, which keeps the correlation between trees low.
In a classification task, each new input sample is passed to every decision tree in the forest. Each tree produces its own classification result, and the random forest takes the class predicted by the most trees as the final result.
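The voting step described above can be sketched in a few lines. This assumes the per-tree predictions have already been collected into a list; the helper name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(tree_predictions).most_common(1)[0][0]


# Five trees vote on one input sample.
votes = ["cat", "dog", "cat", "cat", "dog"]
print(majority_vote(votes))  # cat
```

For regression, the same aggregation idea is usually applied with an average of the trees' outputs instead of a vote.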
III. The 4 steps to construct a random forest

- Draw N samples: from a training set of size N, draw N times with replacement, one sample at a time, to form a set of N samples. These N samples are used to train one decision tree, as the samples at the root node of that tree.
- Select m attributes: when each sample has M attributes, each time a node of the decision tree needs to split, randomly select m of the M attributes, where m << M. Then use some strategy (for example, information gain) to choose 1 of these m attributes as the split attribute of the node.
- Build the decision tree: while the tree is being grown, every node is split according to step 2 until it can no longer be split. (For example, if the attribute selected for a node is the same one its parent just split on, it cannot separate the samples any further, so the node becomes a leaf.) Note that no pruning is performed at any point while the tree is built.
- Form the forest: repeat steps 1 to 3 to build a large number of decision trees, which together make up the random forest.
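The four steps above can be sketched in plain Python. This is a minimal illustration under several assumptions, not a production implementation: all attributes are categorical, nodes use an ID3-style multiway split chosen by information gain, and a node becomes a leaf when none of its m sampled attributes yields any gain. All function names and the toy weather data are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on categorical attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, m, rng):
    """Steps 2 and 3: grow an unpruned tree, sampling m attributes at every node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority                                  # pure node: leaf
    candidates = rng.sample(range(len(rows[0])), m)      # step 2: m of the M attributes
    best = max(candidates, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= 1e-12:
        return majority                                  # cannot split further: leaf
    children = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], m, rng)
    return {"attr": best, "children": children, "default": majority}


def predict_tree(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree


def build_forest(rows, labels, n_trees, m, seed=0):
    """Steps 1 and 4: train each tree on its own bootstrap sample of size N."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # step 1: N draws with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    """Final answer: majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Toy data (invented): attributes are (outlook, humidity, wind).
rows = [("sunny", "high", "weak"), ("sunny", "high", "strong"),
        ("sunny", "high", "weak"), ("sunny", "high", "strong"),
        ("rainy", "normal", "weak"), ("rainy", "normal", "strong"),
        ("rainy", "normal", "weak"), ("rainy", "normal", "strong")]
labels = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

forest = build_forest(rows, labels, n_trees=25, m=2)
print(predict_forest(forest, ("sunny", "high", "weak")))
```

Here M = 3 and m = 2 only to keep the toy example small; on real data m is chosen much smaller than M, as step 2 requires.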
IV. Advantages and disadvantages of random forest
1. Advantages
- It can handle very high-dimensional data (data with many features) without dimensionality reduction or feature selection
- It can estimate the importance of features
- It can detect interactions between different features
- It is not prone to overfitting
- Training is relatively fast and easy to parallelize
- It is relatively simple to implement
- For imbalanced data sets, it can balance the error.
- It can maintain accuracy even when a large proportion of the features is missing.
2. Disadvantages
- Random forests have been shown to overfit on some noisy classification or regression problems.
- For data whose attributes take different numbers of values, attributes with more distinct values have a greater influence on the random forest, so the attribute weights it produces on such data are not reliable
V. Comparison test of 4 random forest implementations
Random forest is a common machine learning algorithm that can be used for both classification and regression problems. This article compares the random forest implementations on four platforms: scikit-learn, Spark MLlib, DolphinDB, and XGBoost. The evaluation metrics are memory usage, running speed, and classification accuracy.
The test results are as follows :

VI. 4 application directions of random forest

Random forest can be used in many settings:
- Classification of discrete values
- Regression of continuous values
- Unsupervised clustering
- Outlier detection
Reference link