LSM storage model
2022-07-18 05:18:00 【Qingdong】
Preface
Excerpted from:
https://zhuanlan.zhihu.com/p/37193700
LSM
LSM (log-structured merge-tree)
LSM is the underlying storage model often used by NoSQL systems such as HBase, BigTable, Cassandra, and MongoDB.
I ran into the LSM model while learning HBase, so this post goes over it once more.
Background
As is well known, sequential disk reads and writes are fast, while random reads and writes are slow.
A 7200rpm Seagate SATA hard disk on the market today can reach roughly 300MB/s for sequential reads and writes;
random reads and writes, however, manage only about 100 IOPS. Assuming each random IO is 1KB, the random read/write bandwidth is about 100KB/s.
That is a gap of roughly three orders of magnitude between sequential and random access.
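The arithmetic behind that comparison can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the figures are the ones quoted above, and decimal units (1KB = 1000 bytes) are an assumption made to keep the numbers round.

```python
# Figures quoted in the text; treat them as rough, illustrative numbers.
SEQ_BYTES_PER_SEC = 300 * 1000 * 1000   # ~300MB/s sequential throughput
RANDOM_IOPS = 100                       # random IO operations per second
IO_SIZE_BYTES = 1000                    # 1KB per random IO (decimal units assumed)

# Random throughput is simply operations per second times bytes per operation.
random_bytes_per_sec = RANDOM_IOPS * IO_SIZE_BYTES

# How many times faster sequential access is under these assumptions.
ratio = SEQ_BYTES_PER_SEC / random_bytes_per_sec

print(f"random: {random_bytes_per_sec} B/s, sequential is {ratio:.0f}x faster")
```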
So we should try to lay data out in blocks that can be read and written sequentially, which makes it faster to locate and read data.
Given these disk characteristics, applications optimize their storage according to the read and write patterns of their own workload.
How does MySQL do it?
Most people are familiar with the internal and storage structure of the MySQL relational database.
MySQL's InnoDB storage engine uses a B+ tree to organize data on disk.
B+ tree demonstration:
https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
(A VPN may be needed to open the link.)
The degree of a B+ tree node is much larger than that of a balanced binary tree.
Here, degree means the number of entries stored in one node;
a balanced binary tree has degree 2.
So a B+ tree is very shallow (usually 3 to 4 levels), and a query needs only 3 to 4 random disk IOs to find the data, which is very efficient.
Strictly speaking, that statement is imprecise: what the IOs locate is a 16KB data page, which is loaded into memory,
and the record is then found by binary search within the page. In-memory binary search takes far less time than disk IO, so its cost is negligible.
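To see why a large degree keeps the tree shallow, here is a rough sketch that counts how many levels are needed to address a given number of keys. The fan-out of 1000 and the one billion keys are illustrative assumptions, not InnoDB's actual figures, and the helper name `tree_height` is made up for this example.

```python
def tree_height(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys.

    Integer arithmetic is used on purpose to avoid floating-point
    rounding in logarithms.
    """
    height = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


NUM_KEYS = 10**9  # assumed table size, for illustration only

wide_height = tree_height(NUM_KEYS, 1000)   # assumed B+ tree fan-out
binary_height = tree_height(NUM_KEYS, 2)    # balanced binary tree

print(wide_height, binary_height)
```

Each level costs at most one random disk IO when nothing is cached, so the difference in height translates directly into the difference in IO count.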
However, insert and update operations are random. An update first looks up the primary key of the row to change, applies the update, and then adjusts the B+ tree.
So every write has to modify the B+ tree:
the nodes that need to change must be read out and then rewritten with the new values.
Looking up the primary key is very efficient, but the disk IO needed to adjust the B+ tree is expensive, which is why the write efficiency of relational databases such as MySQL is often criticized.
Is there a data organization model that can replace the B+ tree and improve write efficiency without seriously hurting read efficiency?
The LSM storage model, proposed by O'Neil in the LSM paper, is meant to solve exactly this problem.

How LSM solves the B+ tree's write efficiency problem
LSM addresses the write efficiency of the B+ tree as follows:
Simply put, it gives up some disk read performance in exchange for sequential writes.
Suppose we want to write 1000 records whose keys are random numbers.
For a disk, the fastest way to write is to append each record directly to disk in arrival order.
The problem is that queries become impractical: looking up a value requires scanning all the data, so read performance is very poor.
And if we want the best disk read performance instead, what should we do? Keep all the data sorted.
A B+ tree is such a sorted structure, but its write performance is too poor. To improve writes we can give up some disk read performance. How?
Split the data into many small ordered structures.
For example, sort every M records (say, M = 100) once in memory, then sort the next M records, and so on.
Doing this repeatedly yields N/M small ordered structures.
Each batch is sorted in memory and then written to disk sequentially, one structure at a time, so write efficiency improves.
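The batching idea can be sketched in Python as follows. This is a toy model under stated assumptions, not how HBase or any real engine is implemented: the class name `SortedRunWriter` is hypothetical, and a plain in-memory list stands in for the files that would be written sequentially to disk.

```python
import random


class SortedRunWriter:
    """Buffers writes in memory and emits one sorted run per run_size records."""

    def __init__(self, run_size):
        self.run_size = run_size
        self.buffer = {}   # in-memory write buffer: key -> value
        self.runs = []     # stand-in for on-disk files, oldest first

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.run_size:
            self.flush()

    def flush(self):
        # Sort once in memory, then "write" the whole run in one sequential pass.
        if self.buffer:
            self.runs.append(sorted(self.buffer.items()))
            self.buffer = {}


# N = 1000 distinct random keys, M = 100 records per run.
rng = random.Random(0)
keys = rng.sample(range(1_000_000), 1000)

writer = SortedRunWriter(run_size=100)
for k in keys:
    writer.put(k, f"value-{k}")
writer.flush()  # flush whatever is left in the buffer

print(len(writer.runs), "sorted runs")
```

Every run is sorted internally, but the runs overlap in key range, which is what makes reads more expensive than in a single fully sorted structure.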
So how do we query?
Search each small ordered structure in turn, and return once the required structure has been found or all of them have been searched.
A lookup inside one ordered structure is a binary search, which costs:
log2(M)
The total time in the worst case is therefore:
(N/M) * log2(M)
where N is the total number of records
and M is the size of each ordered structure.
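The lookup procedure and its cost estimate can be sketched like this. It is a simplified model that stores only keys; the function names `build_runs`, `search_runs`, and `worst_case_cost` are made up for illustration, and the standard library's `bisect` module provides the binary search.

```python
import bisect
import math
import random


def build_runs(keys, run_size):
    """Cut the write stream into chunks of run_size and sort each chunk."""
    return [sorted(keys[i:i + run_size]) for i in range(0, len(keys), run_size)]


def search_runs(runs, key):
    """Probe runs from newest to oldest; return (found, runs_probed)."""
    probed = 0
    for run in reversed(runs):
        probed += 1
        i = bisect.bisect_left(run, key)       # about log2(M) comparisons
        if i < len(run) and run[i] == key:
            return True, probed
    return False, probed


def worst_case_cost(n, m):
    """(N/M) * log2(M): every run is binary-searched once."""
    return (n / m) * math.log2(m)


keys = list(range(1000))
random.Random(0).shuffle(keys)
runs = build_runs(keys, 100)

found, probed = search_runs(runs, 123)
missing, probed_all = search_runs(runs, 5000)   # absent key: every run is probed
estimate = worst_case_cost(1000, 100)

print(found, probed, missing, probed_all, round(estimate, 1))
```

A key that does not exist is the worst case, because no run can end the search early.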
So LSM sacrifices read efficiency in exchange for write efficiency.
That read cost may still be too high, so HBase adds Bloom filters and a compaction mechanism.
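Both ideas can be illustrated with a small sketch. This is not HBase's implementation: the `BloomFilter` class, its sizes, and the `compact` function are assumptions chosen for brevity. A Bloom filter lets a reader skip a sorted structure that definitely does not contain a key, and compaction merges several structures into one so fewer of them have to be searched.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent", so the run can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))


def compact(older, newer):
    """Merge two sorted runs of (key, value); the newer value wins on duplicates.

    Real engines stream the merge instead of building a dict, and also
    handle delete markers, which this sketch ignores.
    """
    merged = dict(older)
    merged.update(newer)
    return sorted(merged.items())


bloom = BloomFilter()
for k in ("row1", "row2", "row3"):
    bloom.add(k)

older_run = [("row1", "a"), ("row3", "c")]
newer_run = [("row1", "A"), ("row2", "b")]
merged_run = compact(older_run, newer_run)

print(bloom.might_contain("row2"), merged_run)
```

After compaction a lookup touches one structure instead of two, and the stale value of `row1` is gone.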
Of course, every data structure has its strengths and weaknesses; choose according to your needs.
For reference: the RocksDB wiki
Why does HBase choose LSM?
A B+ tree index and log-style (append-only) file operations (such as a database WAL) are the two extremes of data reading and writing.
A B+ tree has high read efficiency and poor write efficiency;
log-style file operations have high write efficiency and poor read efficiency.
So a compromise is needed between sorting and log-style file operations, which is where the log-structured merge tree model comes in.
As the name suggests, LSM keeps log-style file operations to improve write efficiency, and keeps the data sorted inside each SSTable to preserve query efficiency.
This suits a distributed system like HBase, which has to absorb a large volume of writes.