LSM storage model

2022-07-18 05:18:00 【Qingdong】

Preface

Excerpted from:

https://zhuanlan.zhihu.com/p/37193700

LSM

LSM (log-structured merge-tree)

LSM is the underlying storage model of many NoSQL systems such as HBase, BigTable, Cassandra, and MongoDB.

I ran into the LSM model while learning HBase, so let's go through it again here.

Background

It is well known that sequential disk reads and writes are fast, while random reads and writes are slow.

A 7200rpm Seagate SATA hard disk currently on the market can reach roughly 300MB/s for sequential reads and writes.

Random access is far slower, at about 100 IOPS. Assuming each random I/O transfers 1KB, the random read/write bandwidth is only 100KB/s.

That is a gap of about three orders of magnitude between sequential and random access.
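
The arithmetic behind that gap can be checked with a short sketch. The figures are the rough ones quoted above, and binary units (1KB = 1024 bytes) are an assumption made here for illustration.

```python
import math

# Rough figures quoted in the text; binary units are an assumption.
SEQ_BANDWIDTH_BYTES = 300 * 1024 * 1024   # ~300 MB/s sequential
RANDOM_IOPS = 100                         # random operations per second
IO_SIZE_BYTES = 1024                      # 1 KB transferred per random I/O

random_bandwidth = RANDOM_IOPS * IO_SIZE_BYTES   # bytes per second
ratio = SEQ_BANDWIDTH_BYTES / random_bandwidth   # how many times faster sequential is
orders = math.log10(ratio)                       # gap in orders of magnitude

print(f"random bandwidth: {random_bandwidth // 1024} KB/s")
print(f"sequential is about {ratio:.0f}x faster")
```

With these numbers the ratio comes out at a few thousand, which is where the "three orders of magnitude" estimate comes from.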

So we should try to lay data out in blocks that can be read and written sequentially, which makes finding and reading it cheaper.

Given these disk characteristics, each application optimizes its storage according to the read and write patterns of its own workload.

How does MySQL do it?

Most people are familiar with the internal and storage structure of MySQL as a relational database.

The InnoDB storage engine of MySQL organizes data on disk with a B+ tree.

B+ tree demonstration:

https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

(A proxy may be needed to access this link.)

The degree of a B+ tree node is much larger than that of a balanced binary tree.

Degree here means the number of entries stored in one node.

The degree of a balanced binary tree is 2.

A B+ tree is therefore very shallow (usually 3~4 levels), so each query needs only 3~4 random disk I/Os to find the data, which is very efficient.

Strictly speaking, what those I/Os locate is the 16K data page, which is then loaded into memory.

The record is then found inside the page by binary search. An in-memory binary search takes far less time than a disk I/O, so its cost is negligible.
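
The in-memory step can be sketched as follows. This is only an illustration: the page is modeled as two parallel Python lists, and `search_page` is a hypothetical helper, not InnoDB's actual page format or API.

```python
from bisect import bisect_left

def search_page(keys, values, target):
    """Binary-search one in-memory page whose keys are sorted ascending.

    keys/values are a hypothetical stand-in for a loaded data page.
    """
    i = bisect_left(keys, target)            # O(log n) comparisons
    if i < len(keys) and keys[i] == target:
        return values[i]
    return None                              # key is not on this page

# Usage: a page that has already been loaded from disk.
page_keys = [3, 8, 15, 21, 42]
page_values = ["c", "h", "o", "u", "z"]
print(search_page(page_keys, page_values, 21))
```

The expensive part is loading the page; once it is in memory, the search touches only a handful of entries.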

However, insert and update operations are random. An update first locates the primary key to be changed, then updates the record, then adjusts the B+ tree.

So every write requires modifying the B+ tree.

That means fetching the nodes that need to change and then changing their values.

Looking up the primary key is very efficient, but the disk I/O spent adjusting the B+ tree is expensive, which is why the write efficiency of relational databases such as MySQL has long been criticized.

Is there a data organization model that can replace the B+ tree and improve write efficiency without affecting read efficiency?

The LSM storage model, proposed by O'Neil et al. in the LSM paper, was designed to solve this problem.

How LSM solves the B+ tree's write efficiency problem

LSM tackles the write efficiency problem of the B+ tree as follows.

Put simply, it gives up some disk read performance in exchange for sequential writes.

Suppose we want to write 1000 records whose keys are random numbers.

For a disk, the fastest way to write is to append each record to the disk in arrival order.

The problem is that such data cannot be queried efficiently: every lookup has to traverse the entire data set, so read performance is too poor.
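
A minimal sketch of this extreme, assuming an in-memory list as a stand-in for an append-only file (the class `AppendOnlyLog` is hypothetical and exists only for illustration):

```python
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a file that is only appended to."""

    def __init__(self):
        self.records = []

    def put(self, key, value):
        # Write path: always append at the tail (sequential on a real disk).
        self.records.append((key, value))

    def get(self, key):
        # Read path: scan every record, newest first, so the latest write wins.
        for k, v in reversed(self.records):
            if k == key:
                return v
        return None

log = AppendOnlyLog()
log.put("a", 1)
log.put("b", 2)
log.put("a", 3)          # overwrites are just more appends
print(log.get("a"))
```

Writes cost the same no matter how much data exists, while a lookup for a missing key has to examine every record.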

So what should be done to get the highest disk read performance? Keep all the data sorted.

A B+ tree is exactly such a structure, but its write performance is too poor. To improve writes we can give up some disk read performance. How?

Split the data into many small sorted structures.

For example, sort every M records once in memory, then sort the next M records, and so on.

Repeating this yields N/M small sorted structures.

Each batch is sorted in memory and then written to disk sequentially, one structure at a time, so write efficiency improves.
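
The batching idea can be sketched like this. The function name `build_sorted_runs` and the use of in-memory lists in place of on-disk files are assumptions for illustration only.

```python
import random

def build_sorted_runs(records, run_size):
    """Split incoming (key, value) records into sorted runs of at most run_size."""
    runs = []
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) == run_size:
            runs.append(sorted(buffer))   # sort in memory, then "flush" as one unit
            buffer = []
    if buffer:
        runs.append(sorted(buffer))       # flush the final partial batch
    return runs

# Usage: N = 1000 records with random keys, M = 100 records per run.
random.seed(0)
records = [(random.randrange(10**6), f"v{i}") for i in range(1000)]
runs = build_sorted_runs(records, run_size=100)
print(len(runs))   # N / M runs
```

Each run is sorted internally, but the runs are not ordered relative to one another, which is what makes lookups more expensive.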

So how do we query?

Search the small sorted structures one by one, returning as soon as the key is found or once every structure has been searched.

The cost of a binary search within one sorted structure is:

$\log_2 M$

The total time spent in the worst case is:

$\frac{N}{M} \cdot \log_2 M$

where N is the total number of records and M is the size of each sorted structure.
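
A sketch of that lookup, under two assumptions that the text does not spell out: each run is stored as a pair of parallel lists `(sorted_keys, values)`, and runs are searched newest first so that the most recent write for a key wins. Both function names are hypothetical.

```python
from bisect import bisect_left
from math import log2

def search_runs(runs, target):
    """Look up target in runs, a list of (sorted_keys, values) pairs, oldest first."""
    for keys, values in reversed(runs):      # newest run first (an assumption)
        i = bisect_left(keys, target)        # about log2(M) comparisons per run
        if i < len(keys) and keys[i] == target:
            return values[i]
    return None                              # searched all N/M runs without a hit

def worst_case_comparisons(n, m):
    """The estimate from the text: (N / M) * log2(M)."""
    return (n / m) * log2(m)

runs = [
    ([1, 5, 9], ["a1", "a5", "a9"]),   # older run
    ([2, 5, 7], ["b2", "b5", "b7"]),   # newer run
]
print(search_runs(runs, 5))
print(worst_case_comparisons(1024, 128))
```

A key that is absent from every run is the worst case, because every run has to be searched before the lookup can give up.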

So LSM sacrifices some read efficiency in exchange for write efficiency.

That read efficiency may still not be good enough, which is why HBase adds Bloom filters and a compaction mechanism.
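
Both ideas can be sketched in a few lines. This is a toy illustration, not HBase's implementation: `BloomFilter` and `compact` are hypothetical names, the filter sizes are arbitrary, and runs are assumed to be lists of `(key, value)` pairs sorted by key and ordered oldest first.

```python
import hashlib
import heapq

class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)      # one byte per bit, for simplicity

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key is definitely absent, so the run can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))

def compact(runs):
    """Merge sorted runs (oldest first) into one run, keeping the newest value per key."""
    # Tag records with the negated run index so newer runs sort first on equal keys.
    tagged = [
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    ]
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))      # first occurrence is the newest one
    return merged

old_run = [(1, "a"), (3, "old")]
new_run = [(2, "b"), (3, "new")]
print(compact([old_run, new_run]))
```

A Bloom filter kept per run lets a lookup skip runs that cannot contain the key, and compaction reduces the number of runs that have to be searched at all.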

Of course, every data structure has its own strengths and weaknesses, and should be chosen to fit the use case.

See also: the RocksDB wiki

Why does HBase choose LSM?

A B+ index tree and log-style (append) file operations (such as a database WAL log) are the two extremes of data reading and writing.

A B+ tree has high read efficiency and poor write efficiency.

Log-style file operations have high write efficiency and poor read efficiency.

So a compromise is needed between sorting and log-style file operations, which is where the log-structured merge tree model comes in.

As the name suggests, LSM keeps log-style file operations to improve write efficiency, while the data inside each SSTable is sorted to preserve query efficiency.

This suits systems like HBase, which must handle a large volume of distributed writes, very well.
