StoneDB, an integrated real-time HTAP database: how to replace MySQL and improve analytical performance by nearly 100 times
2022-07-26 04:45:00 【StoneDB】

Mainstream industry approaches to building HTAP around MySQL
1. MySQL + Hadoop

2. MySQL + Data Lake

3. MySQL + ClickHouse/Greenplum

4. Divergent Design based on multiple replicas

These approaches share several drawbacks. The system architecture is heavy, and operation and maintenance are complex. TP data reaches the AP system through ETL, so data latency is high and it is hard to meet services' real-time analysis requirements. Combining heterogeneous databases means maintaining two database systems and many technology stacks, which places higher demands on technical staff. NewSQL systems require various compatibility adaptations, which are complex and also demanding for engineers. To address these problems, we introduce our HTAP solution: StoneDB, an open-source, integrated, real-time HTAP database.

StoneDB is integrated into MySQL as a plug-in and interacts with the MySQL server layer through query and write interfaces. The main features of the current integrated architecture are:
Data is organized in columnar storage and combined with efficient compression algorithms, so StoneDB gains both high performance and a storage cost advantage.
Approximate queries and parallel processing based on the Knowledge Grid let StoneDB minimize IO on irrelevant data when handling massive data and complex queries.
Histograms, data block bitmaps, and other statistics further speed up query processing.
A column-at-a-time execution engine designed for columnar storage further improves execution efficiency.
High-speed data loading is provided.
Let's take a look at StoneDB's architecture design.
Architecture design: data organization

In StoneDB, data is organized by column. This layout is friendly to compression algorithms: an appropriate, efficient algorithm can be chosen according to each column's type, data, and other factors, which saves IO and memory resources. It also has the following advantages:
It is cache-line friendly.
During a query, operations on each column run concurrently, and the complete record set is assembled in memory at the end.
Ad hoc queries scan only the columns they need, so no IO is spent reading the values of other columns.
No indexes need to be maintained, and ad hoc queries over any combination of columns are supported.
It enables Knowledge Grid capabilities, which improve data lookup efficiency.
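To make the "scan only specific columns" point concrete, here is a minimal Python sketch of a column-oriented layout. The table, the column names (`order_id`, `seller`, `amount`), and the helper function are hypothetical illustrations, not StoneDB's actual storage format.

```python
# Illustrative only: a toy table converted from row layout to column layout.
rows = [
    {"order_id": 1, "seller": 86, "amount": 120.0},
    {"order_id": 2, "seller": 17, "amount": 35.5},
    {"order_id": 3, "seller": 86, "amount": 80.0},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_for_seller(cols, seller_id):
    """Sum `amount` for one seller, reading only the two columns involved."""
    return sum(
        amount
        for seller, amount in zip(cols["seller"], cols["amount"])
        if seller == seller_id
    )


result = total_amount_for_seller(columns, 86)
```

The query never touches `order_id`; in a row layout every full row would have to be read to answer the same question.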
Architecture design: column-based data compression
As mentioned above, data is organized by column and all values in a column have the same type, so an efficient compression algorithm can be chosen for each data type, because:
Duplicate values are likely within a column, so compression is effective.
Data Nodes have a fixed size, which maximizes compression performance and efficiency.
Compression is tailored to the specific data type (int, float, date/time, string, etc.).
StoneDB supports more than 20 adaptive compression algorithms. The main ones currently in use are:
PPM, LZ4, B2, Delta, etc.
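As an illustration of why same-typed column values compress well, the following Python sketch shows generic delta encoding on a column of timestamps. It is a textbook version of the technique, written under the assumption of integer values; it is not StoneDB's Delta implementation, and the function names are made up for this example.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value as a difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by taking a running sum of the deltas."""
    return list(accumulate(deltas))


# Hypothetical column of Unix timestamps that grow slowly from row to row.
timestamps = [1658810700, 1658810701, 1658810703, 1658810703]
encoded = delta_encode(timestamps)
decoded = delta_decode(encoded)
```

After the first entry, the encoded column holds only small numbers, which a general-purpose compressor or a narrower integer width can store far more compactly than the raw values.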
Architecture design: data organization structure and Knowledge Grid

Physical data is stored in fixed-size data blocks called Data Nodes, typically 128 KB each, which makes it easier for the system to optimize IO efficiency and also allows efficient block-based compression and encryption. The Knowledge Grid supports the query optimizer, execution, and the compression algorithms. For example, when processing a query, the optimizer uses the Knowledge Grid to decide which Data Nodes to fetch for data operations.
Data Node (DN): a fixed-size data block (typically 128 KB) that optimizes IO efficiency and supports efficient block-based compression and encryption.
Knowledge Grid (KG): used for metadata storage.
Metadata Node (MDN): metadata describing a Data Node.
Knowledge Nodes (KN) make up this metadata and are used by the query optimizer to support plan execution and compression algorithms.
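The relationship between Data Nodes and their metadata can be sketched in Python as follows. This is a simplified model under stated assumptions: blocks are split by a hypothetical row count rather than by 128 KB, and each metadata entry holds only a minimum and a maximum, whereas the article mentions histograms and bitmaps. The class and function names are illustrative, not StoneDB's internal API.

```python
from dataclasses import dataclass

ROWS_PER_NODE = 4  # hypothetical; the article describes fixed-size blocks, typically 128 KB


@dataclass
class KnowledgeNode:
    """Assumed minimal metadata for one Data Node: its value range."""
    node_id: int
    min_value: int
    max_value: int


def split_into_data_nodes(column, rows_per_node=ROWS_PER_NODE):
    """Cut one column into fixed-size blocks (the last block may be shorter)."""
    return [column[i:i + rows_per_node] for i in range(0, len(column), rows_per_node)]


def build_knowledge_grid(data_nodes):
    """Compute the metadata once, at load time, so queries can consult it cheaply."""
    return [KnowledgeNode(i, min(node), max(node)) for i, node in enumerate(data_nodes)]


def candidate_nodes(grid, value):
    """Return the ids of Data Nodes whose range could contain `value`."""
    return [kn.node_id for kn in grid if kn.min_value <= value <= kn.max_value]


seller = [3, 5, 9, 12, 40, 41, 86, 90, 86, 86, 86, 86, 100, 120, 130, 150]
data_nodes = split_into_data_nodes(seller)
grid = build_knowledge_grid(data_nodes)
to_read = candidate_nodes(grid, 86)
```

With this data only two of the four blocks need to be fetched for `seller = 86`; the other two are ruled out from metadata alone, with no block IO.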
Architecture design - query: Knowledge Grid overview

Architecture design - query: the Knowledge Grid-based optimizer



Architecture design - query: processing flow


For select * from xx where seller = 86, the internal execution process is as follows:
Execution plan optimization and execution:
Cost-based optimization based on the Knowledge Grid
IO thread pool maintenance
Memory allocation and management
SMP support (concurrent queries)
Vectorized execution
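One way to picture how such a filter might be evaluated is the Python sketch below, which combines metadata-based pruning with a thread pool. It rests on several assumptions: the per-block metadata is a plain min/max pair, the labels "irrelevant", "relevant", and "suspect" are names chosen for this example, and Python threads only illustrate the structure of concurrent block scans rather than real multi-core speedup. It is not StoneDB's execution engine.

```python
from concurrent.futures import ThreadPoolExecutor

ROWS_PER_NODE = 4  # hypothetical block size in rows

# Toy `seller` column, already split into blocks.
data_nodes = [
    [3, 5, 9, 12],
    [40, 41, 86, 90],
    [86, 86, 86, 86],
    [100, 120, 130, 150],
]
# Metadata assumed to be computed at load time, not at query time.
stats = [(min(node), max(node)) for node in data_nodes]


def classify(node_min, node_max, value):
    """Decide from metadata alone how a block relates to `column = value`."""
    if value < node_min or value > node_max:
        return "irrelevant"  # no row can match; skip the block
    if node_min == node_max == value:
        return "relevant"    # every row matches; no need to inspect values
    return "suspect"         # must read the block and check each value


def scan_node(node_id, value):
    """Return the matching row numbers contributed by one block."""
    base = node_id * ROWS_PER_NODE
    kind = classify(*stats[node_id], value)
    if kind == "irrelevant":
        return []
    if kind == "relevant":
        return list(range(base, base + len(data_nodes[node_id])))
    return [base + i for i, v in enumerate(data_nodes[node_id]) if v == value]


def matching_rows(value, workers=4):
    """Scan all blocks concurrently and merge the results in block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_node = pool.map(lambda node_id: scan_node(node_id, value), range(len(data_nodes)))
        return [row for rows in per_node for row in rows]


rows_for_86 = matching_rows(86)
```

Only the second block is actually read value by value; the first and last are skipped and the third is accepted wholesale from its metadata.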

Fully compatible with MySQL. In both syntax and ecosystem, MySQL users can switch to StoneDB seamlessly.
Integrated transactions and analytics. No ETL is needed: transactional data is synchronized to the analysis engine in real time, so users get real-time business analysis results.
Fully open source.
10 to 100 times the AP capability of MySQL. Joins across multiple tables with hundreds of millions of rows respond quickly, so decisions do not have to wait for results.
10 times the import speed. AP scenarios involve huge volumes of analysis data, and efficient import gives a better user experience.
1/10 of the TCO. StoneDB's efficient compression algorithms, seamless business migration capability, and simple architecture reduce TCO for users.
StoneDB 2.0 will bring a new architecture
StoneDB open-source repository
https://github.com/stoneatom/stonedb

Worked at Huawei, iQIYI, and Peking University on the core architecture design of database kernels. More than 10 years of experience in database kernel development, specializing in query engines, execution engines, and massively parallel processing. Holds dozens of database invention patents and is the author of 《PostgreSQL Query Engine Source Code Technical Analysis》.

Graduated from Huazhong University of Science and Technology and enjoys studying mainstream database architectures and source code. 8 years of experience in database kernel development, previously working on the kernels of the distributed databases CirroData, RadonDB, and TDengine. Currently StoneDB kernel architect and StoneDB project PMC.
https://www.bilibili.com/video/BV1U3411F76U
https://www.bilibili.com/video/BV1gS4y1H7NK
https://www.bilibili.com/video/BV19f4y1Z7JB

This article is from the WeChat official account StoneDB (StoneDB2021).

