HDFS Read/Write Process
2022-07-19 01:45:00 · Hyf
Catalog
1. HDFS Write Data Flow
2. Node Distance Calculation
3. Replica Node Selection
4. HDFS Read Data Flow
1. HDFS Write Data Flow
Flow chart (image from Shang Silicon Valley):

Flow chart analysis:
1. The client requests a file upload from the NameNode through the DistributedFileSystem module. The NameNode checks whether the target file already exists and whether its parent directory exists. (A client-side code sketch follows after this list.)
2. The NameNode responds whether the upload is allowed.
3. The client asks which DataNode servers the first Block should be uploaded to.
4. The NameNode returns three DataNodes: dn1, dn2, and dn3.
5. The client requests dn1 to upload data through the FSDataOutputStream module; on receiving the request, dn1 calls dn2, and dn2 calls dn3, establishing the transmission pipeline.
6. dn1, dn2, and dn3 acknowledge back to the client step by step.
7. The client starts uploading the first Block to dn1 (the data is first read from disk into a local memory cache) in units of Packets. When dn1 receives a Packet it passes it on to dn2, and dn2 passes it on to dn3; for every Packet it sends, dn1 places it in an acknowledgment queue to wait for the reply.
8. Once one Block has finished transferring, the client again asks the NameNode which servers to upload the next Block to (repeat steps 3-7).
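Steps 1, 5, and 7 are all driven through the Hadoop client API. Below is a minimal sketch of uploading a local file, assuming a Hadoop client on the classpath and a NameNode at hdfs://nn:8020 (a hypothetical address, as are the file paths); the DataNode pipeline and the Packet/acknowledgment mechanics happen inside FSDataOutputStream.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally taken from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://nn:8020");

        try (FileSystem fs = FileSystem.get(conf);                      // DistributedFileSystem client
             InputStream in = Files.newInputStream(Paths.get("local.txt"));
             // Steps 1-5: the NameNode is asked to create the file, picks the
             // DataNodes, and the stream builds the dn1 -> dn2 -> dn3 pipeline.
             FSDataOutputStream out = fs.create(new Path("/user/demo/remote.txt"))) {
            // Step 7: data is streamed to dn1 in Packet-sized chunks.
            IOUtils.copyBytes(in, out, 4096, false);
        } // Closing the stream flushes the last Packet and completes the Block(s).
    }
}
```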
2. Node Distance Calculation
Node distance: the sum of the distances from the two nodes to their nearest common ancestor.
Read data flow chart (from Shang Silicon Valley):

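As a concrete illustration of the definition above, the sketch below computes the distance between two nodes from their topology path strings (e.g. /d1/rack1/node1). It is a hypothetical helper that mirrors the idea behind Hadoop's org.apache.hadoop.net.NetworkTopology.getDistance(), not the production implementation: each hop from a node up to the nearest common ancestor counts as one.

```java
/** Hypothetical helper: node distance = hops from each node up to their nearest common ancestor. */
public class NodeDistance {

    // Topology paths look like /datacenter/rack/node, as in HDFS rack awareness.
    static int distance(String pathA, String pathB) {
        String[] a = pathA.substring(1).split("/");
        String[] b = pathB.substring(1).split("/");

        // Length of the shared prefix = depth of the nearest common ancestor.
        int common = 0;
        while (common < a.length && common < b.length && a[common].equals(b[common])) {
            common++;
        }
        // Remaining levels on each side after the shared prefix.
        return (a.length - common) + (b.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n0", "/d1/r1/n0")); // 0: same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2: same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n0")); // 4: same data center, different rack
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n1")); // 6: different data centers
    }
}
```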
3. Replica Node Selection
1. The first replica is placed on the node where the client runs. If the client is outside the cluster, a node is chosen at random.
2. The second replica is placed on a random node in a different rack.
3. The third replica is placed on a random node in the same rack as the second replica.
Reason for this placement: the first replica is on the nearest node, giving the fastest upload; spreading the others across racks ensures data reliability. The sketch below shows how to inspect where the replicas actually landed.
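To check replica placement, the FileSystem API can report block locations. The sketch below prints, for each Block, the hosts that hold its replicas; it assumes the same hypothetical hdfs://nn:8020 cluster and file path as the write example. The replication factor itself comes from the dfs.replication setting (3 by default).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowReplicaPlacement {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://nn:8020"); // hypothetical NameNode address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/remote.txt"); // hypothetical path
            FileStatus status = fs.getFileStatus(file);

            // One BlockLocation per Block; getHosts() lists the DataNodes holding its replicas.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                        + " -> replicas on " + String.join(", ", block.getHosts()));
            }
        }
    }
}
```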
4. HDFS Read Data Flow
The following figure is from Shang Silicon Valley:

If one node is serving too many reads (or a large amount of data is being read), the client will also access other replica nodes and let them serve the data.
1. The client requests a file download from the NameNode through DistributedFileSystem. The NameNode queries its metadata and finds the DataNode addresses holding the file's blocks.
2. The client picks a DataNode server (nearest first, then at random among equally near ones) and requests the data.
3. The DataNode starts transferring data to the client (it reads an input stream from disk and sends the data in units of Packets, verifying checksums per Packet).
4. The client receives the data in units of Packets, caches it locally first, and then writes it to the target file. A client-side read sketch follows below.
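The read side is wrapped by the same client API: opening a path returns an FSDataInputStream that asks the NameNode for block locations and then streams Packets from the chosen DataNodes. A minimal sketch, again assuming the hypothetical hdfs://nn:8020 cluster and file path:

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://nn:8020"); // hypothetical NameNode address

        try (FileSystem fs = FileSystem.get(conf);
             // Step 1: the NameNode is queried for the block locations of the file.
             FSDataInputStream in = fs.open(new Path("/user/demo/remote.txt"));
             OutputStream out = Files.newOutputStream(Paths.get("downloaded.txt"))) {
            // Steps 2-4: blocks are read from the nearest DataNodes Packet by Packet,
            // checksum-verified, and written to the local target file.
            IOUtils.copyBytes(in, out, 4096, false);
        }
    }
}
```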