[300+ selected interview questions from big companies, shared continuously] Big data operations and maintenance interview questions column (VIII)
2022-07-26 04:31:00 【Big data Institute】
We keep sharing useful, valuable, hand-picked, high-quality big data interview questions.
We are committed to building the most comprehensive big data interview question bank on the web.

71. What are the typical application scenarios of ZooKeeper?
Reference answer:
ZooKeeper is a typical publish/subscribe-style framework for distributed data management and coordination; developers can use it to publish and subscribe to distributed data.
By combining ZooKeeper's various node types with the Watcher event notification mechanism, it is easy to build a set of core functions that distributed applications commonly need, such as:
1. Data publish/subscribe
2. Load balancing
3. Naming service
4. Distributed coordination/notification
5. Cluster management
6. Master election
7. Distributed locks
8. Distributed queues
72. What is ZooKeeper?
Reference answer:
ZooKeeper is an open-source distributed coordination service. It acts as the manager of a cluster: it monitors the state of each node and decides on the next appropriate action based on the feedback the nodes submit. Ultimately, it offers users an easy-to-use interface on top of an efficient, stable system.
Distributed applications can build on ZooKeeper to implement data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.
73. What are the characteristics of ZooKeeper?
Reference answer:
Consistency, atomicity, single system image, reliability, and timeliness.
74. What mechanism governs the availability of a ZooKeeper cluster?
Reference answer:
The majority mechanism: the cluster is available as long as more than half of its machines are alive.
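The majority rule can be written as a one-line check. This is a minimal sketch in Python; the function name `has_quorum` is hypothetical and is not part of ZooKeeper's API.

```python
def has_quorum(alive: int, total: int) -> bool:
    """Return True when strictly more than half of the ensemble is alive."""
    if total <= 0 or not 0 <= alive <= total:
        raise ValueError("expected total > 0 and 0 <= alive <= total")
    # Integer form of alive > total / 2, avoiding floating point.
    return 2 * alive > total


# A 5-node ensemble stays available with 3 nodes alive, but not with 2.
print(has_quorum(3, 5), has_quorum(2, 5))
# A 6-node ensemble needs 4 alive, so it tolerates no more failures than 5 nodes do.
print(has_quorum(3, 6), has_quorum(4, 6))
```

The second pair of calls illustrates why ensembles are usually sized with an odd number of machines: adding a sixth node does not raise the number of failures that can be tolerated.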
75. How does the ZooKeeper server handle Watchers?
Reference answer:
1. The client registers a Watcher with the server; the server receives the Watcher and stores it.
2. The Watcher is triggered.
3. The process method is called to fire the Watcher.
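The three steps above can be mimicked with a small in-memory model. This is only a sketch and not ZooKeeper's server code: the class `ToyWatchManager`, its method names, and the path `/app/config` are hypothetical. It also assumes one-shot behaviour (a watcher is removed once it fires), which matches standard ZooKeeper watches but is not stated in the answer above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# A watcher callback receives (event_type, path).
WatcherCallback = Callable[[str, str], None]


class ToyWatchManager:
    """In-memory stand-in for a server-side watcher table (hypothetical)."""

    def __init__(self) -> None:
        self._watchers: DefaultDict[str, List[WatcherCallback]] = defaultdict(list)

    def register(self, path: str, callback: WatcherCallback) -> None:
        # Step 1: the server receives the watcher and stores it by path.
        self._watchers[path].append(callback)

    def trigger(self, event_type: str, path: str) -> int:
        # Step 2: an event on the path triggers the stored watchers.
        # pop() removes them, so each watcher fires at most once.
        callbacks = self._watchers.pop(path, [])
        # Step 3: call each watcher's processing callback.
        for callback in callbacks:
            callback(event_type, path)
        return len(callbacks)


events: List[Tuple[str, str]] = []
manager = ToyWatchManager()
manager.register("/app/config", lambda event, path: events.append((event, path)))
fired_first = manager.trigger("NodeDataChanged", "/app/config")
fired_second = manager.trigger("NodeDataChanged", "/app/config")  # nothing left to fire
print(fired_first, fired_second, events)
```

Because the watcher is removed when it fires, a client that wants to keep observing the node has to register again after each notification.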
76. A Hadoop cluster has the HDFS replication factor set to 3, and 1 GB of data is stored. The HDFS configuration file is then changed to set the replication factor to 2, the Hadoop cluster is restarted, and another 1 GB of data is stored. How much data does the HDFS cluster hold at this point?
Reference answer:
First 1 GB written: 1 GB × 3 = 3 GB. Second 1 GB written: 1 GB × 2 = 2 GB. (After the configuration is changed and the cluster restarted, the new value applies only to data written afterwards; to change the replication of data already stored, use the command line, for example hdfs dfs -setrep.) Total data size: 3 GB + 2 GB = 5 GB.
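The arithmetic can be checked with a short sketch. The helper name `raw_usage_gb` is hypothetical; each entry pairs a logical data size with the replication factor in force when that data was written.

```python
from typing import Iterable, Tuple


def raw_usage_gb(writes: Iterable[Tuple[float, int]]) -> float:
    """Sum size * replication over (logical_size_gb, replication_at_write_time) pairs."""
    return sum(size_gb * replication for size_gb, replication in writes)


# 1 GB written with replication 3, then 1 GB written with replication 2.
total_gb = raw_usage_gb([(1, 3), (1, 2)])
print(total_gb)  # 5
```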
77. HDFS metadata is kept in the NameNode's memory, so the NameNode's memory size directly determines the maximum capacity the cluster can support. How do you estimate the memory the NameNode needs? For example, for a cluster of 200 nodes, each with 24 TB of disk, a block size of 128 MB, and 3 replicas per block, how much NameNode memory is needed?
Premise: as a rule of thumb, 1 GB of memory can manage 1 million blocks.
Reference answer:
First calculate the number of blocks: 200 × 25,165,824 MB (24 TB) / (128 MB × 3) = 13,107,200. Since 1 GB of memory can generally manage 1 million blocks, this works out to 13.1072 GB of memory. Because this is only an estimate, choose a reasonable round value larger than this when sizing NameNode memory.
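The estimate can be reproduced with a few lines of Python. This is a sketch of the rule of thumb stated in the premise, not an official sizing formula; the function and parameter names are hypothetical.

```python
from typing import Tuple

MB_PER_TB = 1024 * 1024


def estimate_namenode_memory_gb(
    nodes: int,
    disk_tb_per_node: int,
    block_mb: int,
    replication: int,
    blocks_per_gb: int = 1_000_000,  # rule of thumb from the premise
) -> Tuple[int, float]:
    """Return (unique block count, estimated NameNode memory in GB)."""
    raw_capacity_mb = nodes * disk_tb_per_node * MB_PER_TB
    # Each unique block occupies block_mb * replication of raw disk space.
    unique_blocks = raw_capacity_mb // (block_mb * replication)
    return unique_blocks, unique_blocks / blocks_per_gb


blocks, memory_gb = estimate_namenode_memory_gb(200, 24, 128, 3)
print(blocks, memory_gb)  # 13107200 13.1072
```

The result is a lower bound for a completely full cluster of full-size blocks; many small files produce more blocks per terabyte, which is one reason to round up generously.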
78. Briefly describe the NameNode HA mechanism. How does it achieve failover?
Reference answer:
1. Use QJM to solve the problem of shared storage for NameNode metadata
The NameNode records HDFS metadata such as directories and files. Every time a client creates, deletes, or modifies a file, the NameNode writes a log entry, called the editlog; the metadata itself is stored in the fsimage. To keep the Standby in the same state as the Active, the Standby needs to obtain every editlog entry as close to real time as possible and apply it to its FsImage. This requires shared storage that holds the editlog, from which the Standby can fetch the log in real time.
Two key points must be guaranteed:
1) The shared storage is highly available.
2) The two NameNodes must be prevented from writing to the shared storage at the same time, which would corrupt the data.
The common approach to shared storage is the Quorum Journal Manager (QJM). QJM can be thought of as a cluster of JournalNodes running on different machines. Each JournalNode is a very lightweight daemon, so it can be deployed on the nodes of the Hadoop cluster. QJM needs at least 3 JournalNodes, because each edit log entry must be written to a majority of the JournalNodes; typical deployments run 3, 5, or 7. With N JournalNodes, the system can tolerate at most (N-1)/2 node failures.

Shared storage implementation logic:
1) After initialization, the Active NN writes each editlog entry to the JNs, and the write is considered successful once a majority of them (at least ⌊N/2⌋ + 1 out of N) have returned success.
2) The Standby NN periodically reads a batch of editlog entries from the JNs and applies them to the FsImage in memory.
3) Every time a NameNode writes the editlog, it must pass an Epoch number to the JNs. Each JN compares it with the Epoch it has saved: if the incoming Epoch is greater than or equal to the saved one, the write is allowed and the JN updates its own Epoch to the latest value; otherwise the operation is rejected. During a switchover, when the Standby becomes Active, it increments the Epoch by 1, so even if the previous NameNode still tries to write logs to the JNs, its writes will fail.
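The majority write and the Epoch check described above can be illustrated with a toy model. This is a simplified sketch, not the real QJM protocol or Hadoop API: the names `ToyJournalNode` and `quorum_write` and the sample edit strings are hypothetical, and the real implementation involves additional steps such as recovery of unfinished log segments.

```python
from typing import List


class ToyJournalNode:
    """Toy JournalNode that remembers the highest Epoch it has accepted."""

    def __init__(self) -> None:
        self.saved_epoch = 0
        self.edits: List[str] = []

    def write(self, epoch: int, edit: str) -> bool:
        # Reject writers whose Epoch is older than the saved one.
        if epoch < self.saved_epoch:
            return False
        # Equal or newer Epoch: accept the edit and remember the Epoch.
        self.saved_epoch = epoch
        self.edits.append(edit)
        return True


def quorum_write(journals: List[ToyJournalNode], epoch: int, edit: str) -> bool:
    """A write succeeds only if a majority of JournalNodes accept it."""
    acks = sum(journal.write(epoch, edit) for journal in journals)
    return 2 * acks > len(journals)


journals = [ToyJournalNode() for _ in range(3)]

old_active_epoch = 1
first_ok = quorum_write(journals, old_active_epoch, "mkdir /a")

# Switchover: the Standby becomes Active and uses Epoch + 1.
new_active_epoch = old_active_epoch + 1
second_ok = quorum_write(journals, new_active_epoch, "mkdir /b")

# The previous Active still tries to write with its stale Epoch and is fenced off.
stale_ok = quorum_write(journals, old_active_epoch, "rm /a")

print(first_ok, second_ok, stale_ok)  # True True False
```

Once the JournalNodes have seen the higher Epoch, every write from the old Active is rejected, which is how the scheme prevents two NameNodes from corrupting the shared log.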
2. Use ZooKeeper to implement NameNode failover

3. Active/standby switchover process of the NameNode in HDFS 2

79. Where is Hive's metadata stored?
Reference answer:
- Hive's default built-in metastore database is Derby.
- We use a MySQL database.
80. Briefly describe the HDFS architecture.
Reference answer:
HDFS has a master/slave architecture. An HDFS cluster contains one NameNode (a master server) that manages the file system namespace and regulates client access to files. In addition, there are many DataNodes, usually one per node in the cluster, that store the data. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The NameNode performs file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. DataNodes are responsible for serving read and write requests from file system clients. DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
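The block splitting and the block-to-DataNode mapping kept by the NameNode can be pictured with a toy planner. This is a sketch under simplifying assumptions: the function `plan_blocks` and the DataNode names are hypothetical, and replicas are placed round-robin, whereas real HDFS uses a rack-aware placement policy.

```python
import math
from typing import Dict, List


def plan_blocks(
    file_mb: int, block_mb: int, replication: int, datanodes: List[str]
) -> Dict[int, List[str]]:
    """Map each block index of a file to the DataNodes holding its replicas."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    block_count = math.ceil(file_mb / block_mb)
    plan: Dict[int, List[str]] = {}
    for block_index in range(block_count):
        # Simplified round-robin placement; each replica lands on a distinct node.
        plan[block_index] = [
            datanodes[(block_index + replica) % len(datanodes)]
            for replica in range(replication)
        ]
    return plan


# A 300 MB file with 128 MB blocks needs 3 blocks (the last one is partial).
plan = plan_blocks(300, 128, 3, ["dn1", "dn2", "dn3", "dn4"])
for block_index, replicas in plan.items():
    print(block_index, replicas)
```

In this model the dictionary plays the role of the NameNode's block map, while the listed DataNodes are the servers that would actually store and serve each block.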
