Spark Core design: RDD (notes, 220716)
2022-07-17 01:45:00 【啊六六六】

Port already in use
Chained (method-chaining) programming



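If this feature is missing: install the Professional edition (ask me for the installer)
upload
Remember: after changing a file on Windows, always sync it to the remote side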
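When the program runs on a distributed cluster, the printing also happens on the cluster
You can find the output by looking in the logs
The cluster is used to test whether the program runs without problems in a cluster environment
RDD function operations: executed in the Executor
Their output is printed to the Executor's run log
Executors run on Worker nodes
so the output ends up in the Worker logs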
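1 - Output printed in the Driver: we can see it
2 - Output printed in the Executor: we cannot see it directly (it is visible at 18080, under stdout)
Question: so in local mode, can all output be seen? Isn't local mode supposed to have only a Driver?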
Printing inside an RDD operator
print: runs in the Driver
Operators are invoked by Tasks; Tasks run in Executors; Executors run on Worker nodes
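A minimal PySpark sketch of the difference described above, assuming PySpark is installed and the master is supplied at submit time. The app name `print_location_demo` and the helper `describe` are illustrative names, not from the original notes.

```python
import os


def describe(value):
    """Format a line that records which process handled the value."""
    return f"pid={os.getpid()} value={value}"


def main():
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkConf, SparkContext

    # The master is not set here; it is assumed to come from spark-submit.
    conf = SparkConf().setAppName("print_location_demo")
    sc = SparkContext(conf=conf)

    rdd = sc.parallelize([1, 2, 3, 4], numSlices=2)

    # The lambda runs inside Tasks on Executors, so on a cluster these
    # lines go to the Executor stdout logs, not to the submitting console.
    rdd.foreach(lambda x: print(describe(x)))

    # collect() returns the results to the Driver, so this print runs in
    # the Driver and its output is visible where the Driver runs.
    for line in rdd.map(describe).collect():
        print(line)

    sc.stop()
```

Calling `main()` at the end of the file and submitting it with `spark-submit` should show the second group of lines in the Driver output, while on a cluster the first group has to be looked up in the Executor logs. In local mode both kinds of output typically appear in the same console.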
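sbin: cluster management commands
bin: client commands
Requirement: use a client command to submit the program to the cluster
pyspark: Python command-line client
spark-sql/beeline: clients for submitting SQL
spark-submit: client for submitting Python files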
argv

Each program has exactly one Driver
kill and status can be done from the monitoring UI
0 - command, 1 - options, 2 - file, 3 - arguments

1 - Master, 2 - Worker, 3 - Client
Where does the Driver process run?
Why is output printed in the Driver displayed?

Local mode: the program is visible at 18080 but not at 8080
0 - command, 1 - options, 2 - file, 3 - arguments

hdfs://node1:8020/export/data/pyspark_core_word_args.py
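The four-part layout (command, options, file, arguments) can be sketched in plain Python. This is only an illustration: the option values (`yarn`, `cluster`) and the two `/hypothetical/...` paths are assumptions, and the helper names `build_submit_command` and `read_app_args` are made up for this sketch. The second helper shows what the submitted script would see in `sys.argv`.

```python
import shlex


def build_submit_command(options, app_file, app_args):
    """Assemble the four parts: command, options, file, arguments."""
    command = ["spark-submit"]                  # 0 - command
    for name, value in options.items():         # 1 - options
        command += [f"--{name}", str(value)]
    command.append(app_file)                    # 2 - file
    command += [str(arg) for arg in app_args]   # 3 - arguments
    return command


def read_app_args(argv):
    """Inside the submitted script: argv[0] is the script, the rest is part 3."""
    if len(argv) != 3:
        raise SystemExit("usage: <script> <input_path> <output_path>")
    return argv[1], argv[2]


cmd = build_submit_command(
    {"master": "yarn", "deploy-mode": "cluster"},   # assumed option values
    "hdfs://node1:8020/export/data/pyspark_core_word_args.py",
    ["/hypothetical/input", "/hypothetical/output"],  # hypothetical arguments
)
print(shlex.join(cmd))
```

In the real script, `read_app_args(sys.argv)` would be called at startup; only part 3 reaches the program, while parts 0 and 1 are consumed by `spark-submit` itself.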
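In the YARN UI, Spark applications link directly to 18080
In the YARN UI, MR programs link directly to 19888
Submitting to YARN: port 8032
Executors are started according to the resources on the worker nodes; as many as can be started will be started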

Where the Driver runs

Only the Driver assigns Tasks
Executors register back with the Driver (reverse registration)

The AppMaster can only run on worker (slave) nodes
Development: analyze and process data
Plain Python runs on a single node

DataFrame: a data table, that is, data plus the table's structure (schema)

A transformation on an RDD is, in essence, the same operation applied in parallel to all of the RDD's partitions

A file is a logical concept; physically it is stored as blocks

3 partitions means 3 Tasks
Global grouping requires a shuffle
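The partition notes above can be illustrated with a small plain-Python simulation. This is not Spark's implementation: the functions `split_into_partitions`, `map_partitions` and `shuffle_by_key` are hypothetical stand-ins, and the byte-sum partitioner is just a deterministic substitute for a hash partitioner.

```python
from collections import defaultdict


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin over a fixed number of partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions


def map_partitions(partitions, func):
    """One 'task' per partition; each task only touches its own partition."""
    return [[func(record) for record in part] for part in partitions]


def shuffle_by_key(partitions, num_partitions):
    """Move every (key, value) pair to the partition chosen by its key."""
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            # Deterministic stand-in for a hash partitioner.
            target = sum(key.encode("utf-8")) % num_partitions
            shuffled[target][key].append(value)
    return [dict(part) for part in shuffled]


words = ["spark", "hadoop", "spark", "hive", "hadoop", "spark"]
parts = split_into_partitions(words, 3)              # 3 partitions, 3 tasks
pairs = map_partitions(parts, lambda w: (w, 1))      # no data movement needed
grouped = shuffle_by_key(pairs, 3)                   # grouping moves data
counts = {key: sum(values) for part in grouped for key, values in part.items()}
print(counts)
```

The map step never looks outside its own partition, which is why it can run as independent Tasks. The grouping step has to bring all values of a key into one place, and that cross-partition movement is what the notes call a shuffle.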

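Tonight: review
When a Spark program runs in cluster mode it starts two kinds of processes: the Driver process and the Executor (compute) processes. Each kind of process needs resources to run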
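As a rough sketch of how resources for the two process types might be requested, the snippet below renders Spark configuration properties as `--conf` flags for `spark-submit`. The property names are standard Spark settings; the values and the helper name `resource_conf_flags` are illustrative assumptions, not recommendations from the notes.

```python
def resource_conf_flags(settings):
    """Render Spark properties as a flat list of --conf key=value flags."""
    flags = []
    for key in sorted(settings):
        flags += ["--conf", f"{key}={settings[key]}"]
    return flags


# Illustrative values only; real sizes depend on the cluster.
settings = {
    "spark.driver.memory": "1g",      # memory for the Driver process
    "spark.executor.memory": "1g",    # memory for each Executor process
    "spark.executor.cores": 1,        # CPU cores for each Executor
    "spark.executor.instances": 2,    # number of Executors requested
}

flags = resource_conf_flags(settings)
print(" ".join(flags))
```

The resulting flags would go in the options part of the submit command, before the program file.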

ResourceManager
NodeManager
AppMaster
Container
MapTask和ReduceTask

preview

Note: in PySpark, wholeTextFiles has a bug in local mode that can make a single process run out of memory; in a cluster environment it works normally