Build spark on yarn environment
2022-07-19 02:05:00 【sun_xo】
1) Build
## download spark-3.2.1.tgz from http://archive.apache.org/dist/
## unpack to ~/work/spark-3.2.1-src
$ cd ~/work/spark-3.2.1-src
$ export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"
$ dev/make-distribution.sh --name without-hadoop \
--pip --tgz -Phive -Phive-thriftserver -Phadoop-provided -Pyarn
$ tar xvf spark-3.2.1-bin-without-hadoop.tgz -C ..
$ cd ..
$ mv spark-3.2.1-bin-without-hadoop spark-3.2.1
## configure
$ cd spark-3.2.1
$ diff -u conf/spark-env.sh.template conf/spark-env.sh
--- conf/spark-env.sh.template 2022-06-24 09:16:18.000000000 +0800
+++ conf/spark-env.sh 2022-06-24 17:52:47.000000000 +0800
@@ -71,3 +71,7 @@
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1 Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1 Disable multi-threading of OpenBLAS
+
+JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home
+SPARK_LOCAL_IP=localhost
+SPARK_DIST_CLASSPATH=`hadoop classpath`
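Because the distribution was built with -Phadoop-provided, Spark ships without Hadoop jars and relies on SPARK_DIST_CLASSPATH being filled in by command substitution from `hadoop classpath`. A minimal sketch of that expansion, using a stand-in function (`fake_hadoop_classpath` and its paths are made up here, since the real output depends on your Hadoop install):

```shell
# `hadoop classpath` prints a colon-separated list of Hadoop config dirs
# and jar globs; spark-env.sh captures it into SPARK_DIST_CLASSPATH.
# fake_hadoop_classpath is a hypothetical stand-in for the real command.
fake_hadoop_classpath() {
    echo "/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*"
}
SPARK_DIST_CLASSPATH=`fake_hadoop_classpath`
echo "$SPARK_DIST_CLASSPATH"
```

On a real machine, `hadoop` must be on PATH when spark-env.sh is sourced, otherwise the variable ends up empty and executors fail with ClassNotFoundException.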
$ diff -u conf/log4j.properties.template conf/log4j.properties
--- conf/log4j.properties.template 2022-06-24 09:16:18.000000000 +0800
+++ conf/log4j.properties 2022-06-24 16:28:28.000000000 +0800
@@ -16,7 +16,7 @@
#
# Set everything to be logged to the console
-log4j.rootCategory=INFO, console
+log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout

## test
$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
examples/jars/spark-examples_2.12-3.2.1.jar 10
22/06/24 17:53:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Pi is roughly 3.1387311387311385

2) Spark on yarn
## configure yarn
$ cd ~/work/hadoop
$ diff -u etc/hadoop/yarn-site.xml.orig etc/hadoop/yarn-site.xml
--- etc/hadoop/yarn-site.xml 2022-05-17 09:20:54.000000000 +0800
+++ /Users/sun_xo/work/hadoop/etc/hadoop/yarn-site.xml 2022-06-23 10:13:52.000000000 +0800
@@ -29,4 +29,17 @@
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
+ <property>
+ <name>yarn.log.server.url</name>
+ <value>http://localhost:19888/jobhistory/logs</value>
+ </property>
+ <!-- close yarn memory check -->
+ <property>
+ <name>yarn.nodemanager.pmem-check-enabled</name>
+ <value>false</value>
+ </property>
+ <property>
+ <name>yarn.nodemanager.vmem-check-enabled</name>
+ <value>false</value>
+ </property>
</configuration>

## configure spark
$ diff -u spark-env.sh.template spark-env.sh
--- spark-env.sh.template 2022-06-24 09:16:18.000000000 +0800
+++ spark-env.sh 2022-06-24 18:49:42.000000000 +0800
@@ -71,3 +71,10 @@
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1 Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1 Disable multi-threading of OpenBLAS
+
+JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home
+SPARK_LOCAL_IP=localhost
+SPARK_DIST_CLASSPATH=`hadoop classpath`
+HADOOP_CONF_DIR=~/work/hadoop/etc/hadoop
+YARN_CONF_DIR=$HADOOP_CONF_DIR
+SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://localhost:9000/user/spark/logs/ -Dspark.history.fs.cleaner.enabled=true"

$ diff -u spark-defaults.conf.template spark-defaults.conf
--- spark-defaults.conf.template 2022-06-24 09:16:18.000000000 +0800
+++ spark-defaults.conf 2022-06-24 16:19:02.000000000 +0800
@@ -25,3 +25,8 @@
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
+
+spark.eventLog.enabled true
+spark.eventLog.dir hdfs://localhost:9000/user/spark/logs
+spark.yarn.historyServer.address localhost:18080
+spark.yarn.jars hdfs://localhost:9000/user/spark/jars/*

## create dirs and upload spark jars to HDFS
$ hdfs dfs -mkdir -p /user/spark
$ hdfs dfs -put jars /user/spark
$ hdfs dfs -mkdir -p /user/spark/logs
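The HDFS paths created above must line up with what spark-defaults.conf references, or event logging and the history server silently break. A self-contained sketch of that cross-check, with a sample conf inlined (the file path and contents here mirror the diff above but are local stand-ins, not the real installed conf):

```shell
# Extract spark.eventLog.dir from a spark-defaults-style conf file and
# confirm it matches the directory we just created on HDFS.
cat > /tmp/spark-defaults.sample <<'EOF'
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://localhost:9000/user/spark/logs
spark.yarn.jars           hdfs://localhost:9000/user/spark/jars/*
EOF
# awk: first whitespace-separated field is the key, second is the value
log_dir=`awk '$1 == "spark.eventLog.dir" {print $2}' /tmp/spark-defaults.sample`
echo "$log_dir"
```

The same value (minus the trailing slash) should appear in SPARK_HISTORY_OPTS, so the history server reads the directory the applications write to.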
## restart yarn with JobHistoryServer and spark HistoryServer
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver
$ sbin/start-history-server.sh
$ jps
5696 SecondaryNameNode
5955 JobHistoryServer
5509 NameNode
5813 ResourceManager
5899 NodeManager
6683 HistoryServer
5597 DataNode
6702 Jps

## test
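Before submitting a job, it is worth confirming that every daemon above is actually up. A small scripted check of `jps` output (the sample string below is inlined so the logic is self-contained; on a live machine replace it with `` sample=`jps` ``):

```shell
# Verify that all daemons needed for Spark on YARN appear in jps output.
sample="5696 SecondaryNameNode
5955 JobHistoryServer
5509 NameNode
5813 ResourceManager
5899 NodeManager
6683 HistoryServer
5597 DataNode"
missing=""
for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer HistoryServer; do
    # anchor the match (" NameNode$") so SecondaryNameNode does not
    # count as NameNode
    echo "$sample" | grep -q " $d\$" || missing="$missing $d"
done
if [ -z "$missing" ]; then
    echo "all daemons up"
else
    echo "missing:$missing"
fi
```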
$ cat test.sh
#!/bin/sh
run() {
bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-memory 512m \
--executor-memory 512m \
--num-executors 1 \
--class org.apache.spark.examples.SparkPi \
examples/jars/spark-examples_2.12-3.2.1.jar 10
}
## main ##
run
appid=`grep "APPID" $HADOOP_HOME/logs/yarn*.log | tail -1 | awk '{print $NF}'`
appid=${appid#*APPID=}
echo $appid
$HADOOP_HOME/bin/yarn logs -applicationId $appid

$ test.sh
22/06/25 09:42:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
application_1656115668743_0003
22/06/25 09:43:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/25 09:43:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Container: container_1656115668743_0003_01_000001 on 192.168.124.7_52592
==========================================================================
LogType:stderr
Log Upload Time:Sat Jun 25 09:43:05 +0800 2022
LogLength:379
Log Contents:
22/06/25 09:42:54 WARN Utils: Your hostname, sun-xo.local resolves to a loopback address: 127.0.0.1; using 192.168.124.7 instead (on interface en0)
22/06/25 09:42:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/06/25 09:42:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
End of LogType:stderr
LogType:stdout
Log Upload Time:Sat Jun 25 09:43:05 +0800 2022
LogLength:33
Log Contents:
Pi is roughly 3.1423911423911424
End of LogType:stdout
Container: container_1656115668743_0003_01_000002 on 192.168.124.7_52592
==========================================================================
LogType:stderr
Log Upload Time:Sat Jun 25 09:43:05 +0800 2022
LogLength:379
Log Contents:
22/06/25 09:43:00 WARN Utils: Your hostname, sun-xo.local resolves to a loopback address: 127.0.0.1; using 192.168.124.7 instead (on interface en0)
22/06/25 09:43:00 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/06/25 09:43:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
End of LogType:stderr
LogType:stdout
Log Upload Time:Sat Jun 25 09:43:05 +0800 2022
LogLength:0
Log Contents:
End of LogType:stdout

Actually the output of the program is "Pi is roughly 3.1423911423911424",
or you can see the same result from http://localhost:8088/cluster -> appid -> logs.
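The application-id extraction at the end of test.sh (note the awk program must be '{print $NF}') can be exercised standalone. The sample line below imitates a ResourceManager audit-log entry, which ends in APPID=...; the exact line format is an assumption, only the APPID value is taken from the run above:

```shell
# Hypothetical sample of an RM audit log line ending in APPID=...
line="2022-06-25 09:42:59 INFO RMAuditLogger: USER=sun_xo OPERATION=Submit Application Request RESULT=SUCCESS APPID=application_1656115668743_0003"
# awk prints the last whitespace-separated field: "APPID=application_..."
appid=`echo "$line" | awk '{print $NF}'`
# ${var#*APPID=} strips everything up to and including "APPID="
appid=${appid#*APPID=}
echo "$appid"
```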
Reference: Overview - Spark 3.2.1 Documentation