当前位置:网站首页>Spark Structured Streaming HelloWorld
Spark Structured Streaming HelloWorld
2022-07-26 04:29:00 【Actually, I have a true disposition】
Spark Structured Streaming HelloWorld
Preface
Spark Structured Streaming+Kafka+Hbase Scala Version tutorial , The whole entrance .
Text
1.Spark Version selection
Select the corresponding version of your server ; Document address :
https://spark.apache.org/docs/
This address is opened with the version number , Choose your own environment Spark That's all right. ;
So here I'm going to use theta 2.4.5; The latest version of the document is 3.3.3
2. The official example
After entering the corresponding version, you can find Spark The main function of , Here's the picture
Spark Streaming It has been clearly marked as old API 了 , new API Namely Structured Streaming, The picture is circled in red , So what I am currently using is new API.Structured Streaming
HelloWorld Code
The official one is simple word count Example
// Create DataFrame representing the stream of input lines from connection to localhost:9999
val lines = spark.readStream
.format("socket")
.option("host", "localhost")
.option("port", 9999)
.load()
// Split the lines into words
val words = lines.as[String].flatMap(_.split(" "))
// Generate running word count
val wordCounts = words.groupBy("value").count()
Batch code example
The official example , Here is my understanding ,streamingDF Is the data of a batch ;foreachBatch Is to cycle each batch ; The data in the batch is batchDF, Print batch number batchId You can see that this batch number is a self increasing number ;
streamingDF.writeStream.foreachBatch {
(batchDF: DataFrame, batchId: Long) =>
// This line is for caching , In this way, subsequent operations will not repeat the previous transform Operation
batchDF.persist()
// Operate the data in a batch , The specific basis is what operation is written differently
batchDF.write.format(...).save(...) // location 1
batchDF.write.format(...).save(...) // location 2
// The cache must be released after completion
batchDF.unpersist()
}
边栏推荐
猜你喜欢

Use Baidu PaddlePaddle easydl to complete garbage classification

Recommendation | scholar's art and Tao: writing papers is a skill

How to download the supplementary literature?

idea插件离线安装(持续更新)

Which websites can I visit to check the latest medical literature?

数组排序3

MATLAB绘图

人脸数据库收集总结

How does win11 set the theme color of the status bar? Win11 method of setting theme color of status bar

Optimization analysis and efficiency execution of MySQL
随机推荐
Steam science education endows classroom teaching with creativity
TIA botu WinCC Pro controls the display and hiding of layers through scripts
Tutorial on using the one click upgrade function of the rtsp/onvif protocol video platform easynvr service
UE4 靠近物体时显示文字,远离时文字消失
Phaser(一):平台跳跃收集游戏
Use of anonymous functions
生活相关——一个华科研究生导师的肺腑之言(主要适用于理工科)
自动化测试框架该如何搭建?
Keil V5 installation and use
Sweet butter
数组排序3
机器学习之桑基图(用于用户行为分析)
10、 Interceptor
低成本、快速、高效搭建数字藏品APP、H5系统,扇贝科技专业开发更放心!
Segger embedded studio cannot find xxx.c or xxx.h file
1. Excel的IF函数
p-范数(2-范数 即 欧几里得范数)
Js手写函数之节流防抖函数
Recommendation | scholar's art and Tao: writing papers is a skill
Integrated architecture of performance and cost: modular architecture