Flink introduction to practice - phase II (illustrated runtime architecture)
2022-07-19 07:45:00 【Top master cultivation plan】
System architecture

Job submission workflow
High-level abstract view

Standalone mode

YARN cluster
Session mode
1. First, apply to YARN for a JobManager

2. The JobManager then schedules and processes the job's tasks
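For reference, these two steps map onto the scripts shipped in Flink's `bin/` directory. The paths and memory flags below are common defaults, not values from this article; adjust them to your installation:

```shell
# Start a long-running Flink session on YARN in detached mode.
# This is the step that asks YARN to launch a JobManager.
./bin/yarn-session.sh -d -jm 1024m -tm 2048m

# Submit a job to the running session; the JobManager then
# distributes the job's tasks to the TaskManagers.
./bin/flink run ./examples/streaming/WordCount.jar
```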

Per-job mode

Dataflow graph

Parallelism
flow chart

Experiment
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FlinkSoctet {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> initData = env.socketTextStream("master", 9997);
        // Split each line into words (parallelism 1), then map each word to (word, 1) (parallelism 2)
        SingleOutputStreamOperator<Tuple2<String, Integer>> map = initData.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String item, Collector<String> out) throws Exception {
                String[] resItem = item.split(" ");
                for (String s : resItem) {
                    out.collect(s);
                }
            }
        }).setParallelism(1).map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String item) throws Exception {
                return Tuple2.of(item, 1);
            }
        }).setParallelism(2);
        // Group the resulting tuple stream by word and aggregate (parallelism 3)
        map.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) throws Exception {
                return value.f0;
            }
        }).sum(1).setParallelism(3).print();
        // This is a streaming program, so it must be explicitly started and then runs continuously
        env.execute();
    }
}
```
Result

Conclusion
- Flink's notion of parallelism is conceptually very similar to repartitioning in Spark: both control how many parallel instances process the data
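As a rough mental model (this is an illustration, not Flink source code), the effective parallelism of an operator is resolved by precedence: an operator-level `setParallelism()` wins over the environment default, which wins over the cluster configuration. A toy resolver:

```java
public class ParallelismResolver {
    // -1 marks "not set", mirroring how Flink uses a sentinel default internally.
    static int effectiveParallelism(int operatorLevel, int envLevel, int clusterDefault) {
        if (operatorLevel > 0) return operatorLevel;  // .setParallelism(n) on the operator
        if (envLevel > 0) return envLevel;            // env.setParallelism(n)
        return clusterDefault;                        // parallelism.default from the cluster config
    }

    public static void main(String[] args) {
        // The map in the experiment above: setParallelism(2) beats everything else.
        System.out.println(effectiveParallelism(2, -1, 1));  // 2
        // No operator-level setting: env.setParallelism(5) applies.
        System.out.println(effectiveParallelism(-1, 5, 1));  // 5
        // Nothing set in the job: the cluster default is used.
        System.out.println(effectiveParallelism(-1, -1, 1)); // 1
    }
}
```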
Chain of operators

Experiment
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FlinkSoctet {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> initData = env.socketTextStream("master", 9997);
        // One environment-wide parallelism, so the one-to-one operators can chain
        env.setParallelism(1);
        SingleOutputStreamOperator<Tuple2<String, Integer>> map = initData.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String item, Collector<String> out) throws Exception {
                String[] resItem = item.split(" ");
                for (String s : resItem) {
                    out.collect(s);
                }
            }
        }).map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String item) throws Exception {
                return Tuple2.of(item, 1);
            }
        });
        // Group the resulting tuple stream by word and aggregate
        map.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) throws Exception {
                return value.f0;
            }
        }).sum(1).print();
        // This is a streaming program, so it must be explicitly started and then runs continuously
        env.execute();
    }
}
```
Result

Conclusion
- Operators connected one-to-one execute together in a single task (an operator chain)
- A one-to-many (redistributing) connection splits the pipeline into separate tasks
- This is similar to how a shuffle splits stages in Spark
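The chaining rule above can be sketched as a toy predicate. This is a simplification of Flink's actual conditions (the real check in the job-graph generator also considers chaining strategy, slot sharing group, and more), but it captures the core idea: two adjacent operators chain when connected one-to-one with equal parallelism.

```java
public class ChainCheck {
    // Simplified model: chain only on a forward (one-to-one) connection
    // with matching parallelism. Real Flink checks additional conditions.
    static boolean canChain(boolean forwardConnection, int upstreamParallelism, int downstreamParallelism) {
        return forwardConnection && upstreamParallelism == downstreamParallelism;
    }

    public static void main(String[] args) {
        // flatMap -> map with equal parallelism: chained into one task.
        System.out.println(canChain(true, 1, 1));   // true
        // keyBy introduces a hash redistribution, so no chaining.
        System.out.println(canChain(false, 1, 1));  // false
        // One-to-one but different parallelism forces a rebalance: no chaining.
        System.out.println(canChain(true, 1, 2));   // false
    }
}
```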
Execution graph

Tasks and task slots (Task Slots)
Theory
Slot sharing between tasks

The following illustrates, with an allocation example, the relationship between task slots and parallelism





Experiment
Preparation
The cluster currently has 3 task slots. What happens if we set the parallelism to 5?

Experimental code
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FlinkSoctet {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> initData = env.socketTextStream("master", 9997);
        // Parallelism 5, but the cluster only offers 3 task slots
        env.setParallelism(5);
        SingleOutputStreamOperator<Tuple2<String, Integer>> map = initData.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String item, Collector<String> out) throws Exception {
                String[] resItem = item.split(" ");
                for (String s : resItem) {
                    out.collect(s);
                }
            }
        }).map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String item) throws Exception {
                return Tuple2.of(item, 1);
            }
        });
        // Group the resulting tuple stream by word and aggregate
        map.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) throws Exception {
                return value.f0;
            }
        }).sum(1).print();
        // This is a streaming program, so it must be explicitly started and then runs continuously
        env.execute();
    }
}
```
Result

The job fails with an error: only 3 slots are available, so the 5 requested parallel subtasks cannot all be scheduled (this typically surfaces as a NoResourceAvailableException)

Task scheduling
Flink distributes tasks across the available task slots according to the job's needs. An intuitive understanding is enough here: slot sharing is an optimization of task execution, letting subtasks of different operators share a single slot
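The "optimization" can be made concrete with a toy calculation (illustrative only, not Flink code). With the default slot sharing, one slot can host one subtask of every operator in the pipeline, so a job needs as many slots as its maximum operator parallelism; without sharing, every subtask would occupy its own slot:

```java
import java.util.Arrays;

public class SlotSharing {
    // With sharing: one slot holds a full pipeline slice -> max parallelism.
    static int slotsWithSharing(int... parallelisms) {
        return Arrays.stream(parallelisms).max().orElse(0);
    }

    // Without sharing: every subtask occupies its own slot -> sum of parallelisms.
    static int slotsWithoutSharing(int... parallelisms) {
        return Arrays.stream(parallelisms).sum();
    }

    public static void main(String[] args) {
        // Operator parallelisms from the first experiment:
        // source 1, flatMap 1, map 2, sum 3.
        int[] p = {1, 1, 2, 3};
        System.out.println(slotsWithSharing(p));    // 3 -> fits in the 3-slot cluster
        System.out.println(slotsWithoutSharing(p)); // 7 -> would not fit
    }
}
```
This also explains the failed experiment above: with sharing, required slots equal the maximum parallelism, so a parallelism of 5 needs 5 slots while only 3 exist.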
