DPDK flow filter summary (flow director / rte_flow)
2022-07-18 13:37:00 【Lazy little】
For why this NIC feature is worth using, see this dpvs article: https://www.jianshu.com/p/e41dfe7c9a6c
Read that article first; it makes this one much easier to follow. The figures in this article are borrowed from it.
In this article, "cpu" means one dpvs worker thread (worker).
Support for flow filtering differs between Intel NICs; check the relevant documentation for your controller: https://www.intel.com/content/www/us/en/support/articles/000031907/ethernet-products/700-series-controllers-up-to-40gbe.html
1. dpvs analysis
We focus on how IP + port rules are added as flows. To distribute the 65535 ports of an IP (excluding the ports below 1024) evenly across the workers on different CPUs, dpvs uses the following approach:
Two-arm mode

The flow filter makes sure that the inbound and outbound packets of the same flow are handled by the same CPU (worker), which reduces cache misses and avoids the need for locks.
dpvs uses it in the full-NAT scenario. SNAT scenarios can also use the flow filter to improve forwarding efficiency, but this article sticks with full-NAT as the example.
client IP: 100.100.1.2
dpvs WAN IP: 100.100.1.1
dpvs local IP: 192.168.1.2
server IP: 192.168.1.1
- The client sends a TCP packet to dpvs:
100.100.1.2:1111->100.100.1.1:80
- dpvs rewrites the source to 192.168.1.2:2222 and the destination to 192.168.1.1:80, then forwards the packet to the server:
192.168.1.2:2222->192.168.1.1:80
- The server replies to dpvs:
192.168.1.1:80->192.168.1.2:2222
- dpvs rewrites the source to 100.100.1.1:80 and the destination to 100.100.1.2:1111, then replies to the client:
100.100.1.1:80->100.100.1.2:1111
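A minimal sketch of the rewriting above, assuming simplified host-byte-order IPv4 tuples; `IP4`, `struct tuple`, `tuple_reverse` and `fnat_in2out` are hypothetical names, not dpvs code. It shows why the flow filter can work at all: the server's reply is addressed to the local IP and port that dpvs itself picked, so that pair is a key the NIC can match on.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its dotted-quad parts. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Simplified TCP 4-tuple, host byte order, illustration only. */
struct tuple {
    uint32_t sip, dip;
    uint16_t sport, dport;
};

/* The reply direction of a connection: the same tuple with both ends swapped. */
static struct tuple tuple_reverse(struct tuple t)
{
    struct tuple r = { t.dip, t.sip, t.dport, t.sport };
    return r;
}

/* Full-NAT, client -> server direction: both ends are rewritten.
 * local_ip/local_port are chosen by dpvs from its own pool (the part the
 * flow filter cares about); rs_ip/rs_port identify the real server. */
static struct tuple fnat_in2out(uint32_t local_ip, uint16_t local_port,
                                uint32_t rs_ip, uint16_t rs_port)
{
    assert(local_port >= 1024); /* ports below 1024 are left out of the pool */
    struct tuple t = { local_ip, rs_ip, local_port, rs_port };
    return t;
}
```

For the example above, `fnat_in2out(IP4(192, 168, 1, 2), 2222, IP4(192, 168, 1, 1), 80)` followed by `tuple_reverse` gives a reply whose destination is 192.168.1.2:2222, i.e. exactly the local IP/port dpvs allocated.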
Traffic from the client to dpvs is normally spread evenly across the NIC's queues by RSS. The core of the dpvs flow filter lies in how a local IP and port are allocated, and the port is the key part. Whenever a local IP is added, dpvs divides that local IP's 65535 ports (excluding those below 1024) evenly among the CPUs (workers) and installs the corresponding flow rules on the NIC. When it creates the inner session that corresponds to an outer session, it picks the IP/port from the resources owned by the current CPU. As a result, all packets of a connection, in both directions, are handled on the same CPU. So how does dpvs divide a local IP's ports evenly? See the figure below.

The figure shows 8 CPU cores mapped to the NIC's 8 queues, with 8 flow rules:
cpu0 flow0: port_base 0, port_mask 0x7
cpu1 flow1: port_base 1, port_mask 0x7
cpu2 flow2: port_base 2, port_mask 0x7
cpu3 flow3: port_base 3, port_mask 0x7
cpu4 flow4: port_base 4, port_mask 0x7
cpu5 flow5: port_base 5, port_mask 0x7
cpu6 flow6: port_base 6, port_mask 0x7
cpu7 flow7: port_base 7, port_mask 0x7
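The eight rules in the figure can be modeled in a few lines. This is a sketch assuming host byte order and a power-of-two CPU count; `build_rules` and `lookup_cpu` are hypothetical helpers, not dpvs functions. A rule matches a destination port when `(port & port_mask) == port_base`, so with mask 0x7 every port lands on CPU `port & 7`.

```c
#include <assert.h>
#include <stdint.h>

/* One flow rule as in the figure: TCP dst port matched under a mask. */
struct port_rule {
    uint16_t port_base;
    uint16_t port_mask;
    unsigned cpu;
};

/* One rule per CPU: port_base = CPU index, port_mask = ncpu - 1.
 * Only valid when ncpu is a power of two (8 in the figure). */
static void build_rules(struct port_rule *rules, unsigned ncpu)
{
    assert(ncpu > 0 && ncpu <= 65536 && (ncpu & (ncpu - 1)) == 0);
    for (unsigned i = 0; i < ncpu; i++) {
        rules[i].port_base = (uint16_t)i;
        rules[i].port_mask = (uint16_t)(ncpu - 1);
        rules[i].cpu = i;
    }
}

/* Return the CPU whose rule matches dport, or -1 if no rule matches. */
static int lookup_cpu(const struct port_rule *rules, unsigned n, uint16_t dport)
{
    for (unsigned i = 0; i < n; i++)
        if ((dport & rules[i].port_mask) ==
            (rules[i].port_base & rules[i].port_mask))
            return (int)rules[i].cpu;
    return -1;
}
```

With 8 CPUs, local port 2222 from the example above is steered to CPU 6, since 2222 & 7 == 6.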
```c
int netif_sapool_flow_add(struct netif_port *dev, lcoreid_t cid,
                          int af, const union inet_addr *addr,
                          __be16 port_base, __be16 port_mask,
                          netif_flow_handler_param_t *flows)
{
    int err, ret = EDPVS_OK, nflows = 0;
    char ipbuf[64];
    struct rte_flow_attr attr = {
        .group    = NETIF_FLOW_GROUP,
        .priority = NETIF_FLOW_PRIO_SAPOOL,
        .ingress  = 1,
        .egress   = 0,
        //.transfer = 0,
    };
    struct rte_flow_item pattern[SAPOOL_PATTERN_NUM];
    struct rte_flow_action action[SAPOOL_ACTION_NUM];
    netif_flow_handler_param_t resp;
    struct rte_flow_item_ipv4 ip_spec, ip_mask;
    struct rte_flow_item_ipv6 ip6_spec, ip6_mask;
    struct rte_flow_item_tcp tcp_spec, tcp_mask;
    struct rte_flow_item_udp udp_spec, udp_mask;
    queueid_t queue_id;
    struct rte_flow_action_queue queue;

    if (unlikely(!dev || !addr || !flows))
        return EDPVS_INVAL;
    if (unlikely(flows->size < 4 || !flows->handlers))
        return EDPVS_INVAL;

    memset(pattern, 0, sizeof(pattern));
    memset(action, 0, sizeof(action));

    /* create action stack: send matching packets to the RX queue of lcore cid */
    err = netif_get_queue(dev, cid, &queue_id);
    if (unlikely(err != EDPVS_OK))
        return err;
    queue.index = queue_id;
    action[0].type = RTE_FLOW_ACTION_TYPE_QUEUE;
    action[0].conf = &queue;
    action[1].type = RTE_FLOW_ACTION_TYPE_END;

    /* create pattern stack: ETH / IP (exact dst address) / TCP (masked dst port) */
    pattern[0].type = RTE_FLOW_ITEM_TYPE_ETH;
    if (af == AF_INET) {
        memset(&ip_spec, 0, sizeof(struct rte_flow_item_ipv4));
        memset(&ip_mask, 0, sizeof(struct rte_flow_item_ipv4));
        ip_spec.hdr.dst_addr = addr->in.s_addr;
        ip_mask.hdr.dst_addr = htonl(0xffffffff);
        pattern[1].type = RTE_FLOW_ITEM_TYPE_IPV4;
        pattern[1].spec = &ip_spec;
        pattern[1].mask = &ip_mask;
    } else if (af == AF_INET6) {
        memset(&ip6_spec, 0, sizeof(struct rte_flow_item_ipv6));
        memset(&ip6_mask, 0, sizeof(struct rte_flow_item_ipv6));
        memcpy(&ip6_spec.hdr.dst_addr, &addr->in6, sizeof(ip6_spec.hdr.dst_addr));
        memset(&ip6_mask.hdr.dst_addr, 0xff, sizeof(ip6_mask.hdr.dst_addr));
        pattern[1].type = RTE_FLOW_ITEM_TYPE_IPV6;
        pattern[1].spec = &ip6_spec;
        pattern[1].mask = &ip6_mask;
    } else {
        return EDPVS_INVAL;
    }

    /* the per-CPU part: dst port must equal port_base under port_mask */
    memset(&tcp_spec, 0, sizeof(struct rte_flow_item_tcp));
    memset(&tcp_mask, 0, sizeof(struct rte_flow_item_tcp));
    tcp_spec.hdr.dst_port = port_base;
    tcp_mask.hdr.dst_port = port_mask;
    pattern[2].type = RTE_FLOW_ITEM_TYPE_TCP;
    pattern[2].spec = &tcp_spec;
    pattern[2].mask = &tcp_mask;
    pattern[3].type = RTE_FLOW_ITEM_TYPE_END;
    ......
}
```
To understand the code above, you need to be familiar with the rte_flow API: https://doc.dpdk.org/guides-20.05/prog_guide/rte_flow.html
Here we only care about the TCP port_base and port_mask: rte_flow can match one exact TCP/UDP port, or, by setting a mask, a whole group of ports.
When a local IP is added, dpvs installs the 8 flow rules shown above on the LAN NIC (nic1) that the local IP belongs to. For the algorithm that assigns each local IP's ports to each CPU, see the dpvs function sa_pool_create. As the figure makes clear, no CPU gets a contiguous port range. Instead, the check fdir->mask && ((uint16_t)port & fdir->mask) != ntohs(fdir->port_base) (a port for which it holds does not belong to that CPU) cuts the ports into segments of as many ports as there are CPUs, and every CPU takes one port from each segment. With 8 CPUs, as above, each segment holds 8 ports.
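The skip condition quoted above can be exercised on its own. This sketch assumes host byte order (dpvs keeps port_base in network order, hence its ntohs); `ports_owned` is a hypothetical helper that only counts the ports surviving the check for one CPU, it is not sa_pool_create.

```c
#include <assert.h>
#include <stdint.h>

/* Count the ports in [lo, hi] that one CPU would own, using the same test
 * as dpvs: a port is skipped when mask is non-zero and
 * (port & mask) != port_base. A zero mask means "no filtering". */
static unsigned ports_owned(uint16_t port_base, uint16_t mask,
                            unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi <= 65535);
    unsigned count = 0;
    for (unsigned port = lo; port <= hi; port++) {
        if (mask && ((uint16_t)port & mask) != port_base)
            continue; /* belongs to another CPU */
        count++;
    }
    return count;
}
```

With mask 0x7, each of the 8 CPUs owns 8064 of the 64512 ports from 1024 to 65535, one port out of every segment of 8 (CPU 3 gets 1027, 1035, 1043, ...).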
In an rte_flow TCP item, a port and port mask work much like an IP address and netmask.
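A sketch of that analogy, assuming host byte order: in the rte_flow model a field matches when `(value & mask) == (spec & mask)`, the same test as an address under a netmask. `match16`, `match32` and `prefix_mask16` are hypothetical helpers; `prefix_mask16` builds a port mask from a prefix length the way /24 becomes 255.255.255.0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masked comparison: an all-ones mask is an exact match,
 * a zero mask is a wildcard, anything else matches a group of values. */
static bool match16(uint16_t value, uint16_t spec, uint16_t mask)
{
    return (value & mask) == (spec & mask);
}

static bool match32(uint32_t value, uint32_t spec, uint32_t mask)
{
    return (value & mask) == (spec & mask);
}

/* Build a mask from a prefix length, like an IP netmask: 2 -> 0xC000. */
static uint16_t prefix_mask16(unsigned bits)
{
    assert(bits <= 16);
    return bits == 0 ? 0 : (uint16_t)(0xFFFFu << (16 - bits));
}
```

For example, port 20000 matches spec 16384 under `prefix_mask16(2)` (0xC000), just as 192.168.1.5 matches 192.168.1.0 under 255.255.255.0.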
To sum up the dpvs flow filter analysis, two things matter: first, which ports of each local IP each CPU uses (the allocation algorithm); second, how the flow rules are installed on the NIC, and how the masks are used.
2. vpp flow filter implementation
vpp one-arm mode (using 4 CPUs (workers) as the example; the NIC is an Intel E810 series card)

The vpp scenario is SNAT: the client is on the inside of vpp, and the server is outside.
client IP: 192.168.1.2
vpp WAN IP: 100.100.1.1
server IP: 100.100.1.2
- The client sends a TCP packet to vpp:
192.168.1.2:1111->100.100.1.2:80
- vpp rewrites the source to 100.100.1.1:2222, then forwards the packet to the server:
100.100.1.1:2222->100.100.1.2:80
- The server replies to vpp:
100.100.1.2:80->100.100.1.1:2222
- vpp rewrites the destination to 192.168.1.2:1111, then replies to the client:
100.100.1.2:80->192.168.1.2:1111
In this scenario, the server's replies to vpp must be steered to a specific RX queue, and that queue must belong to the same CPU as the RX queue that receives the client's flow.
vpp implements an algorithm that allocates SNAT WAN IP ports to workers (it also excludes the ports below 1024; ignore that for now). Unlike dpvs, each worker gets a contiguous port range of a WAN IP. With 4 workers:
worker0: 0-16383
worker1: 16384-32767
worker2: 32768-49151
worker3: 49152-65535
In principle, each WAN IP then needs 4 flow rules on the NIC:
worker0 flow0: port_base 0, port_mask 0xC000
worker1 flow1: port_base 16384, port_mask 0xC000
worker2 flow2: port_base 32768, port_mask 0xC000
worker3 flow3: port_base 49152, port_mask 0xC000
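The contiguous split and its one-rule-per-worker form can be computed directly. This sketch assumes a power-of-two worker count and host byte order, and still ignores the ports below 1024; `contiguous_rule` and `worker_of_port` are hypothetical names, not vpp functions. Each worker's block is 65536 / n ports, port_base is the first port of the block, and port_mask keeps only the bits above the offset inside the block.

```c
#include <assert.h>
#include <stdint.h>

struct port_rule {
    uint16_t port_base;
    uint16_t port_mask;
};

/* Rule covering worker's contiguous block when ports are split n ways. */
static struct port_rule contiguous_rule(unsigned worker, unsigned n_workers)
{
    assert(n_workers > 0 && n_workers <= 65536 &&
           (n_workers & (n_workers - 1)) == 0);
    assert(worker < n_workers);
    unsigned block = 65536u / n_workers;     /* ports per worker */
    struct port_rule r;
    r.port_base = (uint16_t)(worker * block);
    r.port_mask = (uint16_t)~(block - 1);    /* drop the in-block offset bits */
    return r;
}

/* Which worker owns a port under the same contiguous split. */
static unsigned worker_of_port(uint16_t port, unsigned n_workers)
{
    assert(n_workers > 0 && n_workers <= 65536);
    return port / (65536u / n_workers);
}
```

For 4 workers this reproduces the table above (base 0/16384/32768/49152, mask 0xC000); for 2 workers the mask becomes 0x8000.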
However, the ports below 1024 of the WAN IP must be excluded, so flow0 has to be split up:

The port range is 0-65535, i.e. 16 bits, and the port mask is 16 bits as well.
With 4 workers, mask bits 14 and 15 split the 65536 ports into 4 equal shares. The ports below 1024 occupy the low 10 bits (0-9), so the split can only use the 4 bits 10-13.
The low 10 bits of port and mask are left as wildcards and the top two bits are fixed, so we only need to study the middle 4 bits (the blue part of the figure).
worker0: 0-16383 (top 6 bits of the port, x = wildcard, then port_base/port_mask)
001xxx 8192/0xE000
0001xx 4096/0xF000
00001x 2048/0xF800
000001 1024/0xFC00
This removes the ports below 1024: they are not covered by any flow rule and fall back to the NIC's default RSS. The 2-worker and 8-worker cases work the same way.
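The manual split above is the same problem as turning an IP range into CIDR prefixes: cover [lo, hi] with the fewest aligned power-of-two blocks, each of which is one (port_base, port_mask) pair. A minimal sketch, assuming host byte order; `range_to_rules` is a hypothetical helper, not part of vpp or DPDK.

```c
#include <assert.h>
#include <stdint.h>

struct port_rule {
    uint16_t port_base;
    uint16_t port_mask;
};

/* Split the inclusive port range [lo, hi] into aligned power-of-two blocks,
 * writing one (base, mask) rule per block. Returns the number of rules. */
static unsigned range_to_rules(uint32_t lo, uint32_t hi,
                               struct port_rule *out, unsigned max_out)
{
    assert(lo <= hi && hi <= 0xFFFF);
    unsigned n = 0;
    while (lo <= hi) {
        /* Largest block allowed by the alignment of lo (its lowest set bit) ... */
        uint32_t size = lo == 0 ? 0x10000 : (lo & (~lo + 1));
        /* ... shrunk until it also fits in what is left of the range. */
        while (size > hi - lo + 1)
            size >>= 1;
        assert(n < max_out);
        out[n].port_base = (uint16_t)lo;
        out[n].port_mask = (uint16_t)~(size - 1);
        n++;
        lo += size; /* lo is 32-bit, so stepping past 65535 ends the loop */
    }
    return n;
}
```

`range_to_rules(1024, 16383, ...)` yields 1024/0xFC00, 2048/0xF800, 4096/0xF000 and 8192/0xE000, while a full worker block such as 16384-32767 collapses into the single rule 16384/0xC000. Other worker counts only change the ranges passed in.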
The flow table finally installed on the NIC is:
worker0 flow0: port_base 8192, port_mask 0xE000
worker0 flow1: port_base 4096, port_mask 0xF000
worker0 flow2: port_base 2048, port_mask 0xF800
worker0 flow3: port_base 1024, port_mask 0xFC00
worker1 flow4: port_base 16384, port_mask 0xC000
worker2 flow5: port_base 32768, port_mask 0xC000
worker3 flow6: port_base 49152, port_mask 0xC000
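A cheap way to sanity-check a table like this is to enumerate all 65536 ports. This sketch (hypothetical `check_table`, host byte order) encodes the seven rules derived above, 8192/0xE000, 4096/0xF000, 2048/0xF800 and 1024/0xFC00 for worker0 plus base/0xC000 for workers 1-3, and verifies that ports below 1024 hit no rule while every other port hits exactly one rule owned by worker `port / 16384`.

```c
#include <assert.h>
#include <stdint.h>

struct flow_rule {
    uint16_t port_base;
    uint16_t port_mask;
    unsigned worker;
};

/* 4 workers, contiguous ranges, ports below 1024 left to RSS. */
static const struct flow_rule rules[] = {
    { 8192,  0xE000, 0 },
    { 4096,  0xF000, 0 },
    { 2048,  0xF800, 0 },
    { 1024,  0xFC00, 0 },
    { 16384, 0xC000, 1 },
    { 32768, 0xC000, 2 },
    { 49152, 0xC000, 3 },
};
#define NRULES (sizeof(rules) / sizeof(rules[0]))

/* Returns 1 if the table steers every port as intended, 0 otherwise. */
static int check_table(void)
{
    /* A base with bits outside its mask (e.g. 1024 with 0xC000) never matches. */
    for (unsigned i = 0; i < NRULES; i++)
        assert((rules[i].port_base & (uint16_t)~rules[i].port_mask) == 0);

    for (unsigned port = 0; port <= 65535; port++) {
        unsigned hits = 0, worker = 0;
        for (unsigned i = 0; i < NRULES; i++) {
            if ((port & rules[i].port_mask) == rules[i].port_base) {
                hits++;
                worker = rules[i].worker;
            }
        }
        if (port < 1024) {
            if (hits != 0)
                return 0;          /* low ports must be left to RSS */
        } else if (hits != 1 || worker != port / 16384) {
            return 0;              /* gap, overlap or wrong worker */
        }
    }
    return 1;
}
```

Running the same check after editing any base or mask immediately shows whether a port range was left uncovered or assigned to the wrong worker.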
All of the above relies on the NIC supporting the flow filter feature (flow director).
Testing showed that with the latest dpdk-21.11, the examples/flow_filtering sample supports port masks on the E810 NIC, but with dpdk-20.05 it does not. The cause turned out to be the ice driver: ice implements flow filtering with three engines (hash, fd, switch), and the difference is that 21.11 uses the switch engine while 20.05 uses the fd engine.