Scenarios that must be considered when designing a stable microservice system
2022-07-18 09:17:00 【JavaShark】
Production environments are often unstable. For example:
- During a big promotion, an instantaneous traffic peak pushes the system beyond its maximum load; load soars, the system crashes, and users cannot place orders.
- A "dark horse" hot item breaks through the cache, the database is overwhelmed, and normal traffic is squeezed out.
- A caller is dragged down by an unstable service; its thread pool fills up and the whole call chain stalls.
These unstable scenarios can have serious consequences. You may ask: how can we keep user access even and smooth? How can we guard against excessive traffic or unstable services?
Introduction
The following two methods are common answers to unstable traffic. They are also two capabilities we have to consider before designing a highly available system, and a key part of service traffic governance.
Flow control
Traffic is random and unpredictable. One second may be calm and the next may bring a flood peak (for example, midnight on Double Eleven). Every system and every service has a maximum capacity. If a sudden burst exceeds that capacity, requests may go unprocessed, the backlog is handled slowly, CPU and load soar, and finally the system crashes. We therefore need to limit such bursts and handle as many requests as possible while making sure the service is not overwhelmed. This is flow control.
Circuit breaking and degradation
A service often calls other modules: another remote service, a database, or a third-party API. For example, a payment may require calling a remote API, and querying the price of a product may require a database query. However, the stability of these dependencies is not guaranteed. If a dependency becomes unstable and its response time grows, the response time of the method that calls it grows as well, threads pile up, and eventually the caller's own thread pool may be exhausted, making the service itself unavailable.

Modern microservice architectures are distributed and made up of many services. The services call one another and form complex call chains, and the problems above are amplified along those chains. If one link in a complex chain is unstable, the failure can cascade until the whole chain is unavailable. We therefore need to apply circuit breaking and degradation to unstable weak dependencies, temporarily cutting off unstable calls so that local instability does not trigger an avalanche.
Q: Many readers ask: is flow control and rate limiting unnecessary when a service's traffic is small? Can a relatively simple microservice architecture do without circuit breaker protection?
A: Actually, this has nothing to do with request volume or architectural complexity. Often it is the failure of a very marginal service that affects the whole business and causes heavy losses. We need to design for failure: do capacity planning and sort out strong and weak dependencies in normal times, configure flow control and degradation rules sensibly, and protect in advance rather than fixing problems after they surface online.
There are many ways to describe a governance plan for flow control, degradation and fault tolerance. Next I introduce OpenSergo, an open, general-purpose service governance standard for distributed service architectures that covers the full-link heterogeneous ecosystem. Let's look at how OpenSergo defines the standard for flow control, degradation and fault tolerance, which implementations support it, and what problems it can help us solve.
OpenSergo flow control, degradation and fault tolerance v1alpha1 standard
In OpenSergo, we drew on scenario practice from the Sentinel framework to abstract a standard CRD for flow control, degradation and fault tolerance scenarios. A fault-tolerance governance rule (FaultToleranceRule) can be thought of as consisting of three parts:
- Target: the kind of request the rule applies to
- Strategy: the fault tolerance or control strategy, such as flow control, circuit breaking, concurrency control, adaptive overload protection, or outlier instance removal
- FallbackAction: the fallback behavior after the rule is triggered, such as returning an error or a status code
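The three-part composition can be pictured as a small data model. The following is only an illustrative sketch in Python: the class and field names (`resource_name`, `status_code`, `body`) are assumptions made for this example and are not the field names of the OpenSergo CRD.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    # Hypothetical field: the resource (e.g. an HTTP path) the rule applies to.
    resource_name: str


@dataclass
class Strategy:
    kind: str  # e.g. "RateLimitStrategy" or "CircuitBreakerStrategy"
    name: str  # name of the strategy object being referenced


@dataclass
class FallbackAction:
    # Hypothetical fields: what to return once a strategy blocks a request.
    status_code: int
    body: str


@dataclass
class FaultToleranceRule:
    targets: List[Target]
    strategies: List[Strategy]
    fallback: FallbackAction


# A rule that rate-limits "/foo" and answers blocked requests with 429.
rule = FaultToleranceRule(
    targets=[Target("/foo")],
    strategies=[Strategy("RateLimitStrategy", "rate-limit-foo")],
    fallback=FallbackAction(429, "Blocked by flow control"),
)
```

Keeping the strategy separate from the target means the same strategy can be reused for several kinds of request.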

Let's look at how OpenSergo defines the standard for common flow control and degradation scenarios, and how it solves our problems.
First of all, once a microservice framework is adapted to OpenSergo, flow control, degradation and other governance can be carried out through the unified CRD. Whether the service is written in Java or Go or runs in a mesh, and whether the traffic is an HTTP request, an RPC call, or database SQL access, we can use this unified fault-tolerance governance rule CRD to configure fault-tolerance governance for every link in the microservice architecture and keep the service chain stable. Let's take a closer look at the OpenSergo configuration for each specific scenario.
Flow control
The following example defines a cluster flow control strategy that limits the cluster as a whole to no more than 180 requests per second. Example CR YAML:
apiVersion: fault-tolerance.opensergo.io/v1alpha1
kind: RateLimitStrategy
metadata:
  name: rate-limit-foo
spec:
  metricType: RequestAmount
  limitMode: Global
  threshold: 180
  statDuration: "1s"
With this one simple CR we give the system a flow control capability. Flow control acts like an airbag for the application: requests beyond the system's service capacity are rejected, and how they are rejected can be customized (for example, returning specified content or redirecting to a page).
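To make the meaning of `threshold` and `statDuration` concrete, here is a minimal fixed-window counter in Python. It is a sketch under simplifying assumptions: it runs in a single process (a real `Global` limit needs a counter shared across the cluster), the class name is hypothetical, and it is not how Sentinel or OpenSergo implement the strategy. The current time is passed in explicitly so the behavior is easy to follow.

```python
class FixedWindowRateLimiter:
    """Admit at most `threshold` requests per window of `stat_duration_s` seconds."""

    def __init__(self, threshold: int, stat_duration_s: float):
        self.threshold = threshold
        self.stat_duration_s = stat_duration_s
        self.current_window = None  # index of the window being counted
        self.count = 0              # requests admitted in that window

    def try_acquire(self, now: float) -> bool:
        window = int(now // self.stat_duration_s)
        if window != self.current_window:
            # A new statistics window has started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count < self.threshold:
            self.count += 1
            return True
        return False  # over the threshold: the caller applies its fallback


# Mirrors the CR above: 180 requests per 1s window.
limiter = FixedWindowRateLimiter(threshold=180, stat_duration_s=1.0)
admitted = sum(limiter.try_acquire(now=0.5) for _ in range(200))
```

In this run 200 requests arrive within the same second, so only the first 180 are admitted; a request arriving in the next window is admitted again.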

Circuit breaker protection
The following example defines a circuit breaking strategy based on the slow call ratio. Example CR YAML:
apiVersion: fault-tolerance.opensergo.io/v1alpha1
kind: CircuitBreakerStrategy
metadata:
  name: circuit-breaker-slow-foo
spec:
  strategy: SlowRequestRatio
  triggerRatio: '60%'
  statDuration: '30s'
  recoveryTimeout: '5s'
  minRequestAmount: 5
  slowConditions:
    maxAllowedRt: '500ms'
This CR means: when, within 30s, the proportion of requests taking longer than 500ms exceeds 60% and the number of requests reaches 5, the circuit breaker trips automatically; the recovery time is 5s.
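The rule can be read as a small state machine. The sketch below is a simplified Python illustration, not Sentinel's implementation: the class name, the `CLOSED`/`OPEN`/`HALF_OPEN` state names, and the single-probe recovery behavior are assumptions for this example. Timestamps are passed in explicitly.

```python
from collections import deque


class SlowRatioCircuitBreaker:
    def __init__(self, trigger_ratio=0.6, stat_duration_s=30.0,
                 recovery_timeout_s=5.0, min_request_amount=5,
                 max_allowed_rt_ms=500.0):
        self.trigger_ratio = trigger_ratio
        self.stat_duration_s = stat_duration_s
        self.recovery_timeout_s = recovery_timeout_s
        self.min_request_amount = min_request_amount
        self.max_allowed_rt_ms = max_allowed_rt_ms
        self.state = "CLOSED"
        self.opened_at = 0.0
        self.samples = deque()  # (timestamp, was_slow) within the window

    def allow(self, now: float) -> bool:
        """Decide whether a call may go through at time `now`."""
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN" and now - self.opened_at >= self.recovery_timeout_s:
            self.state = "HALF_OPEN"  # let exactly one probe request through
            return True
        return False  # OPEN, or HALF_OPEN with a probe already in flight

    def record(self, now: float, rt_ms: float) -> None:
        """Report the response time of a completed call."""
        slow = rt_ms > self.max_allowed_rt_ms
        if self.state == "OPEN":
            return
        if self.state == "HALF_OPEN":
            if slow:
                self._open(now)          # probe was slow: stay broken
            else:
                self.state = "CLOSED"    # probe was fine: recover
            return
        self.samples.append((now, slow))
        while self.samples and now - self.samples[0][0] > self.stat_duration_s:
            self.samples.popleft()       # drop samples outside the window
        total = len(self.samples)
        slow_count = sum(1 for _, s in self.samples if s)
        if total >= self.min_request_amount and slow_count / total > self.trigger_ratio:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "OPEN"
        self.opened_at = now
        self.samples.clear()
```

While the breaker is open, the caller skips the remote call and returns a fallback, which is what keeps its own threads from piling up.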

Imagine a business peak in which some downstream service providers hit performance bottlenecks, to the point of affecting the business. We configure such a rule for some non-critical service consumers. When the slow call ratio or error ratio over a period of time reaches the configured condition, the circuit breaker trips automatically, and for the following period calls to the service return a mock result directly. This keeps the caller from being dragged down by the unstable service, gives the unstable downstream service some time to "breathe", and keeps the whole service chain running normally.
Implementations of the flow control, degradation and fault tolerance standard
Introduction to Sentinel
Sentinel is a project that supports the OpenSergo flow control, degradation and fault tolerance standard.
Sentinel is Alibaba's open-source traffic control component for distributed service architectures. Taking traffic as its entry point, it helps developers keep microservices stable through flow control, traffic shaping, circuit breaking and degradation, adaptive system protection, and other dimensions.
Sentinel technical highlights:
- Highly extensible: a basic core plus SPI extension points, so users can easily extend flow control, communication, monitoring and other functions
- Diverse flow control strategies (by resource granularity, call relationship, flow control metric, flow control effect and other dimensions), with distributed cluster flow control
- Hotspot traffic detection and protection
- Circuit breaking, degradation and isolation of unstable services
- Adaptive protection based on global system load, adjusting traffic in real time according to the system's load level
- Covers API gateway scenarios, providing gateway traffic control for Spring Cloud Gateway and Zuul
- Provides cluster traffic control for Envoy service mesh in cloud native scenarios
- Real-time monitoring and dynamic rule configuration management

Some common usage scenarios:
On the service provider (Service Provider) side, we need to protect the provider itself from being overwhelmed by traffic peaks. Flow control here is usually based on the provider's service capacity, or applied to specific callers. We can use earlier load tests to assess the capacity of core interfaces and configure QPS-mode rate limiting; when requests per second exceed the threshold, excess requests are rejected automatically.
To avoid being dragged down by unstable services when invoking other services, we need to isolate and circuit-break unstable dependencies on the service consumer (Service Consumer) side. Means include semaphore isolation, degradation by exception ratio, and degradation by RT (response time).
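Semaphore isolation, the first of these means, caps how many calls to one dependency may be in flight at once, so a slow dependency cannot occupy every thread of the caller. A minimal Python sketch follows; `ConcurrencyLimiter` and its `call` method are hypothetical names for this example and not a Sentinel API.

```python
import threading


class ConcurrencyLimiter:
    """Allow at most `max_concurrency` in-flight calls to one dependency."""

    def __init__(self, max_concurrency: int):
        self._sem = threading.BoundedSemaphore(max_concurrency)

    def call(self, fn, fallback):
        # Never wait for a permit: if the dependency is saturated,
        # fail fast with the fallback instead of piling up threads.
        if not self._sem.acquire(blocking=False):
            return fallback()
        try:
            return fn()
        finally:
            self._sem.release()


limiter = ConcurrencyLimiter(max_concurrency=1)
# While the outer call holds the only permit, the nested call is rejected.
result = limiter.call(
    lambda: limiter.call(lambda: "inner ok", lambda: "rejected"),
    lambda: "outer rejected",
)
```

The non-blocking acquire is the design choice that matters: a blocking acquire would only move the thread pile-up from the remote call to the semaphore.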
When the system has been at a low load level for a long time and traffic suddenly surges, pulling it straight to a high load level may crush it instantly. Sentinel's WarmUp flow control mode lets the admitted traffic grow slowly, rising to the threshold over a set period rather than all at once. This gives the cold system time to warm up and avoids crushing it.
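The idea of a gradually rising threshold can be sketched with a simple linear ramp. This is an illustration only: Sentinel's WarmUp mode uses a different, token-bucket-based calculation, and the function name, the linear shape, and the `cold_factor` default used here are assumptions for this example.

```python
def warm_up_threshold(elapsed_s: float, max_qps: float,
                      warm_up_period_s: float, cold_factor: float = 3.0) -> float:
    """Allowed QPS `elapsed_s` seconds after traffic starts hitting a cold system.

    The limit starts at max_qps / cold_factor and rises linearly to max_qps
    over warm_up_period_s seconds.
    """
    if elapsed_s >= warm_up_period_s:
        return max_qps
    cold_qps = max_qps / cold_factor
    progress = max(elapsed_s, 0.0) / warm_up_period_s
    return cold_qps + (max_qps - cold_qps) * progress


# With a 90 QPS limit and a 10s warm-up, the limit ramps 30 -> 60 -> 90.
ramp = [warm_up_threshold(t, max_qps=90, warm_up_period_s=10) for t in (0, 5, 10)]
```

The returned value would feed the threshold of a rate limiter, so the admitted rate grows with the ramp instead of jumping to the full limit.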
Sentinel's uniform queuing mode can be used for "peak shaving and valley filling": requests are spread evenly over a period of time, so that system load stays within its processing capacity while as many requests as possible are handled.
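Uniform queuing can be sketched as a pacer that assigns each request the next free time slot and rejects requests that would have to wait too long. The Python below is a simplified single-threaded illustration; `PacingLimiter`, `schedule` and `max_queue_time_s` are hypothetical names for this example and not Sentinel's API.

```python
class PacingLimiter:
    """Let requests through at a fixed interval, queuing bursts up to a limit."""

    def __init__(self, qps: float, max_queue_time_s: float):
        self.interval_s = 1.0 / qps            # spacing between admitted requests
        self.max_queue_time_s = max_queue_time_s
        self.next_free_time = 0.0              # earliest time the next request may run

    def schedule(self, now: float):
        """Return how long the request must wait, or None if it is rejected."""
        start = max(now, self.next_free_time)
        wait = start - now
        if wait > self.max_queue_time_s:
            return None                        # queue is too long: reject
        self.next_free_time = start + self.interval_s
        return wait


# Five requests arrive at the same instant; the limit is 4 QPS with at most
# 0.5s of queuing, so three are spaced 0.25s apart and two are rejected.
pacer = PacingLimiter(qps=4, max_queue_time_s=0.5)
waits = [pacer.schedule(now=0.0) for _ in range(5)]
```

Compared with rejecting everything above the threshold, pacing turns a short burst into a steady stream, at the cost of added latency for the queued requests.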
Sentinel's gateway flow control can be used to protect traffic at the gateway entrance, or to limit the call frequency of an API.
Alibaba Cloud microservice solution
Alibaba Cloud provides MSE, an enterprise-grade product that fully complies with the OpenSergo microservice standard. The traffic governance capability in the enterprise edition of MSE service governance can be understood as a commercial version of Sentinel. We also briefly summarize how MSE traffic governance compares with the community solution in flow control, degradation and fault tolerance scenarios.

Next, I will use MSE to demonstrate how flow control and circuit breaking with degradation protect our system, so that it can calmly face uncertain traffic and a range of unstable scenarios.
Configure flow control rules
We can view the real-time monitoring of each interface on the monitoring details page .

We can click the "New protection rules" button to add a flow control rule:

We can configure the simplest QPS-mode flow control rule. For example, the rule above limits calls to the interface on a single instance to no more than 80 per second.
Monitor and check the flow control effect
After the rule is configured, the rate limiting effect appears on the monitoring page after a short while:

Rejected traffic also gets an error response. The framework instrumentation built into MSE has default flow control handling logic: for example, a rate-limited Web interface returns 429 Too Many Requests, and the DAO layer throws an exception when limited. Users who want to customize the flow control handling logic of each layer more flexibly can integrate via the SDK and configure custom handling logic.
Summary
Flow control, degradation and fault tolerance are scenarios we have to consider when designing a stable microservice system. If every system required a large effort to design its own flow control, degradation and fault tolerance, this would become a headache for every developer. Having worked with and designed so many such systems, can we distill general scenarios, best practices, design standards and specifications, or even reference implementations?
This article briefly introduced the OpenSergo flow control and circuit breaker protection standard, then the background of Sentinel and its means of traffic protection, and finally showed by example how to use the traffic protection capability of MSE service governance to safeguard your application.