Scenarios that must be considered when designing a stable microservice system

2022-07-18 09:17:00 【JavaShark】

Our production environments often run into stability problems, such as:

During a major sales promotion, an instantaneous traffic peak pushes the system past its maximum load; load soars, the system crashes, and users cannot place orders.
A “dark horse” hot item breaks through the cache, the database is overwhelmed, and normal traffic is squeezed out.
A caller is dragged down by an unstable service; its thread pool fills up and the whole call chain stalls.

These unstable scenarios can have serious consequences. You may ask: how do we keep user access even and smooth? How do we guard against excessive traffic or unstable services?

Introduction

The following two techniques are common answers to traffic instability. They are also two capabilities we have to consider before designing a highly available system, and they are a key part of service traffic governance.

Flow control

Traffic is random and unpredictable. One second may be calm and the next may bring a flood peak (for example, midnight on Double Eleven). Every system and every service has a maximum capacity. If a sudden burst exceeds that capacity, requests may go unprocessed, queued requests are handled slowly, CPU and load soar, and finally the system crashes. We therefore need to limit such bursts, handling as many requests as possible while making sure the service is not overwhelmed. This is flow control.

Circuit breaking and degradation

A service often calls other modules: another remote service, a database, a third-party API, and so on. For example, making a payment may require a remote API call, and looking up the price of a product may require a database query. The stability of these dependencies, however, is not guaranteed. If a dependency becomes unstable and its response time grows, the response time of the method that calls it grows too, threads pile up, and eventually the caller's own thread pool may be exhausted, making the service itself unavailable.

Modern microservice architectures are distributed and made up of many services. These services call each other and form complex call chains, and the problem above is amplified along the chain. If one link in a complex chain becomes unstable, the failure can cascade until the whole chain is unavailable. We therefore need to apply circuit breaking and degradation to unstable weak dependencies, temporarily cutting off unstable calls so that local instability does not cause an avalanche.

Q: Many readers ask: if a service handles only a small volume of requests, is flow control unnecessary? If the microservice architecture is simple, can circuit breaking be left out?

A: In fact, this has nothing to do with request volume or with the complexity of the architecture. Very often it is the failure of a marginal service that affects the whole business and causes great damage. We need a design-for-failure mindset: do capacity planning and sort out strong and weak dependencies in normal times, configure flow control and degradation rules sensibly, and put protection in place in advance rather than fixing problems after they occur in production.

There are many ways to describe a governance plan for flow control, degradation, and fault tolerance. Here I introduce OpenSergo, an open, general-purpose service governance standard that targets distributed service architectures and covers the full-link heterogeneous ecosystem. Let's look at how OpenSergo defines the standard for flow control, degradation, and fault tolerance, which implementations support it, and which problems it can help us solve.

OpenSergo flow control, degradation, and fault tolerance v1alpha1 standard

In OpenSergo, we drew on the scenario practice of the Sentinel framework to abstract standard CRDs for flow control, degradation, and fault tolerance scenarios. A fault tolerance governance rule (FaultToleranceRule) can be thought of as consisting of three parts:

Target: which requests the rule applies to
Strategy: the fault tolerance or control strategy, such as flow control, circuit breaking, concurrency control, adaptive overload protection, or outlier instance removal
FallbackAction: the fallback behavior after the strategy is triggered, such as returning an error or a status code
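
The three parts can be pictured as a small data model. The following Python sketch is only an illustration of how a rule ties a target to strategies and a fallback; the class names mirror the terms above, but the field names (`resource_name`, `status_code`, `body`) and the `/foo` resource are assumptions for this example and are not the OpenSergo CRD schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Target:
    """Which requests the rule applies to (hypothetical field name)."""
    resource_name: str


@dataclass(frozen=True)
class Strategy:
    """A named control strategy, e.g. rate limiting or circuit breaking."""
    kind: str
    name: str


@dataclass(frozen=True)
class FallbackAction:
    """What the caller receives once a strategy has been triggered."""
    status_code: int
    body: str


@dataclass
class FaultToleranceRule:
    """Binds targets to strategies and to one fallback behavior."""
    targets: List[Target]
    strategies: List[Strategy]
    fallback: FallbackAction

    def applies_to(self, resource_name: str) -> bool:
        # A rule applies when any of its targets matches the resource.
        return any(t.resource_name == resource_name for t in self.targets)


rule = FaultToleranceRule(
    targets=[Target("/foo")],
    strategies=[Strategy("RateLimitStrategy", "rate-limit-foo")],
    fallback=FallbackAction(429, "Blocked by flow control"),
)
```

Keeping the strategy and the fallback as separate objects means the same strategy can, in principle, be reused across rules with different targets.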

Let's look at how OpenSergo defines the standard for commonly used flow control and degradation scenarios, and how it solves our problems.

First, as long as a microservice framework is adapted to OpenSergo, flow control, degradation, and other governance can be carried out through the unified CRDs. Whether the service is written in Java or Go or runs in a mesh, and whether the traffic is an HTTP request, an RPC call, or database SQL access, we can use the same fault tolerance governance rule CRD to configure fault tolerance for every link in the microservice architecture and keep the service chain stable. Let's take a closer look at the OpenSergo configuration for each specific scenario.

Flow control

The following example defines a cluster flow control strategy that allows no more than 180 requests per second across the whole cluster. Example CR YAML:

apiVersion: fault-tolerance.opensergo.io/v1alpha1
kind: RateLimitStrategy
metadata:
  name: rate-limit-foo
spec:
  metricType: RequestAmount
  limitMode: Global
  threshold: 180
  statDuration: "1s"

With a CR this simple we can give the system a flow control capability. Flow control works like an airbag for the application: requests beyond the system's service capacity are rejected, and the handling logic can be customized (for example, returning specified content or redirecting to a page).
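
To show what "at most 180 requests per 1 s, reject the rest" amounts to, here is a minimal fixed-window counter in Python. It is a single-process sketch under stated assumptions: the CR above uses `limitMode: Global` (cluster-wide), which would need shared state that this example does not model, and the class and function names are hypothetical rather than any OpenSergo or Sentinel API.

```python
import time


class FixedWindowRateLimiter:
    """Allows at most `threshold` requests per `stat_duration_s` window."""

    def __init__(self, threshold, stat_duration_s, clock=time.monotonic):
        self.threshold = threshold
        self.stat_duration_s = stat_duration_s
        self.clock = clock              # injectable so the sketch is testable
        self.window_start = clock()
        self.count = 0

    def try_acquire(self):
        now = self.clock()
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.stat_duration_s:
            self.window_start = now
            self.count = 0
        if self.count < self.threshold:
            self.count += 1
            return True
        return False


def handle(limiter, business_logic, on_blocked):
    """Run the business logic if admitted, otherwise the custom fallback."""
    return business_logic() if limiter.try_acquire() else on_blocked()


limiter = FixedWindowRateLimiter(threshold=180, stat_duration_s=1.0)
print(handle(limiter, lambda: "order placed", lambda: "please retry later"))
```

The `on_blocked` callback plays the role of the customizable rejection logic. A fixed window is the simplest counting scheme; it can admit bursts around window boundaries, which sliding-window designs reduce.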

Circuit breaker protection

The following example defines a circuit breaking strategy based on the slow-call ratio. Example CR YAML:

apiVersion: fault-tolerance.opensergo.io/v1alpha1
kind: CircuitBreakerStrategy
metadata:
  name: circuit-breaker-slow-foo
spec:
  strategy: SlowRequestRatio
  triggerRatio: '60%'
  statDuration: '30s'
  recoveryTimeout: '5s'
  minRequestAmount: 5
  slowConditions:
    maxAllowedRt: '500ms'

This CR means: when, within 30 s, the proportion of requests taking longer than 500 ms exceeds 60% and the number of requests has reached 5, the circuit breaker trips automatically; the recovery time is 5 s.

Imagine a business peak during which some downstream service providers hit a performance bottleneck that even starts to affect the business. We configure such a rule for some non-critical service consumers: when the slow-call ratio or error ratio over a period of time reaches the configured condition, the circuit breaker trips automatically, and for the following period calls to that service return a mock result directly. This keeps the caller from being dragged down by the unstable service, gives the unstable downstream service time to “breathe”, and keeps the overall service chain running normally.
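
The behavior described above can be sketched as a small state machine. The Python example below uses the values from the CR (60% ratio, 30 s statistics window, 5 s recovery, minimum 5 requests, 500 ms slow threshold), but everything else is an assumption made for illustration: the class name, the tumbling (non-sliding) statistics window, and the single-probe half-open step are simplifications, not the behavior of any particular implementation.

```python
import time


class SlowRatioCircuitBreaker:
    """Opens when too large a share of recent calls is slow."""

    def __init__(self, trigger_ratio=0.6, stat_duration_s=30.0,
                 recovery_timeout_s=5.0, min_request_amount=5,
                 max_allowed_rt_s=0.5, clock=time.monotonic):
        self.trigger_ratio = trigger_ratio
        self.stat_duration_s = stat_duration_s
        self.recovery_timeout_s = recovery_timeout_s
        self.min_request_amount = min_request_amount
        self.max_allowed_rt_s = max_allowed_rt_s
        self.clock = clock
        self.state = "CLOSED"
        self.opened_at = 0.0
        self._reset_window()

    def _reset_window(self):
        self.window_start = self.clock()
        self.total = 0
        self.slow = 0

    def _open(self):
        self.state = "OPEN"
        self.opened_at = self.clock()

    def allow(self):
        """Return True if a call may go through to the dependency."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.recovery_timeout_s:
                self.state = "HALF_OPEN"   # let a probe request through
                return True
            return False
        return True

    def record(self, rt_s):
        """Record the response time (seconds) of a completed call."""
        is_slow = rt_s > self.max_allowed_rt_s
        if self.state == "HALF_OPEN":
            # The probe decides: slow reopens, fast closes the breaker.
            if is_slow:
                self._open()
            else:
                self.state = "CLOSED"
                self._reset_window()
            return
        if self.clock() - self.window_start >= self.stat_duration_s:
            self._reset_window()
        self.total += 1
        if is_slow:
            self.slow += 1
        if (self.total >= self.min_request_amount
                and self.slow / self.total > self.trigger_ratio):
            self._open()


def call_with_breaker(breaker, remote_call, mock_result, clock=time.monotonic):
    """Call the dependency, or return the mock result while the breaker is open."""
    if not breaker.allow():
        return mock_result
    start = clock()
    try:
        return remote_call()
    finally:
        breaker.record(clock() - start)
```

Returning `mock_result` without touching the dependency is what protects the caller's thread pool and gives the downstream service room to recover.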

Implementations of the flow control, degradation, and fault tolerance standard

Introduction to Sentinel

Here is Sentinel, a project that supports the OpenSergo flow control, degradation, and fault tolerance standard.

Sentinel is a traffic control component for distributed service architectures, open-sourced by Alibaba. It takes traffic as its entry point and helps developers keep microservices stable through flow control, traffic shaping, circuit breaking and degradation, system adaptive protection, and other dimensions.

Sentinel technical highlights:

Highly extensible: a basic core plus SPI extension points, so users can easily extend flow control, communication, monitoring, and other functions
Diverse flow control strategies (by resource granularity, call relationship, flow control metric, flow control effect, and other dimensions), plus distributed cluster flow control
Hot-spot traffic detection and protection
Circuit breaking, degradation, and isolation of unstable services
Global, system-load adaptive protection that adjusts traffic in real time according to the system's load level
Coverage of API gateway scenarios, providing gateway flow control for Spring Cloud Gateway and Zuul
Cluster flow control for the Envoy service mesh in cloud-native scenarios
Real-time monitoring and dynamic rule configuration management

Some common usage scenarios ：

On the service provider side, we need to protect the provider itself from being overwhelmed by traffic peaks. Flow control here is usually based on the provider's service capacity, or applied to specific callers. Using earlier load test results to assess what the core interfaces can bear, we configure QPS-mode rate limiting; when the number of requests per second exceeds the configured threshold, the excess requests are rejected automatically.
To avoid being dragged down by unstable services when calling other services, we need to isolate and circuit-break unstable dependencies on the service consumer side. Means include semaphore isolation, degradation on exception ratio, and degradation on RT (response time).
When a system has been at a low load level for a long time and traffic suddenly increases, pushing it straight to a high load level may crush it in an instant. In this case we can use Sentinel's WarmUp flow control mode to let the admitted traffic grow slowly, rising to the threshold over a period of time instead of all at once. This gives the cold system time to warm up and avoids crushing it.
Use Sentinel's uniform queuing mode for “peak shaving and valley filling”: spread requests over a period of time, keeping system load within its processing capacity while handling as many requests as possible.
Use Sentinel's gateway flow control to protect traffic at the gateway entrance, or to limit how often an API is called.
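
Two of the scenarios above, warm-up and uniform queuing, lend themselves to a short Python sketch. Both functions are simplifications written for this article and are not Sentinel's actual algorithms: the warm-up ramp here is linear, the `cold_factor` default of 3 is an assumption, and the names `warm_up_threshold` and `UniformPacer` are hypothetical.

```python
def warm_up_threshold(elapsed_s, max_qps, warm_up_period_s, cold_factor=3):
    """Allowed QPS `elapsed_s` seconds after a cold system starts taking traffic.

    Starts at max_qps / cold_factor and rises linearly to max_qps.
    """
    cold_qps = max_qps / cold_factor
    progress = min(max(elapsed_s / warm_up_period_s, 0.0), 1.0)
    return cold_qps + (max_qps - cold_qps) * progress


class UniformPacer:
    """Lets requests pass at a fixed interval instead of in bursts."""

    def __init__(self, qps, max_queue_s):
        self.interval_s = 1.0 / qps
        self.max_queue_s = max_queue_s   # longest time a request may wait
        self.next_free_at = 0.0          # earliest time the next request may pass

    def schedule(self, now_s):
        """Return how long the request must wait, or None if it is rejected."""
        pass_at = max(now_s, self.next_free_at)
        wait = pass_at - now_s
        if wait > self.max_queue_s:
            return None                  # queue too long: reject
        self.next_free_at = pass_at + self.interval_s
        return wait


# Threshold ramps from 100 to 300 QPS over 10 seconds.
print([warm_up_threshold(t, 300, 10) for t in (0, 5, 10)])

# A burst of 4 requests at t=0 with 4 QPS pacing and a 0.5 s queue limit.
pacer = UniformPacer(qps=4, max_queue_s=0.5)
print([pacer.schedule(0.0) for _ in range(4)])
```

The pacer turns a burst into evenly spaced work and only rejects requests that would have to wait longer than the queue limit, which is the "peak shaving and valley filling" effect.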

Alibaba Cloud microservice solution

Alibaba Cloud provides MSE, an enterprise-grade product that fully complies with the OpenSergo microservice standard. The traffic governance capability in the enterprise edition of MSE service governance can be understood as a commercial version of Sentinel. We also briefly compare MSE traffic governance with the community solution in flow control, degradation, and fault tolerance scenarios.

Next, using MSE, I will demonstrate how to protect a system with flow control and circuit breaking so that it can calmly face unpredictable traffic and a range of unstable scenarios.

Configure flow control rules

We can view the real-time monitoring of each interface on the monitoring details page .

Click the “New protection rule” button to add a flow control rule:

We can configure the simplest kind of rule, a QPS-mode flow control rule. The example above, for instance, limits each machine to no more than 80 calls per second to the interface.

Monitor and check the flow control effect

After the rule is configured, the rate limiting effect appears on the monitoring page after a short while:

Rejected traffic also gets an error response. The instrumentation points that MSE provides for built-in frameworks have default flow control handling logic: a Web interface returns 429 Too Many Requests when it is rate limited, and the DAO layer throws an exception when it is limited. Users who want to customize the handling logic of each layer more flexibly can integrate through the SDK and configure their own flow control handling logic.
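
The layered defaults described above (429 for a Web interface, an exception for the DAO layer, with room for a custom handler) can be pictured with the following Python sketch. It is purely illustrative: `FlowBlockedError`, `guarded_web_handler`, and `guarded_dao_call` are hypothetical names, the `allowed` flag stands in for whatever decision the flow control rule produces, and none of this is the MSE SDK.

```python
class FlowBlockedError(Exception):
    """Hypothetical exception raised when flow control rejects a call."""


def guarded_dao_call(allowed, query):
    """DAO layer default: raise an exception when the call is limited."""
    if not allowed:
        raise FlowBlockedError("DAO call rejected by flow control")
    return query()


def guarded_web_handler(allowed, handler, on_blocked=None):
    """Web layer default: answer 429 unless a custom handler is supplied."""
    if allowed:
        return 200, handler()
    if on_blocked is not None:
        return on_blocked()              # customized flow control handling
    return 429, "Too Many Requests"


print(guarded_web_handler(False, lambda: "hello"))
print(guarded_web_handler(False, lambda: "hello",
                          on_blocked=lambda: (302, "/busy.html")))
```

Separating the admission decision from the per-layer reaction is what lets each layer keep a sensible default while still allowing an override.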

Summary

Flow control, degradation, and fault tolerance are scenarios we must consider when designing a stable microservice system. If every system required a large effort to design its own flow control, degradation, and fault tolerance, this would become a headache for every developer. Having worked with and designed so many such systems, can we distill general scenarios, best practices, design standards and specifications, or even reference implementations?

This article briefly introduced the OpenSergo flow control and circuit breaker protection standard, described Sentinel's background and its means of traffic protection, and finally used an example to show how the traffic protection capability of MSE service governance can safeguard your application.

Copyright notice
This article was written by [JavaShark]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/199/202207151914375269.html