Accident caused by combining an annotation-based transaction with a distributed lock at the Read Committed level: purchase-opportunity data corrupted
2022-07-19 11:01:00 【The name is random】
Background: We run a purchase-restriction activity that limits the purchase of certain goods. Users earn purchase opportunities by taking part in platform games or by shopping. Today I suddenly received a system alarm reporting a large number of abnormal error codes.

The accident: According to the records, 170,000 users were each given two purchase opportunities. Instead of adding both opportunities in a single call, the business side called the interface once per opportunity. The business layer split the data into 80,000 groups, one user per group, and each group made two concurrent calls to the add interface. As a result, more than 350 of the 170,000 users could not place orders normally. Relatively few users were affected, and because the alarm center's email arrived before anyone reported the problem, we fixed it before any customer complained.

Probable cause: Investigation showed that the accident was caused by the combination of the MySQL Read Committed isolation level, an annotation-based transaction (@Transactional), and a distributed lock, under extremely high concurrency (calls only microseconds apart). Together the three produce a peculiar bug. Let me explain it in detail.
One. Simplified pseudocode of the business logic:
/**
 * Interface for adding purchase opportunities.
 * XXXXXXX and similar symbols mark parts I redacted by hand.
 */
@Transactional(rollbackFor = Exception.class) // Note: this is where the problem lies
@PostMapping("chanceAdd")
public XxxDto chanceAdd(@RequestBody xxxReq req) {
    // Fast deduplication / fail-fast (modeled on AQS addWaiter).
    // A unique key in the database is the final safeguard for long-term deduplication.
    if (!redisUtils.setExNx(REPEAT_CHECK_PRE +XXX orderNo XXXXX)) { // Check by business order number: one order may add only one opportunity
        throw new CommonException(ApplicationCode.REPEAT_SUBMIT, "Opportunity added repeatedly");
    }
    // One lock per user + merchant + activity
    RLock lock = redissonClient.getLock(REPEAT_CHECK_PRE +XXX user, merchant id, activity id XXXXX);
    lock.lock();
    try {
        // Save the opportunity-add record for the activity
        final boolean saveRes = quotaExtChanceAddRecordService.save(ExtChanceAddRecordMapping.INSTANCE.toQuotaAddRecordPojo(req));
        if (saveRes) {
            // Increase this user's total opportunities:
            // first query whether the user's total-opportunity record already exists
            UserExtChance userExtChance = service.getUserExtChance(req.getUserId(), req.getMallId(), req.getActivityId());
            if (userExtChance == null) { // The user's total-opportunity record does not exist
                // Create the user's total-opportunity record for this activity
            } else { // The record already exists
                // Increment the existing opportunity record
            }
        }
    } catch (Exception e) {
        log.error("chanceAdd,data:{},errorMsg:{}", req.toString(), e.getMessage());
        throw new CommonException(ApplicationCode.REPEAT_SUBMIT);
    } finally {
        lock.unlock(); // The lock is released here, but the transaction commits only after the method returns
    }
    return new XxxDto();
}
Two. Error cause analysis
Let us walk through the code line by line and simulate the abnormal case.
- Starting the transaction is not a problem.
- The Redisson lock does serialize opportunity additions for a single user, merchant, and activity, even in a distributed deployment.
- But suppose two threads, A and B, call this interface at the same time. A may release the lock before its transaction has committed, because @Transactional commits only after the method returns. B then acquires the lock, and since A's transaction is still uncommitted, B reads the data as it was before A's changes and makes the wrong decision.
- As a result, A and B may each create a total-opportunity record for the same user, or one update may overwrite the other (among other possibilities).
[Figure: simulation and analysis of the error process]
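The interleaving described above can be reproduced in a small self-contained simulation. This is only a sketch under stated assumptions: an in-memory list stands in for the committed rows that Read Committed exposes, a `ReentrantLock` stands in for the distributed lock, and the class `UnlockBeforeCommitDemo`, its methods, and the latches are all hypothetical names used to force the ordering "A unlocks, B reads, A commits". It does not touch MySQL, Spring, or Redisson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class UnlockBeforeCommitDemo {
    // Committed total-opportunity rows for one user; each element is one row's count.
    private final List<Integer> committed = new ArrayList<>();
    // Stands in for the distributed lock (assumption: same mutual-exclusion semantics).
    private final ReentrantLock lock = new ReentrantLock();

    private synchronized boolean rowExists() { return !committed.isEmpty(); }

    private synchronized void commit(boolean insert) {
        if (insert) committed.add(1);                 // create a new total row
        else committed.set(0, committed.get(0) + 1);  // increment the existing row
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // One simulated call. The latches exist only to force the interleaving from the text.
    private void addChance(boolean commitInsideLock, CountDownLatch startAfter,
                           CountDownLatch unlocked, CountDownLatch commitAfter) {
        await(startAfter);
        boolean insert;
        lock.lock();
        try {
            insert = !rowExists();                 // sees committed rows only
            if (commitInsideLock) commit(insert);  // corrected ordering: commit, then unlock
        } finally {
            lock.unlock();
        }
        unlocked.countDown();
        if (!commitInsideLock) {
            await(commitAfter);                    // faulty ordering: commit after unlock
            commit(insert);
        }
    }

    static List<Integer> run(boolean commitInsideLock) {
        UnlockBeforeCommitDemo db = new UnlockBeforeCommitDemo();
        CountDownLatch open = new CountDownLatch(0);
        CountDownLatch aUnlocked = new CountDownLatch(1);
        CountDownLatch bUnlocked = new CountDownLatch(1);
        // B starts only after A released the lock; in the faulty mode A commits only after B has read.
        Thread a = new Thread(() -> db.addChance(commitInsideLock, open, aUnlocked, bUnlocked));
        Thread b = new Thread(() -> db.addChance(commitInsideLock, aUnlocked, bUnlocked, open));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (db) {
            return new ArrayList<>(db.committed);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [1, 1]: two total rows for the same user
        System.out.println(run(true));  // prints [2]: one row, incremented twice
    }
}
```

With the commit moved inside the locked region, the second caller sees the first caller's row and increments it instead of inserting a duplicate.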
Three. Summary
The root cause is that, although the Redisson lock serializes additions for a given opportunity record (the (user, merchant, activity) dimension), the annotation-based transaction is committed only after the method has returned, that is, after the lock has already been released. Under Read Committed, a concurrent call can therefore read data that does not yet reflect the previous call's changes and make the wrong decision.
Four. Solution
Switch to a programmatic transaction: commit once the business logic finishes, or roll back on an exception. The key is to complete the entire transaction before serialization ends, that is, before the lock is released.
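The ordering described here might look like the following minimal sketch. `TxRunner`, `LockThenTransaction`, and `demo` are hypothetical names, not part of the original code: `TxRunner` is a stand-in for whatever programmatic transaction API is used (in Spring, `TransactionTemplate.execute` could play this role), and the `Lock` parameter is the JDK interface that Redisson's `RLock` also implements. The fake transaction runner only records events so the ordering can be checked without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class LockThenTransaction {
    /** Hypothetical stand-in for a programmatic transaction API. */
    interface TxRunner {
        <T> T inTransaction(Supplier<T> body);
    }

    /** Runs the whole transaction, commit or rollback included, while the lock is held. */
    static <T> T execute(Lock lock, TxRunner tx, Supplier<T> body) {
        lock.lock();
        try {
            return tx.inTransaction(body); // returns only after commit or rollback
        } finally {
            lock.unlock();                 // released after the transaction has ended
        }
    }

    /** Records the order of events using a fake transaction runner. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        ReentrantLock lock = new ReentrantLock();
        TxRunner fakeTx = new TxRunner() {
            @Override
            public <T> T inTransaction(Supplier<T> body) {
                events.add("begin");
                T result = body.get();
                events.add("commit(lockHeld=" + lock.isHeldByCurrentThread() + ")");
                return result;
            }
        };
        execute(lock, fakeTx, () -> events.add("body"));
        events.add("returned(lockHeld=" + lock.isHeldByCurrentThread() + ")");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design point is simply that the lock scope encloses the transaction scope, the reverse of what the annotation on the controller method produces.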
Thanks to the system's various alarm configurations, the problem surfaced before users noticed it. Within one day we went through the whole process: exposing the problem, finding the cause, reproducing it in testing, designing the fix, testing the release, going live, repairing the data, and retesting to verify.
I recommend using the transaction annotation only for very simple transactions; for complex business logic it is better to manage the transaction manually. Also, whenever we take a lock ourselves, we already know that concurrency problems are likely at that spot, so testers must test with dozens of concurrent groups to make sure the concurrency protection really works. This feature had been tested before, with 30 groups at 30 QPS, but the problem is highly timing-dependent and did not show up then. This time it appeared under concurrency from more than 10,000 groups.
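As a rough illustration of that kind of test, the sketch below releases every caller at the same instant with a `CyclicBarrier`, which makes timing-dependent races more likely to show up than evenly paced requests. `ConcurrentGroupTest`, `fire`, and `demo` are hypothetical names. In a real test the callback would invoke the `chanceAdd` interface and the invariant would be checked against the database; here a thread-safe counter stands in for both. One thread per call is only practical for small group counts.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.function.IntConsumer;

class ConcurrentGroupTest {
    /** Starts groups * callsPerGroup threads, releases them all at once, and waits for them. */
    static void fire(int groups, int callsPerGroup, IntConsumer call) {
        int total = groups * callsPerGroup;
        CyclicBarrier start = new CyclicBarrier(total);
        Thread[] threads = new Thread[total];
        for (int i = 0; i < total; i++) {
            final int groupId = i / callsPerGroup;
            threads[i] = new Thread(() -> {
                try {
                    start.await();          // block until every caller is ready
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
                call.accept(groupId);       // stand-in for one call to the interface
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Example: 30 groups with 2 concurrent calls each, checked against a per-group invariant. */
    static boolean demo() {
        AtomicIntegerArray added = new AtomicIntegerArray(30);
        fire(30, 2, added::incrementAndGet);
        for (int g = 0; g < added.length(); g++) {
            if (added.get(g) != 2) return false; // invariant: exactly two opportunities per user
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```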