最近生产上遇到了一个死锁问题,看日志应该是和使用了 ON DUPLICATE KEY UPDATE 语法有关系。查阅了一些资料,这个语法的确在并发高时容易引起死锁问题,但主要都是 gap 锁和插入意图锁引起的。但我实际遇到的是和唯一索引、主键索引有关,在测试环境也无法复现,有没有大佬帮助一下。 以下是相关环境信息和日志信息, tb 是主要更新的表,tb1 和 tb2 是用于关联查询出插入内容
Mysql 5.7 RC 事务隔离级别
CREATE TABLE `tb` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`bid` varchar(38) NOT NULL,
`sid` varchar(20) NOT NULL,
`oid` varchar(20) NOT NULL,
`emark` char(2) NOT NULL DEFAULT '00',
`amark` char(2) NOT NULL DEFAULT '00',
`bmark` char(2) NOT NULL DEFAULT '00',
`cmark` char(2) NOT NULL DEFAULT '00',
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_bid_sid_oid` (`bid`,`sid`,`oid`),
KEY `idx_oid` (`oid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
CREATE TABLE `tb1` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`sid` varchar(20) NOT NULL,
`bid` varchar(38) NOT NULL,
`oid` varchar(20) DEFAULT NULL,
`status` char(2) NOT NULL DEFAULT '1',
`update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_bid_sid` (`bid`,`sid`),
KEY `idx_sid` (`sid`),
KEY `idx_oid` (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `tb2` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`bid` varchar(38) NOT NULL,
`expire_time` datetime DEFAULT NULL,
`status` char(2) NOT NULL,
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_name_list_batch_id` (`batch_id`),
KEY `idx_name_list_expire_time` (`expire_time`),
KEY `index_name_list_market_no` (`market_case_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
InnoDB: *** (1) TRANSACTION:
TRANSACTION 7367657071, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 3180164, OS thread handle 140056880891648, query id 2058483539
tb update
insert into tb
(
bid, sid, oid, emark, amark, bmark, cmark, update_time
)
values ('444444', '555555', '666666', '00', '00', '00', '00', now())
ON DUPLICATE KEY UPDATE
amark = values(amark),
update_time = now()
2023-12-28T19:00:05.105996+08:00 3180165 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db
.tb
trx id 7367657071 lock_mode X waiting
Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
1: len 10; hex 30313939373333303331; asc 555555;;
2: len 6; hex 363733353839; asc 666666;;
3: len 8; hex 00000000000910e7; asc ;;
2023-12-28T19:00:05.106368+08:00 3180165 [Note]
InnoDB: *** (2) TRANSACTION:
TRANSACTION 7367656999, ACTIVE 2 sec inserting, thread declared inside InnoDB 236
mysql tables in use 3, locked 1
465 lock struct(s), heap size 57552, 38900 row lock(s), undo log entries 19415
MySQL thread id 3180165, OS thread handle 140056037168896, query id 2058483278
user Sending data
insert into tb
(
bid, sid, oid, emark, amark, bmark, update_time
)
select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now()
from tb1 left join tb2
on tb1.bid = tb2.bid
where tb2.expire_time < '2023-12-31 19:00:03.963' and tb2.expire_time > now()
and tb1.status IN
(
'2'
,
'3'
)
ON DUPLICATE KEY UPDATE
emark = values(emark),
update_time = now()
2023-12-28T19:00:05.106420+08:00 3180165 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db
.tb
trx id 7367656999 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
1: len 10; hex 30313939373333303331; asc 555555;;
2: len 6; hex 363733353839; asc 666666;;
3: len 8; hex 00000000000910e7; asc ;;
2023-12-28T19:00:05.139592+08:00 3180165 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 270 page no 7273 n bits 88 index PRIMARY of table db
.tb
trx id 7367656999 lock_mode X locks rec but not gap waiting
Record lock, heap no 14 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 00000000000aaf6c; asc l;;
1: len 6; hex 0001b725866f; asc % o;;
2: len 7; hex e20000118b0110; asc ;;
3: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
4: len 10; hex 30313939373333303331; asc 555555;;
5: len 6; hex 363733353839; asc 666666;;
6: len 2; hex 3030; asc 00;;
7: len 2; hex 3030; asc 00;;
8: len 2; hex 3030; asc 00;;
9: len 2; hex 3030; asc 00;;
10: len 5; hex 99b1f93004; asc 0 ;;
11: len 5; hex 99b1f93004; asc 0 ;;
2023-12-28T19:00:05.140247+08:00 3180165 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)
1
alexfarm OP 求大神给点排查思路啊~
|
2
alexfarm OP |
3
h0099 344 天前
https://www.v2ex.com/t/908047#r_14219204
https://www.v2ex.com/t/1005536#r_14219181 我的评价是:建议先大脑升级`最新最热最凶最恶`mysql8.2 |
5
h0099 344 天前
事务 2`INSERT INTO tb SELECT ...`的 subquery
```sql select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < '2023-12-31 19:00:03.963' and tb2.expire_time > now() and tb1.status IN ('2', '3') ``` 返回中有没有`(bid, sid, oid)`=`('444444', '555555', '666666')`的行? |
6
h0099 344 天前
@h0099 #3 实际上前天刚发布 8.3 https://dev.mysql.com/doc/relnotes/mysql/8.3/en/news-8-3-0.html
何谓 8.1 8.2 https://z.n0099.net/#narrow/near/71001 |
7
h0099 344 天前
首先您 tb2 删漏了
```diff CREATE TABLE `tb2` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `bid` varchar(38) NOT NULL, `expire_time` datetime DEFAULT NULL, `status` char(2) NOT NULL, `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), - UNIQUE KEY `idx_name_list_batch_id` (`batch_id`), + UNIQUE KEY `idx_name_bid` (`bid`), KEY `idx_name_list_expire_time` (`expire_time`), - KEY `index_name_list_market_no` (`market_case_no`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ``` 在 mysql8.0.35 下创建表后 ```sql INSERT INTO tb1(sid, bid, oid, status) VALUES(0, 1, 2, 3); INSERT INTO tb2(bid, expire_time, status) VALUES(1, ADDDATE(NOW(), 2), 3); ``` 默认 RR 事务隔离级别下 ```sql -- session1 START TRANSACTION; insert into tb(bid, sid, oid, emark, amark, bmark, cmark, update_time) values ('444444', '555555', '666666', '00', '00', '00', '00', now()) ON DUPLICATE KEY UPDATE amark = values(amark), update_time = now(); -- 重复执行 ROLLBACK; ``` ```sql -- session2 您的`'2023-12-31 19:00:03.963'`我改成了`ADDDATE(NOW(), 3)`以便 reprod START TRANSACTION; insert into tb(bid, sid, oid, emark, amark, bmark, update_time) select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < ADDDATE(NOW(), 3) and tb2.expire_time > now() and tb1.status IN ('2', '3') ON DUPLICATE KEY UPDATE emark = values(emark), update_time = now(); -- 重复执行 ROLLBACK; ``` 不论两个谁先执行执行几次它们都不会等待 acquire 另一事务占有的锁 而在 RC 下 |session1|session2| |-|-| |`SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;`|Ibid.| |`START TRANSACTION;`|Ibid.| |`SHOW SESSION VARIABLES LIKE '%iso%'\G`|Ibid.| |`insert into tb(bid, sid, oid, emark, amark, bmark, cmark, update_time) values ('444444', '555555', '666666', '00', '00', '00', '00', now()) ON DUPLICATE KEY UPDATE amark = values(amark), update_time = now();`|| ||`insert into tb(bid, sid, oid, emark, amark, bmark, update_time) select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < ADDDATE(NOW(), 3) and tb2.expire_time > now() and tb1.status IN ('2', '3') ON DUPLICATE KEY UPDATE emark = values(emark), update_time = now();`| ||等待 session1 释放某个锁| |`ROLLBACK;`|| ||继续执行并返回| 而如果 session2 先执行那后续它们以任意顺序执行都不会有等待 这可能是因为`tb`此时作为空表(`CREATE`后没有`INSERT`过) session1`INSERT`后需要表级锁以保证其他事务能够 RC (看到其刚`INSERT`的这行,但由于不是`READ UNCOMMITTED`RU 级别所以需要等待 session1`ROLLBACK`/`COMMIT`) https://stackoverflow.com/questions/56434878/mysql-deadlock-using-an-index-with-a-new-value/56435470#56435470 > In an empty table, the place that the 1 would have been is "anywhere in the table" (or anywhere from the start to the "supremum" mentioned in the deadlock) - which is consequently gaplocked by the delete. The same is true for the 2. And those locks do not conflict each other by definition. 而在 session2 先`INSERT`后`tb`不是空表所以 session1 无需表级锁而可以在索引`idx_bid_sid`上锁`('444444', '555555')`,而 session2`INSERT SELECT`subquery 所返回行`(bid, sid)=(1, 0)`中也没有跟`('444444', '555555')`冲突因而无需`ON DUPLICATE KEY UPDATE`从而避免等待 session1 释放对那行的锁 所以您的`tb`表在 session1 执行`INSERT`时是空的吗? |
8
h0099 344 天前
如果您能够修改逻辑避免在 sql 中用到`ON DUPLICATE KEY UPDATE`aka 您不是只能加索引调参优化的 dba 而是透过 orm 层层抽象隔离写 java 业务的 dev 那您也可以像您 at 我过来的那帖中那样换成`RC 事务隔离级别`+`SELECT ... WHERE unique 约束或 PK FOR UPDATE`产生`表级 IX 锁` https://github.com/n0099/open-tbm/issues/32#issuecomment-1401223094 对`SELECT`出既有的 record 来`UPDATE ... SET a|emark = ... WHERE unique 约束或 PK`再对不存在的 record 进行`INSERT`此时可以保证`INSERT`不会返回`ERROR 1062 (23000): Duplicate entry`因为此前的`SELECT ... FOR UPDATE`已经上了`表级 IX 锁` https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-intention-locks 而此时其他`INSERT`需要`行级 X 锁`就会阻塞等待`表级 IX 锁`释放
> Before a transaction can acquire an exclusive lock on a row in a table, it must first acquire an IX lock on the table. |
9
h0099 344 天前
如果单独只看您发的那坨`SHOW ENGINE INNODB STATUS`deadlock detection log 的话那当时场景就是
session2 先执行因而持有着`(bid, sid, oid)` ``` RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367656999 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 ``` 上的末端`(某值, +∞)`[`nextkey`]( https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-next-key-locks)X 锁因为有[`hex 73757072656d756d; asc supremum`]( https://en.wikipedia.org/wiki/Infimum_and_supremum) https://dev.mysql.com/blog-archive/innodb-data-locking-part-2-5-locks-deeper-dive/ 而后执行的 session1 也在尝试 acquire`(bid, sid, oid)=('444444', '555555', '666666')`上的 recordlockX 锁但由于其位于 session2 的`nextkey`X 锁范围中而等待 ``` RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367657071 lock_mode X waiting Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 ``` 当 session1 在等待时 session2 也开始等待 PK (但下面的约束不是在`PRIMARY KEY (id)`上而是`INSERT`中的其他字段可能因为其是`AUTO_INCREMENT`AI 而您也没显式指定其值所以无法`WHERE`)`(bid, sid, oid, emark, amark, bmark, cmark)=('444444', '555555', '666666', '00', '00', '00', '00')`上的[recordlock(有`but not gap`)]( https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-record-locks)X 锁 ``` RECORD LOCKS space id 270 page no 7273 n bits 88 index PRIMARY of table db.tb trx id 7367656999 lock_mode X locks rec but not gap waiting [...] 3: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;; 4: len 10; hex 30313939373333303331; asc 555555;; 5: len 6; hex 363733353839; asc 666666;; [...] ``` 所以是互相等待对方而被 deadlock detection 并决定 rollback 后来的 session1 让 session2 先过 ``` 2023-12-28T19:00:05.140247+08:00 3180165 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1) ``` |