
Commit ae5e070

adam900710 authored and kdave committed
btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
There is a chance of racing for qgroup flushing which may lead to
deadlock:

         Thread A               |          Thread B
   (not holding trans handle)   |   (holding a trans handle)
--------------------------------+--------------------------------
__btrfs_qgroup_reserve_meta()   | __btrfs_qgroup_reserve_meta()
|- try_flush_qgroup()           | |- try_flush_qgroup()
   |- QGROUP_FLUSHING bit set   |    |
   |                            |    |- test_and_set_bit()
   |                            |    |- wait_event()
   |- btrfs_join_transaction()  |
   |- btrfs_commit_transaction()|

                        !!! DEAD LOCK !!!

Since thread A wants to commit transaction, but thread B is holding a
transaction handle, blocking the commit.
At the same time, thread B is waiting for thread A to finish its commit.

This is just a hot fix, and would lead to more EDQUOT when we're near
the qgroup limit.

The proper fix would be to make all metadata/data reservations happen
without holding a transaction handle.

CC: [email protected] # 5.9+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
1 parent 9a66497 commit ae5e070

1 file changed: +20 -10 lines changed
fs/btrfs/qgroup.c

Lines changed: 20 additions & 10 deletions
@@ -3530,16 +3530,6 @@ static int try_flush_qgroup(struct btrfs_root *root)
 	int ret;
 	bool can_commit = true;
 
-	/*
-	 * We don't want to run flush again and again, so if there is a running
-	 * one, we won't try to start a new flush, but exit directly.
-	 */
-	if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
-		wait_event(root->qgroup_flush_wait,
-			!test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
-		return 0;
-	}
-
 	/*
 	 * If current process holds a transaction, we shouldn't flush, as we
 	 * assume all space reservation happens before a transaction handle is
@@ -3554,6 +3544,26 @@ static int try_flush_qgroup(struct btrfs_root *root)
 	    current->journal_info != BTRFS_SEND_TRANS_STUB)
 		can_commit = false;
 
+	/*
+	 * We don't want to run flush again and again, so if there is a running
+	 * one, we won't try to start a new flush, but exit directly.
+	 */
+	if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
+		/*
+		 * We are already holding a transaction, thus we can block other
+		 * threads from flushing. So exit right now. This increases
+		 * the chance of EDQUOT for heavy load and near limit cases.
+		 * But we can argue that if we're already near limit, EDQUOT is
+		 * unavoidable anyway.
+		 */
+		if (!can_commit)
+			return 0;
+
+		wait_event(root->qgroup_flush_wait,
+			!test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
+		return 0;
+	}
+
 	ret = btrfs_start_delalloc_snapshot(root);
 	if (ret < 0)
 		goto out;
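
The reordering above can be illustrated with a small userspace model of the entry check. This is only a sketch, not kernel code: `struct model_root`, `enum flush_action`, `entry_decision_before()` and `entry_decision_after()` are hypothetical names invented here, a C11 `atomic_bool` stands in for the `BTRFS_ROOT_QGROUP_FLUSHING` bit, and the `holds_trans` parameter is assumed to summarize the `current->journal_info` check that sets `can_commit`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the entry check in try_flush_qgroup(). */
enum flush_action {
	FLUSH_RUN,	/* we set the bit, so we perform the flush */
	FLUSH_WAIT,	/* another flusher is running: wait_event() on it */
	FLUSH_SKIP	/* another flusher is running: return 0 at once */
};

/* Hypothetical stand-in for the one bit of btrfs_root state we model. */
struct model_root {
	atomic_bool flushing;	/* models BTRFS_ROOT_QGROUP_FLUSHING */
};

/*
 * Model of the old order: the bit is tested before can_commit is known,
 * so a caller that holds a transaction handle still ends up waiting.
 */
static enum flush_action entry_decision_before(struct model_root *root,
					       bool holds_trans)
{
	assert(root != NULL);
	(void)holds_trans;	/* not consulted yet in the old order */

	/* atomic_exchange() returns the previous value, like test_and_set_bit() */
	if (atomic_exchange(&root->flushing, true))
		return FLUSH_WAIT;
	return FLUSH_RUN;
}

/*
 * Model of the new order: can_commit is computed first, and a caller
 * that cannot commit never waits on the running flusher.
 */
static enum flush_action entry_decision_after(struct model_root *root,
					      bool holds_trans)
{
	bool can_commit = !holds_trans;	/* assumed simplification */

	assert(root != NULL);

	if (atomic_exchange(&root->flushing, true)) {
		if (!can_commit)
			return FLUSH_SKIP;
		return FLUSH_WAIT;
	}
	return FLUSH_RUN;
}
```

In this model, thread B from the commit message (holding a transaction handle while thread A's flush is in progress) gets `FLUSH_WAIT` from the old order and `FLUSH_SKIP` from the new one. Skipping is what breaks the wait cycle, and it costs a missed flush that may surface as EDQUOT near the qgroup limit.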
