Skip to content

Commit b58d1a9

Browse files
fdmananamasoncl
authored andcommitted
Btrfs: fix race between start dirty bg cache writeout and bg deletion
While running xfstests I ran into the following: [20892.242791] ------------[ cut here ]------------ [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]() [20892.245874] BTRFS: Transaction aborted (error -2) [20892.247329] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse$ [20892.258488] CPU: 0 PID: 13299 Comm: fsstress Tainted: G W 4.0.0-rc5-btrfs-next-9+ #2 [20892.262011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [20892.264738] 0000000000000009 ffff880427f8bc18 ffffffff8142fa46 ffffffff8108b6a2 [20892.266244] ffff880427f8bc68 ffff880427f8bc58 ffffffff81045ea5 ffff880427f8bc48 [20892.267761] ffffffffa0509a6d 00000000fffffffe ffff8803545d6f40 ffffffffa05a15a0 [20892.269378] Call Trace: [20892.269915] [<ffffffff8142fa46>] dump_stack+0x4f/0x7b [20892.271097] [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad [20892.272173] [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb [20892.273386] [<ffffffffa0509a6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.274857] [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48 [20892.275851] [<ffffffffa0509a6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.277341] [<ffffffffa0515e10>] write_one_cache_group+0x68/0xaf [btrfs] [20892.278628] [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs] [20892.280191] [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs] [20892.281781] [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf [20892.282873] [<ffffffffa054163b>] btrfs_sync_file+0x313/0x387 [btrfs] [20892.284111] [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4 [20892.285203] [<ffffffff810e603f>] ? time_hardirqs_on+0x15/0x28 [20892.286290] [<ffffffff8123960b>] ? trace_hardirqs_on_thunk+0x3a/0x3f [20892.287469] [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e [20892.288412] [<ffffffff8117ae54>] do_fsync+0x34/0x4e [20892.289348] [<ffffffff8117b07c>] SyS_fsync+0x10/0x14 [20892.290255] [<ffffffff81435b32>] system_call_fastpath+0x12/0x17 [20892.291316] ---[ end trace 597f77e664245373 ]--- [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry [20892.297390] BTRFS info (device sdg): forced readonly This happens because in btrfs_start_dirty_block_groups() we splice the transaction's list of dirty block groups into a local list and then we keep extracting the first element of the list without holding the cache_write_mutex mutex. This means that before we acquire that mutex the first block group on the list might be removed by a conurrent task running btrfs_remove_block_group(). So make sure we extract the first element (and test the list emptyness) while holding that mutex. Fixes: 1bbc621 ("Btrfs: allow block group cache writeout outside critical section in commit") Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
1 parent a3bdccc commit b58d1a9

File tree

1 file changed

+27
-17
lines changed

1 file changed

+27
-17
lines changed

fs/btrfs/extent-tree.c

Lines changed: 27 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -3408,17 +3408,14 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
34083408
int loops = 0;
34093409

34103410
spin_lock(&cur_trans->dirty_bgs_lock);
3411-
if (!list_empty(&cur_trans->dirty_bgs)) {
3412-
list_splice_init(&cur_trans->dirty_bgs, &dirty);
3411+
if (list_empty(&cur_trans->dirty_bgs)) {
3412+
spin_unlock(&cur_trans->dirty_bgs_lock);
3413+
return 0;
34133414
}
3415+
list_splice_init(&cur_trans->dirty_bgs, &dirty);
34143416
spin_unlock(&cur_trans->dirty_bgs_lock);
34153417

34163418
again:
3417-
if (list_empty(&dirty)) {
3418-
btrfs_free_path(path);
3419-
return 0;
3420-
}
3421-
34223419
/*
34233420
* make sure all the block groups on our dirty list actually
34243421
* exist
@@ -3431,18 +3428,16 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
34313428
return -ENOMEM;
34323429
}
34333430

3431+
/*
3432+
* cache_write_mutex is here only to save us from balance or automatic
3433+
* removal of empty block groups deleting this block group while we are
3434+
* writing out the cache
3435+
*/
3436+
mutex_lock(&trans->transaction->cache_write_mutex);
34343437
while (!list_empty(&dirty)) {
34353438
cache = list_first_entry(&dirty,
34363439
struct btrfs_block_group_cache,
34373440
dirty_list);
3438-
3439-
/*
3440-
* cache_write_mutex is here only to save us from balance
3441-
* deleting this block group while we are writing out the
3442-
* cache
3443-
*/
3444-
mutex_lock(&trans->transaction->cache_write_mutex);
3445-
34463441
/*
34473442
* this can happen if something re-dirties a block
34483443
* group that is already under IO. Just wait for it to
@@ -3495,15 +3490,23 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
34953490
}
34963491
if (!ret)
34973492
ret = write_one_cache_group(trans, root, path, cache);
3498-
mutex_unlock(&trans->transaction->cache_write_mutex);
34993493

35003494
/* if its not on the io list, we need to put the block group */
35013495
if (should_put)
35023496
btrfs_put_block_group(cache);
35033497

35043498
if (ret)
35053499
break;
3500+
3501+
/*
3502+
* Avoid blocking other tasks for too long. It might even save
3503+
* us from writing caches for block groups that are going to be
3504+
* removed.
3505+
*/
3506+
mutex_unlock(&trans->transaction->cache_write_mutex);
3507+
mutex_lock(&trans->transaction->cache_write_mutex);
35063508
}
3509+
mutex_unlock(&trans->transaction->cache_write_mutex);
35073510

35083511
/*
35093512
* go through delayed refs for all the stuff we've just kicked off
@@ -3514,8 +3517,15 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
35143517
loops++;
35153518
spin_lock(&cur_trans->dirty_bgs_lock);
35163519
list_splice_init(&cur_trans->dirty_bgs, &dirty);
3520+
/*
3521+
* dirty_bgs_lock protects us from concurrent block group
3522+
* deletes too (not just cache_write_mutex).
3523+
*/
3524+
if (!list_empty(&dirty)) {
3525+
spin_unlock(&cur_trans->dirty_bgs_lock);
3526+
goto again;
3527+
}
35173528
spin_unlock(&cur_trans->dirty_bgs_lock);
3518-
goto again;
35193529
}
35203530

35213531
btrfs_free_path(path);

0 commit comments

Comments
 (0)