Commit 6cb4b96

naota authored and gregkh committed
btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size
[ Upstream commit f7b12a6 ]

On zoned filesystem, data write out is limited by max_zone_append_size, and a large ordered extent is split according to the size of a bio. OTOH, the number of extents to be written is calculated using BTRFS_MAX_EXTENT_SIZE, and that estimated number is used to reserve the metadata bytes to update and/or create the metadata items.

The metadata reservation is done at e.g., btrfs_buffered_write() and then released according to the estimation changes. Thus, if the number of extents increases massively, the reserved metadata can run out.

The increase of the number of extents easily occurs on zoned filesystem if BTRFS_MAX_EXTENT_SIZE > max_zone_append_size. And, it causes the following warning on a small RAM environment with disabling metadata over-commit (in the following patch).

  [75721.498492] ------------[ cut here ]------------
  [75721.505624] BTRFS: block rsv 1 returned -28
  [75721.512230] WARNING: CPU: 24 PID: 2327559 at fs/btrfs/block-rsv.c:537 btrfs_use_block_rsv+0x560/0x760 [btrfs]
  [75721.581854] CPU: 24 PID: 2327559 Comm: kworker/u64:10 Kdump: loaded Tainted: G W 5.18.0-rc2-BTRFS-ZNS+ #109
  [75721.597200] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
  [75721.607310] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
  [75721.616209] RIP: 0010:btrfs_use_block_rsv+0x560/0x760 [btrfs]
  [75721.646649] RSP: 0018:ffffc9000fbdf3e0 EFLAGS: 00010286
  [75721.654126] RAX: 0000000000000000 RBX: 0000000000004000 RCX: 0000000000000000
  [75721.663524] RDX: 0000000000000004 RSI: 0000000000000008 RDI: fffff52001f7be6e
  [75721.672921] RBP: ffffc9000fbdf420 R08: 0000000000000001 R09: ffff889f8d1fc6c7
  [75721.682493] R10: ffffed13f1a3f8d8 R11: 0000000000000001 R12: ffff88980a3c0e28
  [75721.692284] R13: ffff889b66590000 R14: ffff88980a3c0e40 R15: ffff88980a3c0e8a
  [75721.701878] FS: 0000000000000000(0000) GS:ffff889f8d000000(0000) knlGS:0000000000000000
  [75721.712601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [75721.720726] CR2: 000055d12e05c018 CR3: 0000800193594000 CR4: 0000000000350ee0
  [75721.730499] Call Trace:
  [75721.735166] <TASK>
  [75721.739886] btrfs_alloc_tree_block+0x1e1/0x1100 [btrfs]
  [75721.747545] ? btrfs_alloc_logged_file_extent+0x550/0x550 [btrfs]
  [75721.756145] ? btrfs_get_32+0xea/0x2d0 [btrfs]
  [75721.762852] ? btrfs_get_32+0xea/0x2d0 [btrfs]
  [75721.769520] ? push_leaf_left+0x420/0x620 [btrfs]
  [75721.776431] ? memcpy+0x4e/0x60
  [75721.781931] split_leaf+0x433/0x12d0 [btrfs]
  [75721.788392] ? btrfs_get_token_32+0x580/0x580 [btrfs]
  [75721.795636] ? push_for_double_split.isra.0+0x420/0x420 [btrfs]
  [75721.803759] ? leaf_space_used+0x15d/0x1a0 [btrfs]
  [75721.811156] btrfs_search_slot+0x1bc3/0x2790 [btrfs]
  [75721.818300] ? lock_downgrade+0x7c0/0x7c0
  [75721.824411] ? free_extent_buffer.part.0+0x107/0x200 [btrfs]
  [75721.832456] ? split_leaf+0x12d0/0x12d0 [btrfs]
  [75721.839149] ? free_extent_buffer.part.0+0x14f/0x200 [btrfs]
  [75721.846945] ? free_extent_buffer+0x13/0x20 [btrfs]
  [75721.853960] ? btrfs_release_path+0x4b/0x190 [btrfs]
  [75721.861429] btrfs_csum_file_blocks+0x85c/0x1500 [btrfs]
  [75721.869313] ? rcu_read_lock_sched_held+0x16/0x80
  [75721.876085] ? lock_release+0x552/0xf80
  [75721.881957] ? btrfs_del_csums+0x8c0/0x8c0 [btrfs]
  [75721.888886] ? __kasan_check_write+0x14/0x20
  [75721.895152] ? do_raw_read_unlock+0x44/0x80
  [75721.901323] ? _raw_write_lock_irq+0x60/0x80
  [75721.907983] ? btrfs_global_root+0xb9/0xe0 [btrfs]
  [75721.915166] ? btrfs_csum_root+0x12b/0x180 [btrfs]
  [75721.921918] ? btrfs_get_global_root+0x820/0x820 [btrfs]
  [75721.929166] ? _raw_write_unlock+0x23/0x40
  [75721.935116] ? unpin_extent_cache+0x1e3/0x390 [btrfs]
  [75721.942041] btrfs_finish_ordered_io.isra.0+0xa0c/0x1dc0 [btrfs]
  [75721.949906] ? try_to_wake_up+0x30/0x14a0
  [75721.955700] ? btrfs_unlink_subvol+0xda0/0xda0 [btrfs]
  [75721.962661] ? rcu_read_lock_sched_held+0x16/0x80
  [75721.969111] ? lock_acquire+0x41b/0x4c0
  [75721.974982] finish_ordered_fn+0x15/0x20 [btrfs]
  [75721.981639] btrfs_work_helper+0x1af/0xa80 [btrfs]
  [75721.988184] ? _raw_spin_unlock_irq+0x28/0x50
  [75721.994643] process_one_work+0x815/0x1460
  [75722.000444] ? pwq_dec_nr_in_flight+0x250/0x250
  [75722.006643] ? do_raw_spin_trylock+0xbb/0x190
  [75722.013086] worker_thread+0x59a/0xeb0
  [75722.018511] kthread+0x2ac/0x360
  [75722.023428] ? process_one_work+0x1460/0x1460
  [75722.029431] ? kthread_complete_and_exit+0x30/0x30
  [75722.036044] ret_from_fork+0x22/0x30
  [75722.041255] </TASK>
  [75722.045047] irq event stamp: 0
  [75722.049703] hardirqs last enabled at (0): [<0000000000000000>] 0x0
  [75722.057610] hardirqs last disabled at (0): [<ffffffff8118a94a>] copy_process+0x1c1a/0x66b0
  [75722.067533] softirqs last enabled at (0): [<ffffffff8118a989>] copy_process+0x1c59/0x66b0
  [75722.077423] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [75722.085335] ---[ end trace 0000000000000000 ]---

To fix the estimation, we need to introduce fs_info->max_extent_size to replace BTRFS_MAX_EXTENT_SIZE, which allows setting a different size for regular vs zoned filesystem.

Set fs_info->max_extent_size to BTRFS_MAX_EXTENT_SIZE by default. On zoned filesystem, it is set to fs_info->max_zone_append_size.

CC: [email protected] # 5.12+
Fixes: d8e3fb1 ("btrfs: zoned: use ZONE_APPEND write for zoned mode")
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
1 parent c1f4c40 commit 6cb4b96
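
The mismatch described in the commit message can be illustrated with a small user-space sketch. It is not kernel code: `estimate_extents()` is a hypothetical helper that performs the same round-up division an extent-count estimate would use, `SZ_128M` is assumed to be the value of BTRFS_MAX_EXTENT_SIZE, and the 1 MiB zone-append limit is an invented example value.

```c
#include <stdint.h>

/* Assumed value of BTRFS_MAX_EXTENT_SIZE (128 MiB). */
#define SZ_128M (128ULL * 1024 * 1024)

/*
 * Hypothetical helper: number of extents needed to cover `size` bytes
 * when no single extent may exceed `max_extent_size` (round-up division).
 */
static inline uint32_t estimate_extents(uint64_t size, uint64_t max_extent_size)
{
	return (uint32_t)((size + max_extent_size - 1) / max_extent_size);
}
```

Under these assumptions, 1 GiB of dirty data is estimated as `estimate_extents(1ULL << 30, SZ_128M)`, i.e. 8 extents, while a device that splits writes at 1 MiB produces `estimate_extents(1ULL << 30, 1ULL << 20)`, i.e. 1024 extents. Metadata reserved for 8 extents cannot cover 1024, which is the shortfall the warning above reports.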

5 files changed: +19 -4 lines changed

fs/btrfs/ctree.h

Lines changed: 6 additions & 0 deletions
@@ -1032,6 +1032,12 @@ struct btrfs_fs_info {
 	u32 csums_per_leaf;
 	u32 stripesize;

+	/*
+	 * Maximum size of an extent. BTRFS_MAX_EXTENT_SIZE on regular
+	 * filesystem, on zoned it depends on the device constraints.
+	 */
+	u64 max_extent_size;
+
 	/* Block groups and devices containing active swapfiles. */
 	spinlock_t swapfile_pins_lock;
 	struct rb_root swapfile_pins;

fs/btrfs/disk-io.c

Lines changed: 2 additions & 0 deletions
@@ -3246,6 +3246,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->sectorsize_bits = ilog2(4096);
 	fs_info->stripesize = 4096;

+	fs_info->max_extent_size = BTRFS_MAX_EXTENT_SIZE;
+
 	spin_lock_init(&fs_info->swapfile_pins_lock);
 	fs_info->swapfile_pins = RB_ROOT;

fs/btrfs/extent_io.c

Lines changed: 3 additions & 1 deletion
@@ -1992,10 +1992,12 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 				    struct page *locked_page, u64 *start,
 				    u64 *end)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	const u64 orig_start = *start;
 	const u64 orig_end = *end;
-	u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
+	/* The sanity tests may not set a valid fs_info. */
+	u64 max_bytes = fs_info ? fs_info->max_extent_size : BTRFS_MAX_EXTENT_SIZE;
 	u64 delalloc_start;
 	u64 delalloc_end;
 	bool found;
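
The NULL check in the hunk above can be read as a simple fallback pattern. The following is a minimal user-space sketch of it, not the kernel implementation: `struct fs_info_stub` and `delalloc_max_bytes()` are hypothetical names introduced only for illustration, and `SZ_128M` is assumed to be the value of BTRFS_MAX_EXTENT_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value of BTRFS_MAX_EXTENT_SIZE (128 MiB). */
#define SZ_128M (128ULL * 1024 * 1024)

/* Hypothetical stand-in for struct btrfs_fs_info; only the field used here. */
struct fs_info_stub {
	uint64_t max_extent_size;
};

/*
 * Pick the per-filesystem limit when an fs_info is available, otherwise
 * fall back to the compile-time default (as a self-test without a valid
 * fs_info would need).
 */
static inline uint64_t delalloc_max_bytes(const struct fs_info_stub *fs_info)
{
	return fs_info ? fs_info->max_extent_size : SZ_128M;
}
```

With this shape, callers that pass no fs_info keep the old 128 MiB behaviour, while a zoned filesystem that lowered `max_extent_size` limits the delalloc range search to that smaller value.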

fs/btrfs/inode.c

Lines changed: 4 additions & 2 deletions
@@ -2102,14 +2102,15 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 void btrfs_split_delalloc_extent(struct inode *inode,
 				 struct extent_state *orig, u64 split)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 size;

 	/* not delalloc, ignore it */
 	if (!(orig->state & EXTENT_DELALLOC))
 		return;

 	size = orig->end - orig->start + 1;
-	if (size > BTRFS_MAX_EXTENT_SIZE) {
+	if (size > fs_info->max_extent_size) {
 		u32 num_extents;
 		u64 new_size;

@@ -2138,6 +2139,7 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
 				 struct extent_state *other)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 new_size, old_size;
 	u32 num_extents;

@@ -2151,7 +2153,7 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
 		new_size = other->end - new->start + 1;

 	/* we're not bigger than the max, unreserve the space and go */
-	if (new_size <= BTRFS_MAX_EXTENT_SIZE) {
+	if (new_size <= fs_info->max_extent_size) {
 		spin_lock(&BTRFS_I(inode)->lock);
 		btrfs_mod_outstanding_extents(BTRFS_I(inode), -1);
 		spin_unlock(&BTRFS_I(inode)->lock);

fs/btrfs/zoned.c

Lines changed: 4 additions & 1 deletion
@@ -731,8 +731,11 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	}

 	fs_info->zone_size = zone_size;
-	fs_info->max_zone_append_size = max_zone_append_size;
+	fs_info->max_zone_append_size = ALIGN_DOWN(max_zone_append_size,
+						   fs_info->sectorsize);
 	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
+	if (fs_info->max_zone_append_size < fs_info->max_extent_size)
+		fs_info->max_extent_size = fs_info->max_zone_append_size;

 	/*
 	 * Check mount options here, because we might change fs_info->zoned
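
The two steps in this hunk, rounding the device limit down to a sector boundary and then lowering the extent cap if needed, can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: `struct zoned_limits`, `align_down_pow2()` and `apply_zoned_limits()` are hypothetical names, the alignment is assumed to be a power of two (as a sector size is), and `SZ_128M` is assumed to be the default BTRFS_MAX_EXTENT_SIZE.

```c
#include <stdint.h>

/* Assumed default value of BTRFS_MAX_EXTENT_SIZE (128 MiB). */
#define SZ_128M (128ULL * 1024 * 1024)

/* Hypothetical container for the two fields the hunk updates. */
struct zoned_limits {
	uint64_t max_zone_append_size;
	uint64_t max_extent_size;
};

/* Round x down to a multiple of align; align must be a power of two. */
static inline uint64_t align_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Start from the default extent cap, align the device's zone-append
 * limit to the sector size, and lower the cap only if the aligned
 * limit is smaller.
 */
static inline struct zoned_limits apply_zoned_limits(uint64_t dev_zone_append,
						     uint32_t sectorsize)
{
	struct zoned_limits l = { .max_zone_append_size = 0,
				  .max_extent_size = SZ_128M };

	l.max_zone_append_size = align_down_pow2(dev_zone_append, sectorsize);
	if (l.max_zone_append_size < l.max_extent_size)
		l.max_extent_size = l.max_zone_append_size;
	return l;
}
```

For example, a device limit of 1 MiB plus 512 bytes with 4 KiB sectors is aligned down to exactly 1 MiB and becomes the extent cap, whereas a 256 MiB device limit leaves the cap at the 128 MiB default. Aligning first keeps the cap a whole number of sectors.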
