Skip to content

BCM270x_DT: Configure UART using DT #910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

BCM270x_DT: Configure UART using DT #910

wants to merge 1 commit into from

Conversation

pelwell
Copy link
Contributor

@pelwell pelwell commented Mar 24, 2015

@pelwell
Copy link
Contributor Author

pelwell commented Mar 24, 2015

@notro - any comments?

@notro
Copy link
Contributor

notro commented Mar 24, 2015

Any reason why you haven't just copied the node from bcm2835.dtsi?

See here for why arm,primecell-periphid is needed: https://lkml.org/lkml/2013/5/21/20

Looks like you missed bcm2708.c

@pelwell
Copy link
Contributor Author

pelwell commented Mar 24, 2015

Just copying bcm2835.dtsi didn't work - I needed to add the clock information, but it was an iterative process to get there, during which I accumulated some cruft. I then pruned it back until I got the minimum changeset that still worked. It does work without that line, at least sometimes.

I've corrected those issues.

@notro
Copy link
Contributor

notro commented Mar 24, 2015

I see now why bcm2835.dtsi doesn't contain clocks, they are added in the clock driver: http://lxr.free-electrons.com/source/drivers/clk/clk-bcm2835.c

The binding document states that when specified it should be 2 clocks: https://www.kernel.org/doc/Documentation/devicetree/bindings/serial/pl011.txt

What's the purpose of the reg-shift property?

@pelwell
Copy link
Contributor Author

pelwell commented Mar 24, 2015

The reg-shift property is supposed to specify the spacing between registers on the bus. It is used by many device drivers, and it used for pl011 nodes in many ARM .dtsi files, but it isn't mentioned in the bindings and it isn't clear that the pl011 driver uses it.

I'll review the clocking in the morning.

Phil

On 24/03/2015 20:09, notro wrote:

I see now why bcm2835.dtsi doesn't contain clocks, they are added in the clock driver: http://lxr.free-electrons.com/source/drivers/clk/clk-bcm2835.c

The binding document states that when specified it should be 2 clocks: https://www.kernel.org/doc/Documentation/devicetree/bindings/serial/pl011.txt

What's the purpose of the reg-shift property?


Reply to this email directly or view it on GitHub #910 (comment).

@notro
Copy link
Contributor

notro commented Mar 25, 2015

reg-shift

If I remove DT files and files not related to serial/tty, I'm left with these:

$ grep -Ir "reg-shift" .
drivers/tty/serial/8250/8250_dw.c:        if (!of_property_read_u32(np, "reg-shift", &val))
drivers/tty/serial/of_serial.c:   if (of_property_read_u32(np, "reg-shift", &prop) == 0)
[...]
arch/powerpc/kernel/legacy_serial.c:      rs = of_get_property(np, "reg-shift", NULL);
[...]
Documentation/devicetree/bindings/serial/nvidia,tegra20-hsuart.txt:       reg-shift = <2>;
Documentation/devicetree/bindings/serial/of-serial.txt:- reg-shift : quantity to shift the register offsets by.
Documentation/devicetree/bindings/serial/snps-dw-apb-uart.txt:- reg-shift : quantity to shift the register offsets by.  If this property is
[...]
Documentation/devicetree/bindings/xilinx.txt:       - reg-shift : A value of 2 is required

I can't find anything that should tie pl011 and reg-shift.
It could be that the DT files that have reg-shift also have one of the of_serial.c devices in it's compatible string.

BTW arm,primecell triggers some special handling:
of_platform_populate -> of_platform_bus_create -> of_amba_device_create
This is where arm,primecell-periphid is read.

@pelwell
Copy link
Contributor Author

pelwell commented Mar 25, 2015

This patch is targeting 3.18, where the binding documentation (https://github.com/raspberrypi/linux/blob/rpi-3.18.y/Documentation/devicetree/bindings/serial/pl011.txt) says:

* ARM AMBA Primecell PL011 serial UART

Required properties:
- compatible: must be "arm,primecell", "arm,pl011"
- reg: exactly one register range with length 0x1000
- interrupts: exactly one interrupt specifier

Optional properties:
- pinctrl: When present, must have one state named "sleep"
       and one state named "default"
- clocks:  When present, must refer to exactly one clock named
       "apb_pclk"
- dmas:    When present, may have one or two dma channels.
       The first one must be named "rx", the second one
       must be named "tx".

See also bindings/arm/primecell.txt

I've removed the reference to reg-shift, swapped the compatible strings and pushed again.

@notro
Copy link
Contributor

notro commented Mar 25, 2015

The 3.18 binding doc is wrong as this commit states: ARM: dt: fix up PL011 device tree bindings:

Make the map match the reality, the current binding text is
nonsense:

- The clock required for the clocking of the serial port
  must come first and is not optional (as the driver will
  otherwise proceed to grab and use the apb_pclk as uartclk),
  and the apb_pclk that clocks the logic must come second
  as the code will retrieve the first clock by index,
  whereas the PrimeCell but will explicitly look for
  "apb_pclk" so this can be specified later, as it is
  looked up by name.

I haven't understood the purpose of the clk-bcm2835 driver. Why not just put these clocks in the Device Tree?
But this PR has forced me to look closer into this, and I'm starting to think that it actually started as a stub driver for clocks, but later fixed DT clocks was used instead.
When the BCM2835 clock spec is released, this is where the code will go to control the clocks I guess.
(some details about amba and apb_pclk: https://lists.ozlabs.org/pipermail/devicetree-discuss/2011-May/005577.html)

So, since we want to close in on mainline, we can enable this driver and get the clocks that way.
Something like:

-obj-$(CONFIG_ARCH_BCM2835)     += clk-bcm2835.o
+obj-$(CONFIG_COMMON_CLK_BCM2835)   += clk-bcm2835.o

And then select COMMON_CLK_BCM2835 in the ARCH_BCM2xxx definitions.

Then we can just use the same uart node as in bcm2835.dtsi.
And maybe in the near future, we can include bcm2835.dtsi in bcm2708.dtsi and thus use more of the mainline code.

It's easier to add clocks to the DT, but at some point we will have to take this step, unless of course if the clocks will always be fixed (no clock control interface exposed in the future).

Another thing is that bcm2708.dtsi and bcm2709.dtsi are nearly identical, so the common part could be put in a file of it's own.

@notro
Copy link
Contributor

notro commented Apr 26, 2015

It seems the clk-bcm2835 driver is a relic of the past:

ARM: bcm2835: add stub clock driver

This patch adds a minimal stub clock driver for the BCM2835. Its sole
purpose is to allow the PL011 AMBA clk_get() API calls to provide
something that looks enough like a clock that the driver probes and
operates correctly.

[PATCH 9/9] ARM: BCM283x: Register fixed clocks for uart in the DT

We were previously relying on the fixed clock registration in
clk-bcm2835, but there doesn't seem to be any real reason to not just
define it in the DT (and for the 2836 port, I would have needed to
change the clock's physical address in clk-bcm2835.c).  Also, because
we weren't registering the apb_pclk in clk-bcm2835 as a clock device,
we were picking up the uart clock node as apb_pclk by accident.

Reply by Stephen Warren:

Actually, to maintain DT ABI (new kernels working with old or new DT),
we probably need to keep the code in clk-bcm2835.c, but make it
conditional upon whether there's a clock node in the DT file.

@notro
Copy link
Contributor

notro commented May 25, 2015

Based on what is going on upstream:

I have this suggestion:

diff --git a/arch/arm/boot/dts/bcm2708_common.dtsi b/arch/arm/boot/dts/bcm2708_common.dtsi
index 4bf6960..0fcc8ad 100644
--- a/arch/arm/boot/dts/bcm2708_common.dtsi
+++ b/arch/arm/boot/dts/bcm2708_common.dtsi
@@ -61,6 +61,16 @@
            #interrupt-cells = <2>;
        };

+       uart0: uart@7e201000 {
+           compatible = "brcm,bcm2835-pl011", "arm,pl011", "arm,primecell";
+           reg = <0x7e201000 0x1000>;
+           interrupts = <2 25>;
+           clocks = <&clk_uart0>, <&clk_apb_p>;
+           clock-names = "uartclk", "apb_pclk";
+//         arm,primecell-periphid = <0x00241011>;
+           status = "disabled";
+       };
+
        mmc: mmc@7e300000 {
            compatible = "brcm,bcm2835-mmc";
            reg = <0x7e300000 0x100>;
@@ -156,5 +166,21 @@
            clock-output-names = "spi";
            clock-frequency = <250000000>;
        };
+
+       clk_uart0: clock@3 {
+           compatible = "fixed-clock";
+           reg = <3>;
+           #clock-cells = <0>;
+           clock-output-names = "uart0_pclk";
+           clock-frequency = <3000000>;
+       };
+
+       clk_apb_p: clock@4 {
+           compatible = "fixed-clock";
+           reg = <4>;
+           #clock-cells = <0>;
+           clock-output-names = "apb_pclk";
+           clock-frequency = <126000000>;
+       };
    };
 };

About periphid

This is what we normally get without DT (rev3):

[    0.136106] dev:f1: ttyAMA0 at MMIO 0x20201000 (irq = 83, base_baud = 0) is a PL011 rev3

This is with arm,primecell-periphid = <0x00241011> (rev2):

[    0.136815] 20201000.uart: ttyAMA0 at MMIO 0x20201000 (irq = 83, base_baud = 0) is a PL011 rev2

This is DT without the arm,primecell-periphid property (rev3):

[    0.136840] 20201000.uart: ttyAMA0 at MMIO 0x20201000 (irq = 83, base_baud = 0) is a PL011 rev3

I don't know why mainline wants rev2.
Maybe this discussion is relevant: [PATCH] ARM: PL011: add support for extended FIFO-size of PL011-r1p5

ID probing happens in amba_device_add
DT: of_amba_device_create

About the uart alias you add, is it for the overlay machinery?
I ask because pl011_probe_dt_alias uses serial: of_alias_get_id(np, "serial");

@notro
Copy link
Contributor

notro commented May 25, 2015

Here's the reason for rev2:
ARM: bcm2835: override the HW UART periphid

Stephen Warren reported the recent commit 78506f2 (add support for
extended FIFO-size of PL011-r1p5) breaks the serial port on the
BCM2835 ARM SoC.

A UART compatible with the ARM PL011-r1p5 should have 32-deep FIFOs.
The BCM2835 UART just looks like an ARM PL011-r1p5, but has 16-deep
FIFOs just like PL011-r1p4 or earlier revisions. As a workaround for
this compatibility issue, this patch overrides the HW UART periphid
register values with the actually compatible UART periphid 0x00241011
(r1p3 or r1p4).

@notro
Copy link
Contributor

notro commented Jun 4, 2015

@pelwell This is the last device to be moved (except uart1). It would be nice to get this done before bumping the kernel release version, as this would (hopefully) give us a stable Device Tree where we don't break the ABI anymore.

If you have other thing on your plate I'll be happy to make a PR, but we need to address the revision/primecell-periphid issue first. Mainline uses a different setup/revison than the one we're using today and I don't have the necessary knowledge to say what we should choose. Both seem to work, but I haven't tried to flood the port, I have only used the console.

@popcornmix I have seen that you have worked with some patches for the uart, what's your opinion?

I side effect of getting this in DT, is that we should be able to address #887 by using dt-blob.bin to set the uart pins to inputs, and then in a DT overlay we use pinctrl to set the pins back to uart. This should stop the 'Uncompressing Linux..' message.

@pelwell
Copy link
Contributor Author

pelwell commented Jun 4, 2015

Superseded by PR #1005

@pelwell pelwell closed this Jun 4, 2015
popcornmix pushed a commit that referenced this pull request Jun 8, 2021
[ Upstream commit 9b76ead ]

If Qdisc_ops->init() is failed, Qdisc_ops->reset() would be called.
When dsmark_init(Qdisc_ops->init()) is failed, it possibly doesn't
initialize dsmark_qdisc_data->q. But dsmark_reset(Qdisc_ops->reset())
uses dsmark_qdisc_data->q pointer wihtout any null checking.
So, panic would occur.

Test commands:
    sysctl net.core.default_qdisc=dsmark -w
    ip link add dummy0 type dummy
    ip link add vw0 link dummy0 type virt_wifi
    ip link set vw0 up

Splat looks like:
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 3 PID: 684 Comm: ip Not tainted 5.12.0+ #910
RIP: 0010:qdisc_reset+0x2b/0x680
Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54
55 48 89 fd 48 83 c7 18 53 48 89 fa 48 c1 ea 03 48 83 ec 20 <80> 3c 02
00 0f 85 09 06 00 00 4c 8b 65 18 0f 1f 44 00 00 65 8b 1d
RSP: 0018:ffff88800fda6bf8 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffff8880050ed800 RCX: 0000000000000000
RDX: 0000000000000003 RSI: ffffffff99e34100 RDI: 0000000000000018
RBP: 0000000000000000 R08: fffffbfff346b553 R09: fffffbfff346b553
R10: 0000000000000001 R11: fffffbfff346b552 R12: ffffffffc0824940
R13: ffff888109e83800 R14: 00000000ffffffff R15: ffffffffc08249e0
FS:  00007f5042287680(0000) GS:ffff888119800000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ae1f4dbd90 CR3: 0000000006760002 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rcu_read_lock_bh_held+0xa0/0xa0
 dsmark_reset+0x3d/0xf0 [sch_dsmark]
 qdisc_reset+0xa9/0x680
 qdisc_destroy+0x84/0x370
 qdisc_create_dflt+0x1fe/0x380
 attach_one_default_qdisc.constprop.41+0xa4/0x180
 dev_activate+0x4d5/0x8c0
 ? __dev_open+0x268/0x390
 __dev_open+0x270/0x390

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this pull request Jun 8, 2021
[ Upstream commit 9b76ead ]

If Qdisc_ops->init() is failed, Qdisc_ops->reset() would be called.
When dsmark_init(Qdisc_ops->init()) is failed, it possibly doesn't
initialize dsmark_qdisc_data->q. But dsmark_reset(Qdisc_ops->reset())
uses dsmark_qdisc_data->q pointer wihtout any null checking.
So, panic would occur.

Test commands:
    sysctl net.core.default_qdisc=dsmark -w
    ip link add dummy0 type dummy
    ip link add vw0 link dummy0 type virt_wifi
    ip link set vw0 up

Splat looks like:
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 3 PID: 684 Comm: ip Not tainted 5.12.0+ #910
RIP: 0010:qdisc_reset+0x2b/0x680
Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54
55 48 89 fd 48 83 c7 18 53 48 89 fa 48 c1 ea 03 48 83 ec 20 <80> 3c 02
00 0f 85 09 06 00 00 4c 8b 65 18 0f 1f 44 00 00 65 8b 1d
RSP: 0018:ffff88800fda6bf8 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffff8880050ed800 RCX: 0000000000000000
RDX: 0000000000000003 RSI: ffffffff99e34100 RDI: 0000000000000018
RBP: 0000000000000000 R08: fffffbfff346b553 R09: fffffbfff346b553
R10: 0000000000000001 R11: fffffbfff346b552 R12: ffffffffc0824940
R13: ffff888109e83800 R14: 00000000ffffffff R15: ffffffffc08249e0
FS:  00007f5042287680(0000) GS:ffff888119800000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ae1f4dbd90 CR3: 0000000006760002 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rcu_read_lock_bh_held+0xa0/0xa0
 dsmark_reset+0x3d/0xf0 [sch_dsmark]
 qdisc_reset+0xa9/0x680
 qdisc_destroy+0x84/0x370
 qdisc_create_dflt+0x1fe/0x380
 attach_one_default_qdisc.constprop.41+0xa4/0x180
 dev_activate+0x4d5/0x8c0
 ? __dev_open+0x268/0x390
 __dev_open+0x270/0x390

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this pull request Jan 9, 2023
…data

commit 8111964 upstream.

Following concurrent processes:

          P1(drop cache)                P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&shrinker_rwsem)  - LOCK A
   do_shrink_slab
    super_cache_scan
     prune_icache_sb
      dispose_list
       evict
        ext4_evict_inode
	 ext4_clear_inode
	  ext4_discard_preallocations
	   ext4_mb_load_buddy_gfp
	    ext4_mb_init_cache
	     ext4_read_block_bitmap_nowait
	      ext4_read_bh_nowait
	       submit_bh
	        dm_submit_bio
		                 do_worker
				  process_deferred_bios
				   commit
				    metadata_operation_failed
				     dm_pool_abort_metadata
				      down_write(&pmd->root_lock) - LOCK B
		                      __destroy_persistent_data_objects
				       dm_block_manager_destroy
				        dm_bufio_client_destroy
				         unregister_shrinker
					  down_write(&shrinker_rwsem)
		 thin_map                            |
		  dm_thin_find_block                 ↓
		   down_read(&pmd->root_lock) --> ABBA deadlock

, which triggers hung task:

[   76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds.
[   76.976019]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
[   76.978521] task:kworker/u4:3    state:D stack:0     pid:63    ppid:2
[   76.978534] Workqueue: dm-thin do_worker
[   76.978552] Call Trace:
[   76.978564]  __schedule+0x6ba/0x10f0
[   76.978582]  schedule+0x9d/0x1e0
[   76.978588]  rwsem_down_write_slowpath+0x587/0xdf0
[   76.978600]  down_write+0xec/0x110
[   76.978607]  unregister_shrinker+0x2c/0xf0
[   76.978616]  dm_bufio_client_destroy+0x116/0x3d0
[   76.978625]  dm_block_manager_destroy+0x19/0x40
[   76.978629]  __destroy_persistent_data_objects+0x5e/0x70
[   76.978636]  dm_pool_abort_metadata+0x8e/0x100
[   76.978643]  metadata_operation_failed+0x86/0x110
[   76.978649]  commit+0x6a/0x230
[   76.978655]  do_worker+0xc6e/0xd90
[   76.978702]  process_one_work+0x269/0x630
[   76.978714]  worker_thread+0x266/0x630
[   76.978730]  kthread+0x151/0x1b0
[   76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds.
[   76.979756]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
[   76.982111] task:test.sh         state:D stack:0     pid:2646  ppid:2459
[   76.982128] Call Trace:
[   76.982139]  __schedule+0x6ba/0x10f0
[   76.982155]  schedule+0x9d/0x1e0
[   76.982159]  rwsem_down_read_slowpath+0x4f4/0x910
[   76.982173]  down_read+0x84/0x170
[   76.982177]  dm_thin_find_block+0x4c/0xd0
[   76.982183]  thin_map+0x201/0x3d0
[   76.982188]  __map_bio+0x5b/0x350
[   76.982195]  dm_submit_bio+0x2b6/0x930
[   76.982202]  __submit_bio+0x123/0x2d0
[   76.982209]  submit_bio_noacct_nocheck+0x101/0x3e0
[   76.982222]  submit_bio_noacct+0x389/0x770
[   76.982227]  submit_bio+0x50/0xc0
[   76.982232]  submit_bh_wbc+0x15e/0x230
[   76.982238]  submit_bh+0x14/0x20
[   76.982241]  ext4_read_bh_nowait+0xc5/0x130
[   76.982247]  ext4_read_block_bitmap_nowait+0x340/0xc60
[   76.982254]  ext4_mb_init_cache+0x1ce/0xdc0
[   76.982259]  ext4_mb_load_buddy_gfp+0x987/0xfa0
[   76.982263]  ext4_discard_preallocations+0x45d/0x830
[   76.982274]  ext4_clear_inode+0x48/0xf0
[   76.982280]  ext4_evict_inode+0xcf/0xc70
[   76.982285]  evict+0x119/0x2b0
[   76.982290]  dispose_list+0x43/0xa0
[   76.982294]  prune_icache_sb+0x64/0x90
[   76.982298]  super_cache_scan+0x155/0x210
[   76.982303]  do_shrink_slab+0x19e/0x4e0
[   76.982310]  shrink_slab+0x2bd/0x450
[   76.982317]  drop_slab+0xcc/0x1a0
[   76.982323]  drop_caches_sysctl_handler+0xb7/0xe0
[   76.982327]  proc_sys_call_handler+0x1bc/0x300
[   76.982331]  proc_sys_write+0x17/0x20
[   76.982334]  vfs_write+0x3d3/0x570
[   76.982342]  ksys_write+0x73/0x160
[   76.982347]  __x64_sys_write+0x1e/0x30
[   76.982352]  do_syscall_64+0x35/0x80
[   76.982357]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Function metadata_operation_failed() is called when operations failed
on dm pool metadata, dm pool will destroy and recreate metadata. So,
shrinker will be unregistered and registered, which could down write
shrinker_rwsem under pmd_write_lock.

Fix it by allocating dm_block_manager before locking pmd->root_lock
and destroying old dm_block_manager after unlocking pmd->root_lock,
then old dm_block_manager is replaced with new dm_block_manager under
pmd->root_lock. So, shrinker register/unregister could be done without
holding pmd->root_lock.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676
Cc: [email protected] #v5.2+
Fixes: e49e582 ("dm thin: add read only and fail io modes")
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this pull request Jan 12, 2023
…data

commit 8111964 upstream.

Following concurrent processes:

          P1(drop cache)                P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&shrinker_rwsem)  - LOCK A
   do_shrink_slab
    super_cache_scan
     prune_icache_sb
      dispose_list
       evict
        ext4_evict_inode
	 ext4_clear_inode
	  ext4_discard_preallocations
	   ext4_mb_load_buddy_gfp
	    ext4_mb_init_cache
	     ext4_read_block_bitmap_nowait
	      ext4_read_bh_nowait
	       submit_bh
	        dm_submit_bio
		                 do_worker
				  process_deferred_bios
				   commit
				    metadata_operation_failed
				     dm_pool_abort_metadata
				      down_write(&pmd->root_lock) - LOCK B
		                      __destroy_persistent_data_objects
				       dm_block_manager_destroy
				        dm_bufio_client_destroy
				         unregister_shrinker
					  down_write(&shrinker_rwsem)
		 thin_map                            |
		  dm_thin_find_block                 ↓
		   down_read(&pmd->root_lock) --> ABBA deadlock

, which triggers hung task:

[   76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds.
[   76.976019]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
[   76.978521] task:kworker/u4:3    state:D stack:0     pid:63    ppid:2
[   76.978534] Workqueue: dm-thin do_worker
[   76.978552] Call Trace:
[   76.978564]  __schedule+0x6ba/0x10f0
[   76.978582]  schedule+0x9d/0x1e0
[   76.978588]  rwsem_down_write_slowpath+0x587/0xdf0
[   76.978600]  down_write+0xec/0x110
[   76.978607]  unregister_shrinker+0x2c/0xf0
[   76.978616]  dm_bufio_client_destroy+0x116/0x3d0
[   76.978625]  dm_block_manager_destroy+0x19/0x40
[   76.978629]  __destroy_persistent_data_objects+0x5e/0x70
[   76.978636]  dm_pool_abort_metadata+0x8e/0x100
[   76.978643]  metadata_operation_failed+0x86/0x110
[   76.978649]  commit+0x6a/0x230
[   76.978655]  do_worker+0xc6e/0xd90
[   76.978702]  process_one_work+0x269/0x630
[   76.978714]  worker_thread+0x266/0x630
[   76.978730]  kthread+0x151/0x1b0
[   76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds.
[   76.979756]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
[   76.982111] task:test.sh         state:D stack:0     pid:2646  ppid:2459
[   76.982128] Call Trace:
[   76.982139]  __schedule+0x6ba/0x10f0
[   76.982155]  schedule+0x9d/0x1e0
[   76.982159]  rwsem_down_read_slowpath+0x4f4/0x910
[   76.982173]  down_read+0x84/0x170
[   76.982177]  dm_thin_find_block+0x4c/0xd0
[   76.982183]  thin_map+0x201/0x3d0
[   76.982188]  __map_bio+0x5b/0x350
[   76.982195]  dm_submit_bio+0x2b6/0x930
[   76.982202]  __submit_bio+0x123/0x2d0
[   76.982209]  submit_bio_noacct_nocheck+0x101/0x3e0
[   76.982222]  submit_bio_noacct+0x389/0x770
[   76.982227]  submit_bio+0x50/0xc0
[   76.982232]  submit_bh_wbc+0x15e/0x230
[   76.982238]  submit_bh+0x14/0x20
[   76.982241]  ext4_read_bh_nowait+0xc5/0x130
[   76.982247]  ext4_read_block_bitmap_nowait+0x340/0xc60
[   76.982254]  ext4_mb_init_cache+0x1ce/0xdc0
[   76.982259]  ext4_mb_load_buddy_gfp+0x987/0xfa0
[   76.982263]  ext4_discard_preallocations+0x45d/0x830
[   76.982274]  ext4_clear_inode+0x48/0xf0
[   76.982280]  ext4_evict_inode+0xcf/0xc70
[   76.982285]  evict+0x119/0x2b0
[   76.982290]  dispose_list+0x43/0xa0
[   76.982294]  prune_icache_sb+0x64/0x90
[   76.982298]  super_cache_scan+0x155/0x210
[   76.982303]  do_shrink_slab+0x19e/0x4e0
[   76.982310]  shrink_slab+0x2bd/0x450
[   76.982317]  drop_slab+0xcc/0x1a0
[   76.982323]  drop_caches_sysctl_handler+0xb7/0xe0
[   76.982327]  proc_sys_call_handler+0x1bc/0x300
[   76.982331]  proc_sys_write+0x17/0x20
[   76.982334]  vfs_write+0x3d3/0x570
[   76.982342]  ksys_write+0x73/0x160
[   76.982347]  __x64_sys_write+0x1e/0x30
[   76.982352]  do_syscall_64+0x35/0x80
[   76.982357]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Function metadata_operation_failed() is called when operations failed
on dm pool metadata, dm pool will destroy and recreate metadata. So,
shrinker will be unregistered and registered, which could down write
shrinker_rwsem under pmd_write_lock.

Fix it by allocating dm_block_manager before locking pmd->root_lock
and destroying old dm_block_manager after unlocking pmd->root_lock,
then old dm_block_manager is replaced with new dm_block_manager under
pmd->root_lock. So, shrinker register/unregister could be done without
holding pmd->root_lock.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676
Cc: [email protected] #v5.2+
Fixes: e49e582 ("dm thin: add read only and fail io modes")
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this pull request Jan 13, 2023
…data

commit 8111964 upstream.

Following concurrent processes:

          P1(drop cache)                P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&shrinker_rwsem)  - LOCK A
   do_shrink_slab
    super_cache_scan
     prune_icache_sb
      dispose_list
       evict
        ext4_evict_inode
	 ext4_clear_inode
	  ext4_discard_preallocations
	   ext4_mb_load_buddy_gfp
	    ext4_mb_init_cache
	     ext4_read_block_bitmap_nowait
	      ext4_read_bh_nowait
	       submit_bh
	        dm_submit_bio
		                 do_worker
				  process_deferred_bios
				   commit
				    metadata_operation_failed
				     dm_pool_abort_metadata
				      down_write(&pmd->root_lock) - LOCK B
		                      __destroy_persistent_data_objects
				       dm_block_manager_destroy
				        dm_bufio_client_destroy
				         unregister_shrinker
					  down_write(&shrinker_rwsem)
		 thin_map                            |
		  dm_thin_find_block                 ↓
		   down_read(&pmd->root_lock) --> ABBA deadlock

, which triggers hung task:

[   76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds.
[   76.976019]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
[   76.978521] task:kworker/u4:3    state:D stack:0     pid:63    ppid:2
[   76.978534] Workqueue: dm-thin do_worker
[   76.978552] Call Trace:
[   76.978564]  __schedule+0x6ba/0x10f0
[   76.978582]  schedule+0x9d/0x1e0
[   76.978588]  rwsem_down_write_slowpath+0x587/0xdf0
[   76.978600]  down_write+0xec/0x110
[   76.978607]  unregister_shrinker+0x2c/0xf0
[   76.978616]  dm_bufio_client_destroy+0x116/0x3d0
[   76.978625]  dm_block_manager_destroy+0x19/0x40
[   76.978629]  __destroy_persistent_data_objects+0x5e/0x70
[   76.978636]  dm_pool_abort_metadata+0x8e/0x100
[   76.978643]  metadata_operation_failed+0x86/0x110
[   76.978649]  commit+0x6a/0x230
[   76.978655]  do_worker+0xc6e/0xd90
[   76.978702]  process_one_work+0x269/0x630
[   76.978714]  worker_thread+0x266/0x630
[   76.978730]  kthread+0x151/0x1b0
[   76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds.
[   76.979756]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
[   76.982111] task:test.sh         state:D stack:0     pid:2646  ppid:2459
[   76.982128] Call Trace:
[   76.982139]  __schedule+0x6ba/0x10f0
[   76.982155]  schedule+0x9d/0x1e0
[   76.982159]  rwsem_down_read_slowpath+0x4f4/0x910
[   76.982173]  down_read+0x84/0x170
[   76.982177]  dm_thin_find_block+0x4c/0xd0
[   76.982183]  thin_map+0x201/0x3d0
[   76.982188]  __map_bio+0x5b/0x350
[   76.982195]  dm_submit_bio+0x2b6/0x930
[   76.982202]  __submit_bio+0x123/0x2d0
[   76.982209]  submit_bio_noacct_nocheck+0x101/0x3e0
[   76.982222]  submit_bio_noacct+0x389/0x770
[   76.982227]  submit_bio+0x50/0xc0
[   76.982232]  submit_bh_wbc+0x15e/0x230
[   76.982238]  submit_bh+0x14/0x20
[   76.982241]  ext4_read_bh_nowait+0xc5/0x130
[   76.982247]  ext4_read_block_bitmap_nowait+0x340/0xc60
[   76.982254]  ext4_mb_init_cache+0x1ce/0xdc0
[   76.982259]  ext4_mb_load_buddy_gfp+0x987/0xfa0
[   76.982263]  ext4_discard_preallocations+0x45d/0x830
[   76.982274]  ext4_clear_inode+0x48/0xf0
[   76.982280]  ext4_evict_inode+0xcf/0xc70
[   76.982285]  evict+0x119/0x2b0
[   76.982290]  dispose_list+0x43/0xa0
[   76.982294]  prune_icache_sb+0x64/0x90
[   76.982298]  super_cache_scan+0x155/0x210
[   76.982303]  do_shrink_slab+0x19e/0x4e0
[   76.982310]  shrink_slab+0x2bd/0x450
[   76.982317]  drop_slab+0xcc/0x1a0
[   76.982323]  drop_caches_sysctl_handler+0xb7/0xe0
[   76.982327]  proc_sys_call_handler+0x1bc/0x300
[   76.982331]  proc_sys_write+0x17/0x20
[   76.982334]  vfs_write+0x3d3/0x570
[   76.982342]  ksys_write+0x73/0x160
[   76.982347]  __x64_sys_write+0x1e/0x30
[   76.982352]  do_syscall_64+0x35/0x80
[   76.982357]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Function metadata_operation_failed() is called when operations failed
on dm pool metadata, dm pool will destroy and recreate metadata. So,
shrinker will be unregistered and registered, which could down write
shrinker_rwsem under pmd_write_lock.

Fix it by allocating dm_block_manager before locking pmd->root_lock
and destroying old dm_block_manager after unlocking pmd->root_lock,
then old dm_block_manager is replaced with new dm_block_manager under
pmd->root_lock. So, shrinker register/unregister could be done without
holding pmd->root_lock.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676
Cc: [email protected] #v5.2+
Fixes: e49e582 ("dm thin: add read only and fail io modes")
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants