Skip to content

Commit a583117

Browse files
dthompsogregkh
authored andcommitted
mlxbf_gige: call request_irq() after NAPI initialized
[ Upstream commit f7442a6 ] The mlxbf_gige driver encounters a NULL pointer exception in mlxbf_gige_open() when kdump is enabled. The sequence to reproduce the exception is as follows: a) enable kdump b) trigger kdump via "echo c > /proc/sysrq-trigger" c) kdump kernel executes d) kdump kernel loads mlxbf_gige module e) the mlxbf_gige module runs its open() as the the "oob_net0" interface is brought up f) mlxbf_gige module will experience an exception during its open(), something like: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000004 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000086000004 [#1] SMP CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024 pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : __napi_poll+0x40/0x230 sp : ffff800008003e00 x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8 x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000 x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000 x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0 x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398 x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2 x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238 Call trace: 0x0 net_rx_action+0x178/0x360 __do_softirq+0x15c/0x428 __irq_exit_rcu+0xac/0xec irq_exit+0x18/0x2c handle_domain_irq+0x6c/0xa0 gic_handle_irq+0xec/0x1b0 call_on_irq_stack+0x20/0x2c do_interrupt_handler+0x5c/0x70 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x7c/0x80 __setup_irq+0x4c0/0x950 request_threaded_irq+0xf4/0x1bc mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige] mlxbf_gige_open+0x5c/0x170 [mlxbf_gige] __dev_open+0x100/0x220 __dev_change_flags+0x16c/0x1f0 dev_change_flags+0x2c/0x70 do_setlink+0x220/0xa40 __rtnl_newlink+0x56c/0x8a0 rtnl_newlink+0x58/0x84 rtnetlink_rcv_msg+0x138/0x3c4 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x2ec/0x360 netlink_sendmsg+0x278/0x490 __sock_sendmsg+0x5c/0x6c ____sys_sendmsg+0x290/0x2d4 ___sys_sendmsg+0x84/0xd0 __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x30/0xac el0_svc+0x48/0x160 el0t_64_sync_handler+0xa4/0x12c el0t_64_sync+0x1a4/0x1a8 Code: bad PC value ---[ end trace 7d1c3f3bf9d81885 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x2870a7a00000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x0,000005c1,a3332a5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The exception happens because there is a pending RX interrupt before the call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires immediately after this request_irq() completes. The RX IRQ handler runs "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()" and "napi_enable()", both which happen later in the open() logic. The logic in mlxbf_gige_open() must fully initialize NAPI before any calls to request_irq() execute. Fixes: f92e186 ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <[email protected]> Reviewed-by: Asmaa Mnebhi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
1 parent 85c410f commit a583117

File tree

1 file changed

+11
-7
lines changed

1 file changed

+11
-7
lines changed

drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -141,13 +141,10 @@ static int mlxbf_gige_open(struct net_device *netdev)
141141
control |= MLXBF_GIGE_CONTROL_PORT_EN;
142142
writeq(control, priv->base + MLXBF_GIGE_CONTROL);
143143

144-
err = mlxbf_gige_request_irqs(priv);
145-
if (err)
146-
return err;
147144
mlxbf_gige_cache_stats(priv);
148145
err = mlxbf_gige_clean_port(priv);
149146
if (err)
150-
goto free_irqs;
147+
return err;
151148

152149
/* Clear driver's valid_polarity to match hardware,
153150
* since the above call to clean_port() resets the
@@ -168,6 +165,10 @@ static int mlxbf_gige_open(struct net_device *netdev)
168165
napi_enable(&priv->napi);
169166
netif_start_queue(netdev);
170167

168+
err = mlxbf_gige_request_irqs(priv);
169+
if (err)
170+
goto napi_deinit;
171+
171172
/* Set bits in INT_EN that we care about */
172173
int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR |
173174
MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS |
@@ -184,14 +185,17 @@ static int mlxbf_gige_open(struct net_device *netdev)
184185

185186
return 0;
186187

188+
napi_deinit:
189+
netif_stop_queue(netdev);
190+
napi_disable(&priv->napi);
191+
netif_napi_del(&priv->napi);
192+
mlxbf_gige_rx_deinit(priv);
193+
187194
tx_deinit:
188195
mlxbf_gige_tx_deinit(priv);
189196

190197
phy_deinit:
191198
phy_stop(phydev);
192-
193-
free_irqs:
194-
mlxbf_gige_free_irqs(priv);
195199
return err;
196200
}
197201

0 commit comments

Comments
 (0)