-
Notifications
You must be signed in to change notification settings - Fork 131
Clean up and document RCU-based object protection for XDP_REDIRECT #1316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The xchg() and cmpxchg() functions are sometimes used to carry out RCU updates. Unfortunately, this can result in sparse warnings for both the old-value and new-value arguments, as well as for the return value. The arguments can be dealt with using RCU_INITIALIZER(): old_p = xchg(&p, RCU_INITIALIZER(new_p)); But a sparse warning still remains due to assigning the __rcu pointer returned from xchg to the (most likely) non-__rcu pointer old_p. This commit therefore provides an unrcu_pointer() macro that strips the __rcu. This macro can be used as follows: old_p = unrcu_pointer(xchg(&p, RCU_INITIALIZER(new_p))); Reported-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
XDP programs are called from a NAPI poll context, which means the RCU reference liveness is ensured by local_bh_disable(). Add rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so lockdep understands that the dereferences are safe from inside *either* an rcu_read_lock() section *or* a local_bh_disable() section. This is done in preparation for removing the redundant rcu_read_lock()s from the drivers. Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
…dev ref Some of the XDP helpers (in particular, xdp_do_redirect()) will get a struct net_device reference using dev_get_by_index_rcu(). These are called from a NAPI poll context, which means the RCU reference liveness is ensured by local_bh_disable(). Add rcu_read_lock_bh_held() as a condition to the RCU list traversal in dev_get_by_index_rcu() so lockdep understands that the dereferences are safe from *both* an rcu_read_lock() *and* with local_bh_disable(). Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
XDP_REDIRECT works by a three-step process: the bpf_redirect() and bpf_redirect_map() helpers will lookup the target of the redirect and store it (along with some other metadata) in a per-CPU struct bpf_redirect_info. Next, when the program returns the XDP_REDIRECT return code, the driver will call xdp_do_redirect() which will use the information thus stored to actually enqueue the frame into a bulk queue structure (that differs slightly by map type, but shares the same principle). Finally, before exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will flush all the different bulk queues, thus completing the redirect. Pointers to the map entries will be kept around for this whole sequence of steps, protected by RCU. However, there is no top-level rcu_read_lock() in the core code; instead drivers add their own rcu_read_lock() around the XDP portions of the code, but somewhat inconsistently as Martin discovered[0]. However, things still work because everything happens inside a single NAPI poll sequence, which means it's between a pair of calls to local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could document this intention by using rcu_dereference_check() with rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and lockdep to verify that everything is done correctly. This patch does just that: we add an __rcu annotation to the map entry pointers and remove the various comments explaining the NAPI poll assurance strewn through devmap.c in favour of a longer explanation in filter.c. The goal is to have one coherent documentation of the entire flow, and rely on the RCU annotations as a "standard" way of communicating the flow in the map code (which can additionally be understood by sparse and lockdep). The RCU annotation replacements result in a fairly straight-forward replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE() becomes rcu_assign_pointer() and xchg() and cmpxchg() gets wrapped in the proper constructs to cast the pointer back and forth between __rcu and __kernel address space (for the benefit of sparse). The one complication is that xskmap has a few constructions where double-pointers are passed back and forth; these simply all gain __rcu annotations, and only the final reference/dereference to the inner-most pointer gets changed. With this, everything can be run through sparse without eliciting complaints, and lockdep can verify correctness even without the use of rcu_read_lock() in the drivers. Subsequent patches will clean these up from the drivers. [0] https://lore.kernel.org/bpf/[email protected]/ [1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The ena driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Guy Tzalik <[email protected]> Cc: Saeed Bishara <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The bnxt driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Michael Chan <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The thunderx driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Sunil Goutham <[email protected]> Cc: [email protected] Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The dpaa and dpaa2 drivers have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Madalin Bucur <[email protected]> Cc: Ioana Ciornei <[email protected]> Cc: Ioana Radulescu <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Tested-by: Jesper Dangaard Brouer <[email protected]> # i40e Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The mvneta and mvpp2 drivers have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Thomas Petazzoni <[email protected]> Cc: Marcin Wojtas <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The mlx4 driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Also switch the RCU dereferences in the driver loop itself to the _bh variants. Cc: Tariq Toukan <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The nfp driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. While this is not actually an issue for the nfp driver because it doesn't support XDP_REDIRECT (and thus doesn't call xdp_do_flush()), the rcu_read_lock() is still unneeded. And With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Simon Horman <[email protected]> Cc: [email protected] Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The qede driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Ariel Elior <[email protected]> Cc: [email protected] Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The sfc driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Edward Cree <[email protected]> Cc: Martin Habets <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The netsec driver has a rcu_read_lock()/rcu_read_unlock() pair around the full RX loop, covering everything up to and including xdp_do_flush(). This is actually the correct behaviour, but because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), it is also technically redundant. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep the rcu_read_lock() around anymore, so let's just remove it. Cc: Jassi Brar <[email protected]> Cc: Ilias Apalodimas <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The stmmac driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Giuseppe Cavallaro <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Jose Abreu <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
The cpsw driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Cc: Grygorii Strashko <[email protected]> Cc: [email protected] Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Master branch: 380afe7 |
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=497075 expired. Closing PR. |
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 16, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 17, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 17, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kuba-moo
pushed a commit
to linux-netdev/testing-bpf-ci
that referenced
this pull request
Apr 17, 2025
Fix checkpatch code style errors: ERROR: else should follow close brace '}' kernel-patches#1317: FILE: net/core/pktgen.c:1317: + } + else And checkpatch follow up code style check: CHECK: Unbalanced braces around else statement kernel-patches#1316: FILE: net/core/pktgen.c:1316: + } else Signed-off-by: Peter Seiderer <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Pull request for series with
subject: Clean up and document RCU-based object protection for XDP_REDIRECT
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=497075