Commit de55c9a

Authored and committed by Alexei Starovoitov
Merge branch 'Add support for transmitting packets using XDP in bpf_prog_run()'
Toke Høiland-Jørgensen says:

====================

This series adds support for transmitting packets using XDP in
bpf_prog_run(), by enabling a new "live packet" mode which will handle the
XDP program return codes and redirect the packets to the stack or other
devices.

The primary use case for this is testing the redirect map types and the
ndo_xdp_xmit driver operation without an external traffic generator. But it
turns out to also be useful for creating a programmable traffic generator
in XDP, as well as for injecting frames into the stack. A sample traffic
generator, which was included in previous versions of the series but has
now moved to xdp-tools, transmits up to 9 Mpps/core on my test machine.

To transmit the frames, the new mode instantiates a page_pool structure in
bpf_prog_run() and initialises the pages to contain XDP frames with the
data passed in by userspace. These frames can then be handled as though
they came from the hardware XDP path, and the existing page_pool code takes
care of returning and recycling them. The setup is optimised for high
performance with a high number of repetitions to support stress testing and
the traffic generator use case; see patch 1 for details.

v11:
- Fix override of return code in xdp_test_run_batch()
- Add Martin's ACKs to remaining patches

v10:
- Only propagate memory allocation errors from xdp_test_run_batch()
- Get rid of BPF_F_TEST_XDP_RESERVED; batch_size can be used to probe
- Check that batch_size is unset in non-XDP test_run funcs
- Lower the number of repetitions in the selftest to 10k
- Count number of recycled pages in the selftest
- Fix a few other nits from Martin, carry forward ACKs

v9:
- XDP_DROP packets in the selftest to ensure pages are recycled
- Fix a few issues reported by the kernel test robot
- Rewrite the documentation of the batch size to make it a bit clearer
- Rebase to newest bpf-next

v8:
- Make the batch size configurable from userspace
- Don't interrupt the packet loop on errors in do_redirect (this can be
  caught from the tracepoint)
- Add documentation of the feature
- Add reserved flag userspace can use to probe for support (kernel didn't
  check flags previously)
- Rebase to newest bpf-next, disallow live mode for jumbo frames

v7:
- Extend the local_bh_disable() to cover the full test run loop, to prevent
  running concurrently with the softirq. Fixes a deadlock with veth xmit.
- Reinstate the forwarding sysctl setting in the selftest, and bump up the
  number of packets being transmitted to trigger the above bug.
- Update commit message to make it clear that user space can select the
  ingress interface.

v6:
- Fix meta vs data pointer setting and add a selftest for it
- Add local_bh_disable() around code passing packets up the stack
- Create a new netns for the selftest and use a TC program instead of the
  forwarding hack to count packets being XDP_PASS'ed from the test prog
- Check for the correct ingress ifindex in the selftest
- Rebase and drop patches 1-5 that were already merged

v5:
- Rebase to current bpf-next

v4:
- Fix a few code style issues (Alexei)
- Also handle the other return codes: XDP_PASS builds skbs and injects them
  into the stack, and XDP_TX is turned into a redirect out the same
  interface (Alexei)
- Drop the last patch adding an xdp_trafficgen program to samples/bpf; this
  will live in xdp-tools instead (Alexei)
- Add a separate bpf_test_run_xdp_live() function to test_run.c instead of
  entangling the new mode in the existing bpf_test_run()

v3:
- Reorder patches to make sure they all build individually (Patchwork)
- Remove a couple of unused variables (Patchwork)
- Remove unlikely() annotation in slow path and add back John's ACK that I
  accidentally dropped for v2 (John)

v2:
- Split up __xdp_do_redirect to avoid passing two pointers to it (John)
- Always reset context pointers before each test run (John)
- Use get_mac_addr() from xdp_sample_user.h instead of rolling our own
  (Kumar)
- Fix wrong offset for metadata pointer

====================

Signed-off-by: Alexei Starovoitov <[email protected]>
2 parents 3399dd9 + 55fcacc commit de55c9a

File tree: 14 files changed, +821 −105 lines

Documentation/bpf/bpf_prog_run.rst

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@

.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
   :local:
   :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects. The
command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
to be defined in the UAPI header, aliased to the same value.

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode; in particular, packets
will not actually be redirected or dropped; the program return code will just be
returned to userspace. A separate mode for live execution of XDP programs is
provided, documented separately below.
Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in a way where packets will actually
be processed by the kernel after the execution of the XDP program, as if they
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.

The live packet mode is optimised for high-performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as in the regular
test run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same trace points as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.
- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.

Documentation/bpf/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture.
    helpers
    programs
    maps
+   bpf_prog_run
    classic_vs_extended.rst
    bpf_licensing
    test_debug

include/uapi/linux/bpf.h

Lines changed: 3 additions & 0 deletions
@@ -1232,6 +1232,8 @@ enum {

 /* If set, run the test on the cpu specified by bpf_attr.test.cpu */
 #define BPF_F_TEST_RUN_ON_CPU	(1U << 0)
+/* If set, XDP frames will be transmitted after processing */
+#define BPF_F_TEST_XDP_LIVE_FRAMES	(1U << 1)

 /* type for BPF_ENABLE_STATS */
 enum bpf_stats_type {
@@ -1393,6 +1395,7 @@ union bpf_attr {
 		__aligned_u64	ctx_out;
 		__u32		flags;
 		__u32		cpu;
+		__u32		batch_size;
 	} test;

 	struct { /* anonymous struct used by BPF_*_GET_*_ID */

kernel/bpf/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ config BPF_SYSCALL
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
 	select NET_SOCK_MSG if NET
+	select PAGE_POOL if NET
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate BPF programs

kernel/bpf/syscall.c

Lines changed: 1 addition & 1 deletion
@@ -3336,7 +3336,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	}
 }

-#define BPF_PROG_TEST_RUN_LAST_FIELD test.cpu
+#define BPF_PROG_TEST_RUN_LAST_FIELD test.batch_size

 static int bpf_prog_test_run(const union bpf_attr *attr,
 			     union bpf_attr __user *uattr)
