.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects,
and as a way to explicitly execute programs in the kernel for their side
effects. The command was previously named ``BPF_PROG_TEST_RUN``, and both
constants continue to be defined in the UAPI header, aliased to the same value.
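As a minimal sketch, the command can be invoked directly through the syscall
interface. The ``prog_fd`` parameter is assumed to refer to an already-loaded
program of one of the supported types listed below; on older UAPI headers that
only define ``BPF_PROG_TEST_RUN``, that constant can be used instead:

.. code-block:: c

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/bpf.h>

    /* Run a loaded BPF program once against the supplied packet data and
     * return the program's return code (or -1 on syscall failure). */
    static int prog_run(int prog_fd, void *pkt, size_t pkt_len)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.test.prog_fd = prog_fd;
            attr.test.data_in = (unsigned long)pkt;
            attr.test.data_size_in = pkt_len;
            attr.test.repeat = 1;

            if (syscall(__NR_bpf, BPF_PROG_RUN, &attr, sizeof(attr)))
                    return -1;

            return attr.test.retval;
    }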

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs
will not have any side effects while being run in this mode; in particular,
packets will not actually be redirected or dropped. Instead, the program's
return code is simply returned to userspace. A separate mode for live execution
of XDP programs is documented below.
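
For instance, a unit test might use libbpf's ``bpf_prog_test_run_opts()``
wrapper to run an XDP program against a test packet and inspect the verdict
and the rewritten packet data. This is a sketch; the program file descriptor
and the contents of the test frame are assumptions for illustration:

.. code-block:: c

    #include <stdio.h>
    #include <bpf/bpf.h>
    #include <linux/if_ether.h>

    /* Test-run an already-loaded XDP program against a dummy Ethernet
     * frame and print the verdict and the size of the output packet. */
    int test_run_xdp(int prog_fd)
    {
            unsigned char pkt_in[ETH_HLEN] = {};   /* dummy input frame */
            unsigned char pkt_out[256];            /* receives rewritten data */
            LIBBPF_OPTS(bpf_test_run_opts, opts,
                    .data_in = pkt_in,
                    .data_size_in = sizeof(pkt_in),
                    .data_out = pkt_out,
                    .data_size_out = sizeof(pkt_out),
                    .repeat = 1,
            );
            int err;

            err = bpf_prog_test_run_opts(prog_fd, &opts);
            if (err)
                    return err;

            /* opts.retval is the program's return code (e.g. XDP_PASS);
             * opts.data_size_out is the length of the output packet. */
            printf("verdict: %u, %u bytes out\n", opts.retval, opts.data_size_out);
            return 0;
    }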

Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
in which packets are actually processed by the kernel after the XDP program has
run, as though they had arrived on a physical interface. This mode is activated
by setting the ``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP
program to ``BPF_PROG_RUN``.
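
For example, the mode might be driven from userspace like this to use an XDP
program as a simple traffic generator (the repeat and batch counts are
illustrative assumptions; batching is described in more detail below):

.. code-block:: c

    #include <bpf/bpf.h>
    #include <linux/bpf.h>

    /* Run an XDP program (presumably returning XDP_REDIRECT or XDP_TX)
     * many times over copies of the same initial packet, letting the
     * kernel act on the verdicts. */
    int xdp_trafficgen(int prog_fd, void *pkt, unsigned int pkt_len)
    {
            LIBBPF_OPTS(bpf_test_run_opts, opts,
                    .data_in = pkt,
                    .data_size_in = pkt_len,
                    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
                    .repeat = 1 << 20,  /* number of executions */
                    .batch_size = 64,   /* optional; this is the default */
            );

            /* data_out and ctx_out must not be set in this mode. */
            return bpf_prog_test_run_opts(prog_fd, &opts);
    }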

The live packet mode is optimised for high-performance execution of the
supplied XDP program many times (suitable for, e.g., running as a traffic
generator), which means the semantics are not quite as straightforward as in
the regular test run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same tracepoints as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op). A sketch of how such a context object can be set
  up is shown after this list.

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages containing as many pages as the batch size. Each memory page will be
  initialised with the initial packet data supplied by userspace at
  ``BPF_PROG_RUN`` invocation. When possible, the pages will be recycled on
  future program invocations, to improve performance. Pages will generally be
  recycled a full batch at a time, except when a packet is dropped (by return
  code or because of, say, a redirection error), in which case that page will
  be recycled immediately. If a packet ends up being passed to the regular
  networking stack (because the XDP program returns ``XDP_PASS``, or because it
  ends up being redirected to an interface that injects it into the stack), the
  page will be released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
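
As referenced in the list above, a context object that makes the program run as
though the packet arrived on a given interface might be set up as in the
following sketch (the helper name and parameters are hypothetical):

.. code-block:: c

    #include <bpf/bpf.h>
    #include <linux/bpf.h>

    /* Execute an XDP program in live frame mode as though the packet
     * arrived on the interface identified by ifindex. When a context
     * object is supplied, its data_end member must match data_size_in. */
    int run_xdp_on_ifindex(int prog_fd, void *pkt, unsigned int pkt_len,
                           unsigned int ifindex)
    {
            struct xdp_md ctx_in = {
                    .data_end = pkt_len,
                    .ingress_ifindex = ifindex,
            };
            LIBBPF_OPTS(bpf_test_run_opts, opts,
                    .data_in = pkt,
                    .data_size_in = pkt_len,
                    .ctx_in = &ctx_in,
                    .ctx_size_in = sizeof(ctx_in),
                    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
            );

            return bpf_prog_test_run_opts(prog_fd, &opts);
    }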