Commit 98b972d

Author: Alexei Starovoitov
Merge branch 'bpf: add helpers to support BTF-based kernel'
Alan Maguire says:

====================
This series attempts to provide a simple way for BPF programs (and in future other consumers) to utilize BPF Type Format (BTF) information to display kernel data structures in-kernel. The use case this functionality is applied to here is to support a snprintf()-like helper to copy a BTF representation of kernel data to a string, and a BPF seq file helper to display BTF data for an iterator.

There is already support in kernel/bpf/btf.c for "show" functionality; the changes here generalize that support from seq-file specific verifier display to the more generic case and add another specific use case; rather than seq_printf()ing the show data, it is copied to a supplied string using a snprintf()-like function. Other future consumers of the show functionality could include a bpf_printk_btf() function which printk()ed the data instead. Oops messaging in particular would be an interesting application for such functionality.

The above potential use case hints at a potential reply to a reasonable objection that such typed display should be solved by tracing programs, where the in-kernel tracing records data and the userspace program prints it out. While this is certainly the recommended approach for most cases, I believe having an in-kernel mechanism would be valuable also. Critically, in BPF programs it reduces debugging and tracing of such data to invoking a simple helper.

One challenge raised in an earlier iteration of this work - where the BTF printing was implemented as a printk() format specifier - was that the amount of data printed per printk() was large, and other format specifiers were far simpler. Here we sidestep that concern by printing components of the BTF representation as we go for the seq file case, and in the string case the snprintf()-like operation is intended to be a basis for perf event or ringbuf output. The reasons for avoiding bpf_trace_printk are that

1. bpf_trace_printk() strings are restricted in size and cannot display anything beyond trivial data structures; and
2. bpf_trace_printk() is for debugging purposes only.

As Alexei suggested, a bpf_trace_puts() helper could solve this in the future but it still would be limited by the 1000 byte limit for traced strings.

Default output for an sk_buff looks like this (zeroed fields are omitted):

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)0x000000007524fd8b,
 .data = (unsigned char *)0x000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Flags can modify aspects of output format; see patch 3 for more details.

Changes since v6:

- Updated safe data size to 32, object name size to 80. This increases the number of safe copies done, but performance is not a key goal here. WRT name size the largest type name length in bpf-next according to "pahole -s" is 64 bytes, so that still gives room for additional type qualifiers, parens etc within the name limit (Alexei, patch 2)
- Removed inlines and converted as many #defines to functions as was possible. In a few cases - btf_show_type_value[s]() specifically - I left these as macros as btf_show_type_value[s]() prepends and appends format strings to the format specifier (in order to include indentation, delimiters etc), so a macro makes that simpler (Alexei, patch 2)
- Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2)
- Removed clang loop unroll in BTF snprintf test (Alexei)
- Switched to using bpf_core_type_id_kernel(type) as suggested by Andrii, and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4)
- Added skip logic if __builtin_btf_type_id is not available (patches 4,8)
- Bumped limits on bpf iters to support printing larger structures (Alexei, patch 5)
- Updated overflow bpf_iter tests to reflect new iter max size (patch 6)
- Updated seq helper to use type id only (Alexei, patch 7)
- Updated BTF task iter test to use task struct instead of struct fs_struct since new limits allow a task_struct to be displayed (patch 8)
- Fixed E2BIG handling in iter task (Alexei, patch 8)

Changes since v5:

- Moved btf print prepare into patch 3, type show seq with flags into patch 2 (Alexei, patches 2,3)
- Fixed build bot warnings around static declarations and printf attributes
- Renamed functions to snprintf_btf/seq_printf_btf (Alexei, patches 3-6)

Changes since v4:

- Changed approach from a BPF trace event-centric design to one utilizing a snprintf()-like helper and an iter helper (Alexei, patches 3,5)
- Added tests to verify BTF output (patch 4)
- Added support to tests for verifying BTF type_id-based display as well as type name via __builtin_btf_type_id (Andrii, patch 4).
- Augmented task iter tests to cover the BTF-based seq helper. Because a task_struct's BTF-based representation would overflow the PAGE_SIZE limit on iterator data, the "struct fs_struct" (task->fs) is displayed for each task instead (Alexei, patch 6).

Changes since v3:

- Moved to RFC since the approach is different (and bpf-next is closed)
- Rather than using a printk() format specifier as the means of invoking BTF-enabled display, a dedicated BPF helper is used. This solves the issue of printk() having to output large amounts of data using a complex mechanism such as BTF traversal, but still provides a way for the display of such data to be achieved via BPF programs. Future work could include a bpf_printk_btf() function to invoke display via printk() where the elements of a data structure are printk()ed one at a time. Thanks to Petr Mladek, Andy Shevchenko and Rasmus Villemoes who took time to look at the earlier printk() format-specifier-focused version of this and provided feedback clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of separating output from standard bpf_trace_printk() events, ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification of the various display options.

Changes since v2:

- Alexei and Yonghong suggested it would be good to use probe_kernel_read() on to-be-shown data to ensure safety during operation. Safe copy via probe_kernel_read() to a buffer object in "struct btf_show" is used to support this. A few different approaches were explored including dynamic allocation and per-cpu buffers. The downside of dynamic allocation is that it would be done during BPF program execution for bpf_trace_printk()s using %pT format specifiers. The problem with per-cpu buffers is we'd have to manage preemption and since the display of an object occurs over an extended period and in printk context where we'd rather not change preemption status, it seemed tricky to manage buffer safety while considering preemption. The approach of utilizing stack buffer space via the "struct btf_show" seemed like the simplest approach. The stack size of the associated functions which have a "struct btf_show" on their stack to support show operation (btf_type_snprintf_show() and btf_type_seq_show()) stays under 500 bytes. The compromise here is the safe buffer we use is small - 256 bytes - and as a result multiple probe_kernel_read()s are needed for larger objects. Most objects of interest are smaller than this (e.g. "struct sk_buff" is 224 bytes), and while task_struct is a notable exception at ~8K, performance is not the priority for BTF-based display. (Alexei and Yonghong, patch 2).
- Safe buffer use is the default behaviour (and is mandatory for BPF) but unsafe display - meaning no safe copy is done and we operate on the object itself - is supported via a 'u' option.
- Pointers are prefixed with 0x for clarity (Alexei, patch 2)
- Added additional comments and explanations around BTF show code, especially around determining whether objects are zeroed. Also tried to comment safe object scheme used. (Yonghong, patch 2)
- Added late_initcall() to initialize vmlinux BTF so that it would not have to be initialized during printk operation (Alexei, patch 5)
- Removed CONFIG_BTF_PRINTF config option as it is not needed; CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and determining behaviour of type-based printk can be done via retrieval of BTF data; if it's not there BTF was unavailable or broken (Alexei, patches 4,6)
- Fixed bpf_trace_printk test to use vmlinux.h and globals via skeleton infrastructure, removing need for perf events (Andrii, patch 8)

Changes since v1:

- Changed format to be more drgn-like, rendering indented type info along with type names by default (Alexei)
- Zeroed values are omitted (Arnaldo) by default unless the '0' modifier is specified (Alexei)
- Added an option to print pointer values without obfuscation. The reason to do this is the sysctls controlling pointer display are likely to be irrelevant in many if not most tracing contexts. Some questions on this in the outstanding questions section below...
- Reworked printk format specifier so that we no longer rely on format %pT<type> but instead use a struct * which contains type information (Rasmus). This simplifies the printk parsing, makes use more dynamic and also allows specification by BTF id as well as name.
- Removed incorrect patch which tried to fix dereferencing of resolved BTF info for vmlinux; instead we skip modifiers for the relevant case (array element type determination) (Alexei).
- Fixed issues with negative snprintf format length (Rasmus)
- Added test cases for various data structure formats; base types, typedefs, structs, etc.
- Tests now iterate through all typedef, enum, struct and unions defined for vmlinux BTF and render a version of the target dummy value which is either all zeros or all 0xff values; the idea is this exercises the "skip if zero" and "print everything" cases.
- Added support in BPF for using the %pT format specifier in bpf_trace_printk()
- Added BPF tests which ensure %pT format specifier use works (Alexei).
====================

Signed-off-by: Alexei Starovoitov <[email protected]>
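The changelog describes reading to-be-shown data through a small safe buffer (256 bytes in the v2 notes, 32 bytes after v6), which means larger objects need several copies. The following is a minimal userspace sketch of that trade-off, not the kernel implementation: `count_safe_copies` and `SAFE_DATA_SIZE` are hypothetical names, the fixed-chunk loop is an assumption, and memcpy() stands in for a fault-tolerant kernel read.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: mirrors the 32-byte safe data size mentioned in the changelog. */
#define SAFE_DATA_SIZE 32

/*
 * Walk an object of obj_size bytes through a small bounce buffer and return
 * how many copies were needed. memcpy() stands in for a fault-tolerant
 * kernel read; real code would also check each copy for failure.
 */
static size_t count_safe_copies(const void *obj, size_t obj_size)
{
	unsigned char safe[SAFE_DATA_SIZE];
	const unsigned char *src = obj;
	size_t copies = 0;

	while (obj_size > 0) {
		size_t chunk = obj_size < sizeof(safe) ? obj_size : sizeof(safe);

		memcpy(safe, src, chunk);	/* display code would read from safe[] */
		src += chunk;
		obj_size -= chunk;
		copies++;
	}
	return copies;
}
```

Under these assumptions, a 224-byte object (the size quoted above for "struct sk_buff") takes seven 32-byte copies, where a 256-byte buffer would have needed one. That is the extra copying the changelog accepts because performance is not a key goal.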
2 parents a871b04 + b72091b commit 98b972d

15 files changed: +1659 -117 lines changed

include/linux/bpf.h

Lines changed: 3 additions & 0 deletions
@@ -1364,6 +1364,8 @@ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr,
 	      union bpf_attr __user *uattr);
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
 
+struct btf *bpf_get_btf_vmlinux(void);
+
 /* Map specifics */
 struct xdp_buff;
 struct sk_buff;
@@ -1820,6 +1822,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto;
 extern const struct bpf_func_proto bpf_copy_from_user_proto;
+extern const struct bpf_func_proto bpf_snprintf_btf_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);

include/linux/btf.h

Lines changed: 39 additions & 0 deletions
@@ -6,13 +6,15 @@
 
 #include <linux/types.h>
 #include <uapi/linux/btf.h>
+#include <uapi/linux/bpf.h>
 
 #define BTF_TYPE_EMIT(type) ((void)(type *)0)
 
 struct btf;
 struct btf_member;
 struct btf_type;
 union bpf_attr;
+struct btf_show;
 
 extern const struct file_operations btf_fops;
 
@@ -46,8 +48,45 @@ int btf_get_info_by_fd(const struct btf *btf,
 const struct btf_type *btf_type_id_size(const struct btf *btf,
 					u32 *type_id,
 					u32 *ret_size);
+
+/*
+ * Options to control show behaviour.
+ *	- BTF_SHOW_COMPACT: no formatting around type information
+ *	- BTF_SHOW_NONAME: no struct/union member names/types
+ *	- BTF_SHOW_PTR_RAW: show raw (unobfuscated) pointer values;
+ *	  equivalent to %px.
+ *	- BTF_SHOW_ZERO: show zero-valued struct/union members; they
+ *	  are not displayed by default
+ *	- BTF_SHOW_UNSAFE: skip use of bpf_probe_read() to safely read
+ *	  data before displaying it.
+ */
+#define BTF_SHOW_COMPACT BTF_F_COMPACT
+#define BTF_SHOW_NONAME BTF_F_NONAME
+#define BTF_SHOW_PTR_RAW BTF_F_PTR_RAW
+#define BTF_SHOW_ZERO BTF_F_ZERO
+#define BTF_SHOW_UNSAFE (1ULL << 4)
+
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m);
+int btf_type_seq_show_flags(const struct btf *btf, u32 type_id, void *obj,
+			    struct seq_file *m, u64 flags);
+
+/*
+ * Copy len bytes of string representation of obj of BTF type_id into buf.
+ *
+ * @btf: struct btf object
+ * @type_id: type id of type obj points to
+ * @obj: pointer to typed data
+ * @buf: buffer to write to
+ * @len: maximum length to write to buf
+ * @flags: show options (see above)
+ *
+ * Return: length that would have been/was copied as per snprintf, or
+ *	   negative error.
+ */
+int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj,
+			   char *buf, int len, u64 flags);
+
 int btf_get_fd_by_id(u32 id);
 u32 btf_id(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
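The hunk above maps four BTF_SHOW_* options onto the UAPI BTF_F_* bits and keeps BTF_SHOW_UNSAFE as a fifth, kernel-internal bit, which fits the changelog's statement that safe copying is mandatory for BPF. One way a helper could keep callers from setting the internal bit is a mask check. The sketch below is an assumption about how that could look, not code from this commit; `EXAMPLE_USER_FLAGS` and `example_check_flags` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Values copied from the UAPI enum added in this commit. */
enum {
	BTF_F_COMPACT = (1ULL << 0),
	BTF_F_NONAME  = (1ULL << 1),
	BTF_F_PTR_RAW = (1ULL << 2),
	BTF_F_ZERO    = (1ULL << 3),
};

/* Kernel-internal option from include/linux/btf.h; not in the UAPI enum. */
#define BTF_SHOW_UNSAFE (1ULL << 4)

/* Hypothetical mask of the bits a BPF caller is allowed to pass. */
#define EXAMPLE_USER_FLAGS \
	(BTF_F_COMPACT | BTF_F_NONAME | BTF_F_PTR_RAW | BTF_F_ZERO)

/* Return 0 if flags contains only UAPI bits, -EINVAL otherwise. */
static int example_check_flags(uint64_t flags)
{
	if (flags & ~(uint64_t)EXAMPLE_USER_FLAGS)
		return -EINVAL;
	return 0;
}
```

With this check, `BTF_F_COMPACT | BTF_F_ZERO` passes, while any value with bit 4 or higher set is rejected before it reaches the show code.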

include/uapi/linux/bpf.h

Lines changed: 76 additions & 0 deletions
@@ -3594,6 +3594,50 @@ union bpf_attr {
  *		the data in *dst*. This is a wrapper of **copy_from_user**\ ().
  *	Return
  *		0 on success, or a negative error in case of failure.
+ *
+ * long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags)
+ *	Description
+ *		Use BTF to store a string representation of *ptr*->ptr in *str*,
+ *		using *ptr*->type_id. This value should specify the type
+ *		that *ptr*->ptr points to. LLVM __builtin_btf_type_id(type, 1)
+ *		can be used to look up vmlinux BTF type ids. Traversing the
+ *		data structure using BTF, the type information and values are
+ *		stored in the first *str_size* - 1 bytes of *str*. Safe copy of
+ *		the pointer data is carried out to avoid kernel crashes during
+ *		operation. Smaller types can use string space on the stack;
+ *		larger programs can use map data to store the string
+ *		representation.
+ *
+ *		The string can be subsequently shared with userspace via
+ *		bpf_perf_event_output() or ring buffer interfaces.
+ *		bpf_trace_printk() is to be avoided as it places too small
+ *		a limit on string size to be useful.
+ *
+ *		*flags* is a combination of
+ *
+ *		**BTF_F_COMPACT**
+ *			no formatting around type information
+ *		**BTF_F_NONAME**
+ *			no struct/union member names/types
+ *		**BTF_F_PTR_RAW**
+ *			show raw (unobfuscated) pointer values;
+ *			equivalent to printk specifier %px.
+ *		**BTF_F_ZERO**
+ *			show zero-valued struct/union members; they
+ *			are not displayed by default
+ *
+ *	Return
+ *		The number of bytes that were written (or would have been
+ *		written if output had to be truncated due to string size),
+ *		or a negative error in cases of failure.
+ *
+ * long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 ptr_size, u64 flags)
+ *	Description
+ *		Use BTF to write to seq_write a string representation of
+ *		*ptr*->ptr, using *ptr*->type_id as per bpf_snprintf_btf().
+ *		*flags* are identical to those used for bpf_snprintf_btf.
+ *	Return
+ *		0 on success or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -3745,6 +3789,8 @@ union bpf_attr {
 	FN(inode_storage_delete), \
 	FN(d_path), \
 	FN(copy_from_user), \
+	FN(snprintf_btf), \
+	FN(seq_printf_btf), \
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4853,4 +4899,34 @@ struct bpf_sk_lookup {
 	__u32 local_port; /* Host byte order */
 };
 
+/*
+ * struct btf_ptr is used for typed pointer representation; the
+ * type id is used to render the pointer data as the appropriate type
+ * via the bpf_snprintf_btf() helper described above. A flags field -
+ * potentially to specify additional details about the BTF pointer
+ * (rather than its mode of display) - is included for future use.
+ * Display flags - BTF_F_* - are passed to bpf_snprintf_btf separately.
+ */
+struct btf_ptr {
+	void *ptr;
+	__u32 type_id;
+	__u32 flags; /* BTF ptr flags; unused at present. */
+};
+
+/*
+ * Flags to control bpf_snprintf_btf() behaviour.
+ *	- BTF_F_COMPACT: no formatting around type information
+ *	- BTF_F_NONAME: no struct/union member names/types
+ *	- BTF_F_PTR_RAW: show raw (unobfuscated) pointer values;
+ *	  equivalent to %px.
+ *	- BTF_F_ZERO: show zero-valued struct/union members; they
+ *	  are not displayed by default
+ */
+enum {
+	BTF_F_COMPACT = (1ULL << 0),
+	BTF_F_NONAME = (1ULL << 1),
+	BTF_F_PTR_RAW = (1ULL << 2),
+	BTF_F_ZERO = (1ULL << 3),
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
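The helper documentation above says bpf_snprintf_btf() returns the number of bytes written, or the number that would have been written had the output not been truncated. That matches the contract of C's snprintf(), so truncation handling can be illustrated in plain userspace C. In the sketch below, `example_show` and `example_truncated` are hypothetical stand-ins, and the format string only imitates the style of the sample output in the cover letter; it is not the helper's exact formatting.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a snprintf()-like show helper: formats into str
 * and returns the length the full output needs, even if it did not fit.
 */
static int example_show(char *str, size_t str_size, int counter)
{
	return snprintf(str, str_size,
			"(atomic_t){ .counter = (int)%d, }", counter);
}

/* Return 1 if the output was truncated, 0 if it fit, -1 on error. */
static int example_truncated(int ret, size_t str_size)
{
	if (ret < 0)
		return -1;
	return (size_t)ret >= str_size;
}
```

A caller that sees truncation could retry with a larger buffer (for example map-backed storage, as the helper description suggests for larger types) before handing the string to perf event or ring buffer output.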

kernel/bpf/bpf_iter.c

Lines changed: 2 additions & 2 deletions
@@ -88,8 +88,8 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
 	mutex_lock(&seq->lock);
 
 	if (!seq->buf) {
-		seq->size = PAGE_SIZE;
-		seq->buf = kmalloc(seq->size, GFP_KERNEL);
+		seq->size = PAGE_SIZE << 3;
+		seq->buf = kvmalloc(seq->size, GFP_KERNEL);
 		if (!seq->buf) {
 			err = -ENOMEM;
 			goto done;
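The hunk above raises the initial iterator buffer from one page to `PAGE_SIZE << 3`, that is, eight pages, and switches the allocation to kvmalloc(), which can fall back to vmalloc when a physically contiguous allocation of that size is unavailable. The arithmetic can be sketched as follows, assuming 4 KiB pages; `EXAMPLE_PAGE_SIZE` and both function names are hypothetical, and the real PAGE_SIZE depends on architecture and configuration.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE varies by architecture. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* Initial seq buffer size before this commit: one page. */
static size_t old_seq_buf_size(void)
{
	return EXAMPLE_PAGE_SIZE;
}

/* Initial seq buffer size after this commit: PAGE_SIZE << 3, eight pages. */
static size_t new_seq_buf_size(void)
{
	return EXAMPLE_PAGE_SIZE << 3;
}
```

Under the 4 KiB assumption the buffer grows from 4096 to 32768 bytes, which is consistent with the changelog note that the new limits allow a task_struct (about 8K of raw data) to be displayed.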
