
Commit 296e90a

anakryiko authored and Kernel Patches Daemon committed
bpf: support precision propagation in the presence of subprogs
Add support for precision backtracking in the presence of subprogram frames in jump history.

This means supporting a few different kinds of subprogram invocation situations, all requiring slightly different handling in the precision backtracking logic:
  - static subprogram calls;
  - global subprogram calls;
  - callback-calling helpers/kfuncs.

For each of those we need to handle a few precision propagation cases:
  - what to do with precision of subprog returns (r0);
  - what to do with precision of input arguments;
  - for all of them, callee-saved registers in the caller function should be propagated ignoring the subprog/callback part of jump history.

N.B. Async callback-calling helpers (currently only bpf_timer_set_callback()) are transparent to all this because they set up a separate async callback environment, so the callback's history is not shared with the main program's history. As far as the changes in this commit go, such a helper is just a regular helper.

Let's look at all these situations in more detail. Let's start with a static subprogram being called, using an excerpt of a simple main program and its static subprog, indenting the subprog's frame slightly to make everything clear.

frame 0                  frame 1              precision set
=======                  =======              =============

 9: r6 = 456;
10: r1 = 123;                                 r6
11: call pc+10;                               r1, r6
                         22: r0 = r1;         r1
                         23: exit             r0
12: r1 = <map_pointer>                        r0, r6
13: r1 += r0;                                 r0, r6
14: r1 += r6;                                 r6
15: exit

As can be seen above, the main function passes 123 as the single argument to an identity (`return x;`) subprog. The returned value is used to adjust a map pointer offset, which forces r0 to be marked as precise. Then instruction #14 does the same for callee-saved r6, which will have to be backtracked all the way to instruction #9. For brevity, precision sets for instructions #13 and #14 are combined in the diagram above.

First, for subprog calls, r0 returned from the subprog (in frame 0) has to go into the subprog's frame 1, and should be cleared from frame 0. So we go back into the subprog's frame knowing we need to mark r0 precise. We then see that insn #22 sets r0 from r1, so now we care about marking r1 precise. When we pop up from the subprog's frame back into the caller at insn #11 we keep r1, as it's an argument-passing register, so we eventually find `10: r1 = 123;` and satisfy the precision propagation chain for insn #13.

This example demonstrates two sets of rules:
  - r0 returned after a subprog call has to be moved into the subprog's r0 set;
  - *static* subprog arguments (r1-r5) are moved back to the caller's precision set.

Let's look at what happens with callee-saved precision propagation. Insn #14 marks r6 as precise. When we get into the subprog's frame, we keep r6 in frame 0's precision set *only*. The subprog itself has its own independent set of r6-r10 registers and is not affected. When we eventually make our way out of the subprog's frame, we keep r6 in the precision set until we reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is perhaps the simplest aspect: it always stays in its original frame.

That's pretty much all we have to do to support precision propagation across *static subprog* invocation.
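To make the walkthrough above more tangible, here is a minimal BPF-C sketch that produces roughly this shape of code. It is not part of this patch, and all names (identity, values, prog) are invented for illustration; depending on compiler optimizations the exact instruction sequence may differ, but the idea is the same: a static subprog's return value and a callee-saved local both end up scaling a map value access, so both must be backtracked for precision across the subprog call.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct val {
        __u64 slots[512];
};

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, struct val);
} values SEC(".maps");

/* static subprog; corresponds to insns #22-#23 in the walkthrough */
static __noinline int identity(int x)
{
        return x;
}

SEC("tc")
int prog(struct __sk_buff *skb)
{
        int saved = 456;         /* kept in a callee-saved register, e.g. r6 */
        int ret = identity(123); /* static subprog call, like '11: call pc+10;' */
        struct val *v;
        __u32 key = 0;

        v = bpf_map_lookup_elem(&values, &key);
        if (!v)
                return 0;
        /* both 'ret' and 'saved' index into map memory, so the verifier has
         * to know their exact values, marking r0 and r6 precise and
         * backtracking them through the subprog's part of jump history
         */
        if (ret < 0 || ret >= 512 || saved < 0 || saved >= 512)
                return 0;
        return (int)(v->slots[ret] + v->slots[saved]);
}

char _license[] SEC("license") = "GPL";

If identity() were made non-static (i.e., a global subprog), the same program would instead exercise the global-subprog rules discussed next.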
Let's look at what happens when we have a global subprog invocation.

frame 0                  frame 1              precision set
=======                  =======              =============

 9: r6 = 456;
10: r1 = 123;                                 r6
11: call pc+10; # global subprog              r6
12: r1 = <map_pointer>                        r0, r6
13: r1 += r0;                                 r0, r6
14: r1 += r6;                                 r6
15: exit

Starting from insn #13, r0 has to be precise. We backtrack all the way to insn #11 (call pc+10) and see that the subprog is global, so it was already validated in isolation. As opposed to a static subprog, a global subprog always returns an unknown scalar in r0, so that satisfies precision propagation and we drop r0 from the precision set. We are done with insn #13.

Now for insn #14. r6 is in the precision set, and we backtrack to `call pc+10;`. Here we need to recognize that this is effectively both an exit from and an entry to the global subprog, which means we stay in the caller's frame. So we carry on with r6 still in the precision set, until we satisfy it at insn #9. The only hard part with global subprogs is knowing that the call target is a global func.

Lastly, callback-calling helpers and kfuncs do simulate subprog calls, so jump history will have subprog instructions in between the caller program's instructions, but the rules of propagating r0 and r1-r5 differ, because we don't actually call the callback directly. We actually call a helper/kfunc, which at runtime will call the subprog, so the only difference from normal helper/kfunc handling is that we need to make sure to skip the callback-simulating part of jump history.

Let's look at an example to make this clearer.

frame 0                  frame 1              precision set
=======                  =======              =============

 8: r6 = 456;
 9: r1 = 123;                                 r6
10: r2 = &callback;                           r6
11: call bpf_loop;                            r6
                         22: r0 = r1;
                         23: exit
12: r1 = <map_pointer>                        r0, r6
13: r1 += r0;                                 r0, r6
14: r1 += r6;                                 r6
15: exit

Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit` we see that this isn't actually a static subprog call (it's a `call bpf_loop;` helper call instead), so we clear r0 from the precision set. For the callee-saved register there is no difference: it stays in frame 0's precision set, we go through insns #22 and #23, ignoring them, until we get back to caller frame 0, eventually satisfying the precision backtracking logic at insn #8 (`r6 = 456;`). Assuming the callback needed to set r0 as precise at insn #23, we'd backtrack to insn #22, switching from r0 to r1, and then at the point when we pop back to frame 0 at insn #11, we'd clear r1-r5 from the precision set, as we don't really do a subprog call directly, so there is no input argument precision propagation.
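For the callback case, here is a similarly hypothetical BPF-C sketch (again not part of this patch; count_cb and loop_prog are invented names). The callback is only ever invoked by the bpf_loop helper at runtime, so its instructions show up in the caller's jump history during verification, yet its r1-r5 are not arguments passed directly by the caller and must not have their precision propagated into the caller's frame:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* callback invoked only through bpf_loop, never called directly;
 * corresponds to insns #22-#23 in the diagram above
 */
static int count_cb(__u32 index, void *ctx)
{
        int *sum = ctx;

        *sum += 1;
        return 0;
}

SEC("tc")
int loop_prog(struct __sk_buff *skb)
{
        int saved = 456; /* callee-saved across the helper call, e.g. r6 */
        int sum = 0;

        bpf_loop(10, count_cb, &sum, 0); /* '11: call bpf_loop;' */
        /* if 'saved' later needs to be precise, backtracking walks straight
         * through the callback's instructions in jump history without
         * propagating r1-r5 into this frame
         */
        return saved + sum;
}

char _license[] SEC("license") = "GPL";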
That's pretty much it. With these changes, it seems like the only still unsupported situation for precision backpropagation is the case when a program accesses the stack through registers other than r10. This is still left as an unsupported (though rare) case for now.

As for results: for selftests, there are a few positive changes for bigger programs, with cls_redirect in its dynptr variant benefiting the most:

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
File                                      Program        Insns (A)  Insns (B)  Insns      (DIFF)
----------------------------------------  -------------  ---------  ---------  -----------------
pyperf600_bpf_loop.bpf.linked1.o          on_event            2060       2002      -58 (-2.82%)
test_cls_redirect_dynptr.bpf.linked1.o    cls_redirect       15660       2914  -12746 (-81.39%)
test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect       61620      59088   -2532 (-4.11%)
xdp_synproxy_kern.bpf.linked1.o           syncookie_tc      109980      86278  -23702 (-21.55%)
xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp      97716      85147  -12569 (-12.86%)

Cilium programs don't really regress. They don't use subprogs and are mostly unaffected, but some other fixes and improvements could have changed something. This doesn't appear to be the case:

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
File           Program                         Insns (A)  Insns (B)  Insns (DIFF)
-------------  ------------------------------  ---------  ---------  ------------
bpf_host.o     tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
bpf_xdp.o      tail_handle_nat_fwd_ipv6            12475      12504  +29 (+0.23%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6       6363       6371   +8 (+0.13%)

Looking at (somewhat anonymized) Meta production programs, we see mostly insignificant variation in the number of instructions, with one program (syar_bind6_protect6) benefiting the most at -17%.

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
Program                   Insns (A)  Insns (B)  Insns      (DIFF)
------------------------  ---------  ---------  -----------------
on_request_context_event        597        585     -12 (-2.01%)
read_async_py_stack           43789      43657    -132 (-0.30%)
read_sync_py_stack            35041      37599   +2558 (+7.30%)
rrm_usdt                        946        940      -6 (-0.63%)
sysarmor_inet6_bind           28863      28249    -614 (-2.13%)
sysarmor_inet_bind            28845      28240    -605 (-2.10%)
syar_bind4_protect4          154145     147640   -6505 (-4.22%)
syar_bind6_protect6          165242     137088  -28154 (-17.04%)
syar_task_exit_setgid         21289      19720   -1569 (-7.37%)
syar_task_exit_setuid         21290      19721   -1569 (-7.37%)
do_uprobe                     19967      19413    -554 (-2.77%)
tw_twfw_ingress              215877     204833  -11044 (-5.12%)
tw_twfw_tc_in                215877     204833  -11044 (-5.12%)

But checking duration (wall clock) differences, that is, the actual time taken by the verifier to validate programs, we see sometimes dramatic improvements, all the way up to about 16x:

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
Program                                   Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
----------------------------------------  -----------------  -----------------  --------------------
tw_twfw_ingress                                     4488374             272836    -4215538 (-93.92%)
tw_twfw_tc_in                                       4339111             268175    -4070936 (-93.82%)
tw_twfw_egress                                      3521816             270751    -3251065 (-92.31%)
tw_twfw_tc_eg                                       3472878             284294    -3188584 (-91.81%)
balancer_ingress                                     343119             291391      -51728 (-15.08%)
syar_bind6_protect6                                   78992              64782      -14210 (-17.99%)
ttls_tc_ingress                                       11739               8176       -3563 (-30.35%)
kprobe__security_inode_link                           13864              11341       -2523 (-18.20%)
read_sync_py_stack                                    21927              19442       -2485 (-11.33%)
read_async_py_stack                                   30444              28136       -2308 (-7.58%)
syar_task_exit_setuid                                 10256               8440       -1816 (-17.71%)

Signed-off-by: Andrii Nakryiko <[email protected]>
1 parent a602e6c commit 296e90a

1 file changed: +129 −20 lines changed

kernel/bpf/verifier.c

Lines changed: 129 additions & 20 deletions
@@ -240,6 +240,12 @@ static void bpf_map_key_store(struct bpf_insn_aux_data *aux, u64 state)
                              (poisoned ? BPF_MAP_KEY_POISON : 0ULL);
 }
 
+static bool bpf_helper_call(const struct bpf_insn *insn)
+{
+        return insn->code == (BPF_JMP | BPF_CALL) &&
+               insn->src_reg == 0;
+}
+
 static bool bpf_pseudo_call(const struct bpf_insn *insn)
 {
         return insn->code == (BPF_JMP | BPF_CALL) &&
@@ -469,6 +475,13 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
         return rec;
 }
 
+static bool subprog_is_global(const struct bpf_verifier_env *env, int subprog)
+{
+        struct bpf_func_info_aux *aux = env->prog->aux->func_info_aux;
+
+        return aux && aux[subprog].linkage == BTF_FUNC_GLOBAL;
+}
+
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
 {
         return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK);
@@ -516,6 +529,8 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
         return func_id == BPF_FUNC_dynptr_data;
 }
 
+static bool is_callback_calling_kfunc(u32 btf_id);
+
 static bool is_callback_calling_function(enum bpf_func_id func_id)
 {
         return func_id == BPF_FUNC_for_each_map_elem ||
@@ -525,6 +540,11 @@ static bool is_callback_calling_function(enum bpf_func_id func_id)
                func_id == BPF_FUNC_user_ringbuf_drain;
 }
 
+static bool is_async_callback_calling_function(enum bpf_func_id func_id)
+{
+        return func_id == BPF_FUNC_timer_set_callback;
+}
+
 static bool is_storage_get_function(enum bpf_func_id func_id)
 {
         return func_id == BPF_FUNC_sk_storage_get ||
@@ -3359,8 +3379,13 @@ static void fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask)
 /* For given verifier state backtrack_insn() is called from the last insn to
  * the first insn. Its purpose is to compute a bitmask of registers and
  * stack slots that needs precision in the parent verifier state.
+ *
+ * @idx is an index of the instruction we are currently processing;
+ * @subseq_idx is an index of the subsequent instruction that:
+ *   - *would be* executed next, if jump history is viewed in forward order;
+ *   - *was* processed previously during backtracking.
  */
-static int backtrack_insn(struct bpf_verifier_env *env, int idx,
+static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
                           struct backtrack_state *bt)
 {
         const struct bpf_insn_cbs cbs = {
@@ -3374,7 +3399,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
         u8 mode = BPF_MODE(insn->code);
         u32 dreg = insn->dst_reg;
         u32 sreg = insn->src_reg;
-        u32 spi;
+        u32 spi, i;
 
         if (insn->code == 0)
                 return 0;
@@ -3466,14 +3491,72 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
                 if (class == BPF_STX)
                         bt_set_reg(bt, sreg);
         } else if (class == BPF_JMP || class == BPF_JMP32) {
-                if (opcode == BPF_CALL) {
-                        if (insn->src_reg == BPF_PSEUDO_CALL)
-                                return -ENOTSUPP;
-                        /* BPF helpers that invoke callback subprogs are
-                         * equivalent to BPF_PSEUDO_CALL above
+                if (bpf_pseudo_call(insn)) {
+                        int subprog_insn_idx, subprog;
+                        bool is_global;
+
+                        subprog_insn_idx = idx + insn->imm + 1;
+                        subprog = find_subprog(env, subprog_insn_idx);
+                        if (subprog < 0)
+                                return -EFAULT;
+                        is_global = subprog_is_global(env, subprog);
+
+                        if (is_global) {
+                                /* r1-r5 are invalidated after subprog call,
+                                 * so for global func call it shouldn't be set
+                                 * anymore
+                                 */
+                                if (bt_reg_mask(bt) & BPF_REGMASK_ARGS)
+                                        return -EFAULT;
+                                /* global subprog always sets R0 */
+                                bt_clear_reg(bt, BPF_REG_0);
+                                return 0;
+                        } else {
+                                /* static subprog call instruction, which
+                                 * means that we are exiting current subprog,
+                                 * so only r1-r5 could be still requested as
+                                 * precise, r0 and r6-r10 or any stack slot in
+                                 * the current frame should be zero by now
+                                 */
+                                if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS)
+                                        return -EFAULT;
+                                /* we don't track register spills perfectly,
+                                 * so fallback to force-precise instead of failing */
+                                if (bt_stack_mask(bt) != 0)
+                                        return -ENOTSUPP;
+                                /* propagate r1-r5 to the caller */
+                                for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
+                                        if (bt_is_reg_set(bt, i)) {
+                                                bt_clear_reg(bt, i);
+                                                bt_set_frame_reg(bt, bt->frame - 1, i);
+                                        }
+                                }
+                                if (bt_subprog_exit(bt))
+                                        return -EFAULT;
+                                return 0;
+                        }
+                } else if ((bpf_helper_call(insn) &&
+                            is_callback_calling_function(insn->imm) &&
+                            !is_async_callback_calling_function(insn->imm)) ||
+                           (bpf_pseudo_kfunc_call(insn) && is_callback_calling_kfunc(insn->imm))) {
+                        /* callback-calling helper or kfunc call, which means
+                         * we are exiting from subprog, but unlike the subprog
+                         * call handling above, we shouldn't propagate
+                         * precision of r1-r5 (if any requested), as they are
+                         * not actually arguments passed directly to callback
+                         * subprogs
                          */
-                        if (insn->src_reg == 0 && is_callback_calling_function(insn->imm))
+                        if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS)
+                                return -EFAULT;
+                        if (bt_stack_mask(bt) != 0)
                                 return -ENOTSUPP;
+                        /* clear r1-r5 in callback subprog's mask */
+                        for (i = BPF_REG_1; i <= BPF_REG_5; i++)
+                                bt_clear_reg(bt, i);
+                        if (bt_subprog_exit(bt))
+                                return -EFAULT;
+                        return 0;
+                } else if (opcode == BPF_CALL) {
                         /* kfunc with imm==0 is invalid and fixup_kfunc_call will
                          * catch this error later. Make backtracking conservative
                          * with ENOTSUPP.
@@ -3491,7 +3574,39 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
                         return -EFAULT;
                 }
         } else if (opcode == BPF_EXIT) {
-                return -ENOTSUPP;
+                bool r0_precise;
+
+                if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) {
+                        /* if backtracing was looking for registers R1-R5
+                         * they should have been found already.
+                         */
+                        verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
+                        WARN_ONCE(1, "verifier backtracking bug");
+                        return -EFAULT;
+                }
+
+                /* BPF_EXIT in subprog or callback always jump right
+                 * after the call instruction, so by check whether the
+                 * instruction at subseq_idx-1 is subprog call or not we
+                 * can distinguish actual exit from *subprog* from
+                 * exit from *callback*. In the former case, we need
+                 * to propagate r0 precision, if necessary. In the
+                 * former we never do that.
+                 */
+                r0_precise = subseq_idx - 1 >= 0 &&
+                             bpf_pseudo_call(&env->prog->insnsi[subseq_idx - 1]) &&
+                             bt_is_reg_set(bt, BPF_REG_0);
+
+                bt_clear_reg(bt, BPF_REG_0);
+                if (bt_subprog_enter(bt))
+                        return -EFAULT;
+
+                if (r0_precise)
+                        bt_set_reg(bt, BPF_REG_0);
+                /* r6-r9 and stack slots will stay set in caller frame
+                 * bitmasks until we return back from callee(s)
+                 */
+                return 0;
         } else if (BPF_SRC(insn->code) == BPF_X) {
                 if (!bt_is_reg_set(bt, dreg) && !bt_is_reg_set(bt, sreg))
                         return 0;
@@ -3745,7 +3860,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int frame, int r
         struct bpf_func_state *func;
         struct bpf_reg_state *reg;
         bool skip_first = true;
-        int i, fr, err;
+        int i, prev_i, fr, err;
 
         if (!env->bpf_capable)
                 return 0;
@@ -3815,12 +3930,12 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int frame, int r
                         return -EFAULT;
                 }
 
-        for (i = last_idx;;) {
+        for (i = last_idx, prev_i = -1;;) {
                 if (skip_first) {
                         err = 0;
                         skip_first = false;
                 } else {
-                        err = backtrack_insn(env, i, bt);
+                        err = backtrack_insn(env, i, prev_i, bt);
                 }
                 if (err == -ENOTSUPP) {
                         mark_all_scalars_precise(env, env->cur_state);
@@ -3837,6 +3952,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int frame, int r
                         return 0;
                 if (i == first_idx)
                         break;
+                prev_i = i;
                 i = get_prev_insn_idx(st, i, &history);
                 if (i >= env->prog->len) {
                         /* This can happen if backtracking reached insn 0
@@ -8418,17 +8534,13 @@ static int set_callee_state(struct bpf_verifier_env *env,
                             struct bpf_func_state *caller,
                             struct bpf_func_state *callee, int insn_idx);
 
-static bool is_callback_calling_kfunc(u32 btf_id);
-
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                              int *insn_idx, int subprog,
                              set_callee_state_fn set_callee_state_cb)
 {
         struct bpf_verifier_state *state = env->cur_state;
-        struct bpf_func_info_aux *func_info_aux;
         struct bpf_func_state *caller, *callee;
         int err;
-        bool is_global = false;
 
         if (state->curframe + 1 >= MAX_CALL_FRAMES) {
                 verbose(env, "the call stack of %d frames is too deep\n",
@@ -8443,13 +8555,10 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
                 return -EFAULT;
         }
 
-        func_info_aux = env->prog->aux->func_info_aux;
-        if (func_info_aux)
-                is_global = func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
         err = btf_check_subprog_call(env, subprog, caller->regs);
         if (err == -EFAULT)
                 return err;
-        if (is_global) {
+        if (subprog_is_global(env, subprog)) {
                 if (err) {
                         verbose(env, "Caller passes invalid args into func#%d\n",
                                 subprog);
