Skip to content

tools/libbpf: avoid counting local symbols in ABI check #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

kernel-patches-bot
Copy link

Pull request for series with
subject: tools/libbpf: avoid counting local symbols in ABI check
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=199744

@kernel-patches-bot
Copy link
Author

tsipa and others added 2 commits September 8, 2020 15:05
…ources

with GCC 8.4.0 and binutils 2.34: (long paths shortened)

  Warning: Num of global symbols in sharedobjs/libbpf-in.o (234) does NOT
  match with num of versioned symbols in libbpf.so (236). Please make sure
  all LIBBPF_API symbols are versioned in libbpf.map.
  --- libbpf_global_syms.tmp    2020-09-02 07:30:58.920084380 +0000
  +++ libbpf_versioned_syms.tmp 2020-09-02 07:30:58.924084388 +0000
  @@ -1,3 +1,5 @@
  +_fini
  +_init
   bpf_btf_get_fd_by_id
   bpf_btf_get_next_id
   bpf_create_map
  make[4]: *** [Makefile:210: check_abi] Error 1

Investigation shows _fini and _init are actually local symbols counted
amongst global ones:

  $ readelf --dyn-syms --wide libbpf.so|head -10

  Symbol table '.dynsym' contains 343 entries:
     Num:    Value  Size Type    Bind   Vis      Ndx Name
       0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 00004098     0 SECTION LOCAL  DEFAULT   11
       2: 00004098     8 FUNC    LOCAL  DEFAULT   11 _init@@LIBBPF_0.0.1
       3: 00023040     8 FUNC    LOCAL  DEFAULT   14 _fini@@LIBBPF_0.0.1
       4: 00000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.4
       5: 00000000     0 OBJECT  GLOBAL DEFAULT  ABS LIBBPF_0.0.1
       6: 0000ffa4     8 FUNC    GLOBAL DEFAULT   12 bpf_object__find_map_by_offset@@LIBBPF_0.0.1

A previous commit filtered global symbols in sharedobjs/libbpf-in.o. Do the
same with the libbpf.so DSO for consistent comparison.

Fixes: 306b267 ("libbpf: Verify versioned symbols")

Signed-off-by: Tony Ambardar <[email protected]>
---
 tools/lib/bpf/Makefile | 2 ++
 1 file changed, 2 insertions(+)
@kernel-patches-bot
Copy link
Author

@kernel-patches-bot
Copy link
Author

At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=199744 irrelevant now. Closing PR.

@kernel-patches-bot kernel-patches-bot deleted the series/199744 branch September 15, 2020 17:49
kernel-patches-bot pushed a commit that referenced this pull request Sep 16, 2020
…s metrics" test

Linux 5.9 introduced perf test case "Parse and process metrics" and
on s390 this test case always dumps core:

  [root@t35lp67 perf]# ./perf test -vvvv -F 67
  67: Parse and process metrics                             :
  --- start ---
  metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
  parsing metric: inst_retired.any / cpu_clk_unhalted.thread
  Segmentation fault (core dumped)
  [root@t35lp67 perf]#

I debugged this core dump and gdb shows this call chain:

  (gdb) where
   #0  0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6
   #1  0x000003ffabc293de in strcasestr () from /lib64/libc.so.6
   #2  0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any",
            n=<optimized out>)
       at util/metricgroup.c:368
   #3  find_metric (map=<optimized out>, map=<optimized out>,
           metric=0x1e6ea20 "inst_retired.any")
      at util/metricgroup.c:765
   #4  __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0,
           metric_no_group=<optimized out>, m=<optimized out>)
      at util/metricgroup.c:844
   #5  resolve_metric (ids=0x0, map=0x0, metric_list=0x0,
          metric_no_group=<optimized out>)
      at util/metricgroup.c:881
   #6  metricgroup__add_metric (metric=<optimized out>,
        metric_no_group=metric_no_group@entry=false, events=<optimized out>,
        events@entry=0x3ffd84fb878, metric_list=0x0,
        metric_list@entry=0x3ffd84fb868, map=0x0)
      at util/metricgroup.c:943
   #7  0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>,
        metric_list=0x3ffd84fb868, events=0x3ffd84fb878,
        metric_no_group=<optimized out>, list=<optimized out>)
      at util/metricgroup.c:988
   #8  parse_groups (perf_evlist=perf_evlist@entry=0x1e70260,
          str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>,
          metric_no_merge=<optimized out>,
          fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>,
          metric_events=0x3ffd84fba58, map=0x1)
      at util/metricgroup.c:1040
   #9  0x0000000001103eb2 in metricgroup__parse_groups_test(
  	evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>,
  	str=str@entry=0x12f34b2 "IPC",
  	metric_no_group=metric_no_group@entry=false,
  	metric_no_merge=metric_no_merge@entry=false,
  	metric_events=0x3ffd84fba58)
      at util/metricgroup.c:1082
   #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0,
          ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC",
  	vals=0x3ffd84fbad8, name=0x12f34b2 "IPC")
      at tests/parse-metric.c:159
   #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8,
  	name=0x12f34b2 "IPC")
      at tests/parse-metric.c:189
   #12 test_ipc () at tests/parse-metric.c:208
.....
..... omitted many more lines

This test case was added with
commit 218ca91 ("perf tests: Add parse metric test for frontend metric").

When I compile with make DEBUG=y it works fine and I do not get a core dump.

It turned out that the above listed function call chain worked on a struct
pmu_event array which requires a trailing element with zeroes which was
missing. The marco map_for_each_event() loops over that array tests for members
metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes
the issue.

Output after:

  [root@t35lp46 perf]# ./perf test 67
  67: Parse and process metrics                             : Ok
  [root@t35lp46 perf]#

Committer notes:

As Ian remarks, this is not s390 specific:

<quote Ian>
  This also shows up with address sanitizer on all architectures
  (perhaps change the patch title) and perhaps add a "Fixes: <commit>"
  tag.

  =================================================================
  ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address
  0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp
  0x7ffd24327c58
  READ of size 8 at 0x55c93b4d59e8 thread T0
      #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2
      #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9
      #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9
      #3 0x55c93a1528db in metricgroup__add_metric
  tools/perf/util/metricgroup.c:943:9
      #4 0x55c93a151996 in metricgroup__add_metric_list
  tools/perf/util/metricgroup.c:988:9
      #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8
      #6 0x55c93a1513e1 in metricgroup__parse_groups_test
  tools/perf/util/metricgroup.c:1082:9
      #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8
      #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9
      #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2
      #10 0x55c93a00f1e8 in test__parse_metric
  tools/perf/tests/parse-metric.c:345:2
      #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9
      #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9
      #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4
      #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9
      #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11
      #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8
      #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2
      #18 0x55c939e45f7e in main tools/perf/perf.c:539:3

  0x55c93b4d59e8 is located 0 bytes to the right of global variable
  'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25'
  (0x55c93b4d54a0) of size 1352
  SUMMARY: AddressSanitizer: global-buffer-overflow
  tools/perf/util/metricgroup.c:764:2 in find_metric
  Shadow bytes around the buggy address:
    0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9
    0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
    0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
    0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
    0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:	   fa
    Freed heap region:	   fd
    Stack left redzone:	   f1
    Stack mid redzone:	   f2
    Stack right redzone:     f3
    Stack after return:	   f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:	   f6
    Poisoned by user:        f7
    Container overflow:	   fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
    Shadow gap:              cc
</quote>

I'm also adding the missing "Fixes" tag and setting just .name to NULL,
as doing it that way is more compact (the compiler will zero out
everything else) and the table iterators look for .name being NULL as
the sentinel marking the end of the table.

Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric")
Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Sumanth Korikkar <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe,
unregister and probe again a driver.

When ghes_edac is attempted to be loaded on a system which is not on
the safe platforms list, ghes_edac_register() would return early. The
unregister counterpart ghes_edac_unregister() would still attempt to
unregister and exit early at the refcount test, leading to the refcount
underflow below.

In order to not do *anything* on the unregister path too, reuse the
force_load parameter and check it on that path too, before fumbling with
the refcount.

  ghes_edac: ghes_edac_register: entry
  ghes_edac: ghes_edac_register: return -ENODEV
  ------------[ cut here ]------------
  refcount_t: underflow; use-after-free.
  WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100
  Modules linked in:
  CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:refcount_warn_saturate+0xb9/0x100
  Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c
  RSP: 0018:ffffc90000037d58 EFLAGS: 00010292
  RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff
  RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0
  Call Trace:
   ghes_edac_unregister
   ghes_remove
   platform_drv_remove
   really_probe
   driver_probe_device
   device_driver_attach
   __driver_attach
   ? device_driver_attach
   ? device_driver_attach
   bus_for_each_dev
   bus_add_driver
   driver_register
   ? bert_init
   ghes_init
   do_one_initcall
   ? rcu_read_lock_sched_held
   kernel_init_freeable
   ? rest_init
   kernel_init
   ret_from_fork
   ...
  ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824

Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The aliases were never released causing the following leaks:

  Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
    #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
    #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
    #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
    #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
    #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
    #5 0x56332c76a09b in run_test tests/builtin-test.c:410
    #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
    #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
    #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
    #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 956a783 ("perf test: Test pmu-events aliases")
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The evsel->unit borrows a pointer of pmu event or alias instead of
owns a string.  But tool event (duration_time) passes a result of
strdup() caused a leak.

It was found by ASAN during metric test:

  Direct leak of 210 byte(s) in 70 object(s) allocated from:
    #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
    #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
    #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
    #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
    #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
    #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
    #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
    #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
    #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
    #10 0x559fbbc0109b in run_test tests/builtin-test.c:410
    #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
    #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
    #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
    #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The test_generic_metric() missed to release entries in the pctx.  Asan
reported following leak (and more):

  Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
    #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
    #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
    #4 0x55f7e7341667 in expr__add_ref util/expr.c:120
    #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
    #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
    #7 0x55f7e712390b in compute_single tests/parse-metric.c:128
    #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
    #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
    #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
    #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
    #12 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The metricgroup__add_metric() can find multiple match for a metric group
and it's possible to fail.  Also it can fail in the middle like in
resolve_metric() even for single metric.

In those cases, the intermediate list and ids will be leaked like:

  Direct leak of 3 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
    #2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
    #3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
    #4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
    #5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
    #6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
    #7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
    #8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
    #9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
    #10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
    #11 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
Ido Schimmel says:

====================
mlxsw: Refactor headroom management

Petr says:

On Spectrum, port buffers, also called port headroom, is where packets are
stored while they are parsed and the forwarding decision is being made. For
lossless traffic flows, in case shared buffer admission is not allowed,
headroom is also where to put the extra traffic received before the sent
PAUSE takes effect. Another aspect of the port headroom is the so called
internal buffer, which is used for egress mirroring.

Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer
for configuration, and dcbnl_getbuffer for inspection. In order to make it
possible to implement these interfaces, it is first necessary to clean up
headroom handling, which is currently strewn in several places in the
driver.

The end goal is an architecture whereby it is possible to take a copy of
the current configuration, adjust parameters, and then hand the proposed
configuration over to the system to implement it. When everything works,
the proposed configuration is accepted and saved. First, this centralizes
the reconfiguration handling to one function, which takes care of
coordinating buffer size changes and priority map changes to avoid
introducing drops. Second, the fact that the configuration is all in one
place makes it easy to keep a backup and handle error path rollbacks, which
were previously hard to understand.

Patch #1 introduces struct mlxsw_sp_hdroom, which will keep port headroom
configuration.

Patch #2 unifies handling of delay provision between PFC and PAUSE. From
now on, delay is to be measured in bytes of extra space, and will not
include MTU. PFC handler sets the delay directly from the parameter it gets
through the DCB interface. For PAUSE, MLXSW_SP_PAUSE_DELAY is converted to
have the same meaning.

In patches #3-#5, MTU, lossiness and priorities are gradually moved over to
struct mlxsw_sp_hdroom.

In patches #6-#11, handling of buffer resizing and priority maps is moved
from spectrum.c and spectrum_dcb.c to spectrum_buffers.c. The API is
gradually adapted so that struct mlxsw_sp_hdroom becomes the main interface
through which the various clients express how the headroom should be
configured.

Patch #12 is a small cleanup that the previous transformation made
possible.

In patch #13, the port init code becomes a boring client of the headroom
code, instead of rolling its own thing.

Patches #14 and #15 move handling of internal mirroring buffer to the new
headroom code as well. Previously, this code was in the SPAN module. This
patchset converts the SPAN module to another boring client of the headroom
code.
====================

Signed-off-by: David S. Miller <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Oct 1, 2020
We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg
doesn't have any reclaimable memory.

It can be easily reproduced as below:

  watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
  CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
  Call Trace:
    shrink_lruvec+0x49f/0x640
    shrink_node+0x2a6/0x6f0
    do_try_to_free_pages+0xe9/0x3e0
    try_to_free_mem_cgroup_pages+0xef/0x1f0
    try_charge+0x2c1/0x750
    mem_cgroup_charge+0xd7/0x240
    __add_to_page_cache_locked+0x2fd/0x370
    add_to_page_cache_lru+0x4a/0xc0
    pagecache_get_page+0x10b/0x2f0
    filemap_fault+0x661/0xad0
    ext4_filemap_fault+0x2c/0x40
    __do_fault+0x4d/0xf9
    handle_mm_fault+0x1080/0x1790

It only happens on our 1-vcpu instances, because there's no chance for
oom reaper to run to reclaim the to-be-killed process.

Add a cond_resched() at the upper shrink_node_memcgs() to solve this
issue, this will mean that we will get a scheduling point for each memcg
in the reclaimed hierarchy without any dependency on the reclaimable
memory in that memcg thus making it more predictable.

Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Xunlei Pang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Chris Down <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Nov 16, 2020
…ertion-and-removal'

Ido Schimmel says:

====================
mlxsw: spectrum: Prepare for XM implementation - prefix insertion and removal

Jiri says:

This is a preparation patchset for follow-up support of boards with
extended mezzanine (XM), which is going to allow extended (scale-wise)
router offload.

XM requires a separate PRM register named XMDR to be used instead of
RALUE to insert/update/remove FIB entries. Therefore, this patchset
extends the previously introduces low-level ops to be able to have
XM-specific FIB entry config implementation.

Currently the existing original RALUE implementation is moved to "basic"
low-level ops.

Unlike legacy router, insertion/update/removal of FIB entries into XM
could be done in bulks up to 4 items in a single PRM register write.
That is why this patchset implements "an op context", that allows the
future XM ops implementation to squash multiple FIB events to single
register write. For that, the way in which the FIB events are processed
by the work queue has to be changed.

The conversion from 1:1 FIB event - work callback call to event queue is
implemented in patch #3.

Patch #4 introduces "an op context" that will allow in future to squash
multiple FIB events into one XMDR register write. Patch #12 converts it
from stack to be allocated per instance.

Existing RALUE manipulations are pushed to ops in patch #10.

Patch #13 is introducing a possibility for low-level implementation to
have per FIB entry private memory.

The rest of the patches are either cosmetics or smaller preparations.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Nov 20, 2020
This fix is for a failure that occurred in the DWARF unwind perf test.

Stack unwinders may probe memory when looking for frames.

Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.

This can lead to false memory sanitizer failures for the use of an
uninitialized value.

Avoid this problem by removing the poison on the copied stack.

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
    #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445

SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: [email protected]
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Nov 30, 2020
crq->msgs could be NULL if the previous reset did not complete after
freeing crq->msgs. Check for NULL before dereferencing them.

Snippet of call trace:
...
ibmvnic 30000003 env3 (unregistering): Releasing sub-CRQ
ibmvnic 30000003 env3 (unregistering): Releasing CRQ
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0000000000c1a30
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ibmvnic(E-) rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag tun raw_diag inet_diag unix_diag bridge af_packet_diag netlink_diag stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibmvnic]
CPU: 20 PID: 8426 Comm: kworker/20:0 Tainted: G            E     5.10.0-rc1+ #12
Workqueue: events __ibmvnic_reset [ibmvnic]
NIP:  c0000000000c1a30 LR: c008000001b00c18 CTR: 0000000000000400
REGS: c00000000d05b7a0 TRAP: 0380   Tainted: G            E      (5.10.0-rc1+)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44002480  XER: 20040000
CFAR: c0000000000c19ec IRQMASK: 0
GPR00: 0000000000000400 c00000000d05ba30 c008000001b17c00 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000001e2
GPR08: 000000000001f400 ffffffffffffd950 0000000000000000 c008000001b0b280
GPR12: c0000000000c19c8 c00000001ec72e00 c00000000019a778 c00000002647b440
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000006 0000000000000001 0000000000000003 0000000000000002
GPR24: 0000000000001000 c008000001b0d570 0000000000000005 c00000007ab5d550
GPR28: c00000007ab5c000 c000000032fcf848 c00000007ab5cc00 c000000032fcf800
NIP [c0000000000c1a30] memset+0x68/0x104
LR [c008000001b00c18] ibmvnic_reset_crq+0x70/0x110 [ibmvnic]
Call Trace:
[c00000000d05ba30] [0000000000000800] 0x800 (unreliable)
[c00000000d05bab0] [c008000001b0a930] do_reset.isra.40+0x224/0x634 [ibmvnic]
[c00000000d05bb80] [c008000001b08574] __ibmvnic_reset+0x17c/0x3c0 [ibmvnic]
[c00000000d05bc50] [c00000000018d9ac] process_one_work+0x2cc/0x800
[c00000000d05bd20] [c00000000018df58] worker_thread+0x78/0x520
[c00000000d05bdb0] [c00000000019a934] kthread+0x1c4/0x1d0
[c00000000d05be20] [c00000000000d5d0] ret_from_kernel_thread+0x5c/0x6c

Fixes: 032c5e8 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Dec 14, 2020
Prior to sanitizing the GGTT, the only operations allowed in
intel_display_init_nogem() are those to reserve the preallocated (and
active) regions in the GGTT leftover from the BIOS. Trying to allocate a
GGTT vma (such as intel_pin_and_fence_fb_obj during the initial modeset)
may then conflict with other preallocated regions that have not yet been
protected.

Move the initial modesetting from the end of init_nogem to the beginning
of init so that any vma pinning (either framebuffers or DSB, for example),
is after the GGTT is ready to handle it.

This will prevent the DSB object from being destroyed too early:

[   53.449241] BUG: KASAN: use-after-free in i915_init_ggtt+0x324/0x9e0 [i915]
[   53.449309] Read of size 8 at addr ffff88811b1e8070 by task systemd-udevd/345

[   53.449399] CPU: 1 PID: 345 Comm: systemd-udevd Tainted: G        W         5.10.0-rc5+ #12
[   53.449409] Call Trace:
[   53.449418]  dump_stack+0x9a/0xcc
[   53.449558]  ? i915_init_ggtt+0x324/0x9e0 [i915]
[   53.449565]  print_address_description.constprop.0+0x3e/0x60
[   53.449577]  ? _raw_spin_lock_irqsave+0x4e/0x50
[   53.449718]  ? i915_init_ggtt+0x324/0x9e0 [i915]
[   53.449849]  ? i915_init_ggtt+0x324/0x9e0 [i915]
[   53.449857]  kasan_report.cold+0x1f/0x37
[   53.449993]  ? i915_init_ggtt+0x324/0x9e0 [i915]
[   53.450130]  i915_init_ggtt+0x324/0x9e0 [i915]
[   53.450273]  ? i915_ggtt_suspend+0x1f0/0x1f0 [i915]
[   53.450281]  ? static_obj+0x69/0x80
[   53.450289]  ? lockdep_init_map_waits+0xa9/0x310
[   53.450431]  ? intel_wopcm_init+0x96/0x3d0 [i915]
[   53.450581]  ? i915_gem_init+0x75/0x2d0 [i915]
[   53.450720]  i915_gem_init+0x75/0x2d0 [i915]
[   53.450852]  i915_driver_probe+0x8c2/0x1210 [i915]
[   53.450993]  ? i915_pm_prepare+0x630/0x630 [i915]
[   53.451006]  ? check_chain_key+0x1e7/0x2e0
[   53.451025]  ? __pm_runtime_resume+0x58/0xb0
[   53.451157]  i915_pci_probe+0xa6/0x2b0 [i915]
[   53.451285]  ? i915_pci_remove+0x40/0x40 [i915]
[   53.451295]  ? lockdep_hardirqs_on_prepare+0x124/0x230
[   53.451302]  ? _raw_spin_unlock_irqrestore+0x42/0x50
[   53.451309]  ? lockdep_hardirqs_on+0xbf/0x130
[   53.451315]  ? preempt_count_sub+0xf/0xb0
[   53.451321]  ? _raw_spin_unlock_irqrestore+0x2f/0x50
[   53.451335]  pci_device_probe+0xf9/0x190
[   53.451350]  really_probe+0x17f/0x5b0
[   53.451365]  driver_probe_device+0x13a/0x1c0
[   53.451376]  device_driver_attach+0x82/0x90
[   53.451386]  ? device_driver_attach+0x90/0x90
[   53.451391]  __driver_attach+0xab/0x190
[   53.451401]  ? device_driver_attach+0x90/0x90
[   53.451407]  bus_for_each_dev+0xe4/0x140
[   53.451414]  ? subsys_dev_iter_exit+0x10/0x10
[   53.451423]  ? __list_add_valid+0x2b/0xa0
[   53.451440]  bus_add_driver+0x227/0x2e0
[   53.451454]  driver_register+0xd3/0x150
[   53.451585]  i915_init+0x92/0xac [i915]
[   53.451592]  ? 0xffffffffa0a20000
[   53.451598]  do_one_initcall+0xb6/0x3b0
[   53.451606]  ? trace_event_raw_event_initcall_finish+0x150/0x150
[   53.451614]  ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[   53.451627]  ? kmem_cache_alloc_trace+0x4a4/0x8e0
[   53.451634]  ? kasan_unpoison_shadow+0x33/0x40
[   53.451649]  do_init_module+0xf8/0x350
[   53.451662]  load_module+0x43de/0x47f0
[   53.451716]  ? module_frob_arch_sections+0x20/0x20
[   53.451731]  ? rw_verify_area+0x5f/0x130
[   53.451780]  ? __do_sys_finit_module+0x10d/0x1a0
[   53.451785]  __do_sys_finit_module+0x10d/0x1a0
[   53.451792]  ? __ia32_sys_init_module+0x40/0x40
[   53.451800]  ? seccomp_do_user_notification.isra.0+0x5c0/0x5c0
[   53.451829]  ? rcu_read_lock_bh_held+0xb0/0xb0
[   53.451835]  ? mark_held_locks+0x24/0x90
[   53.451856]  do_syscall_64+0x33/0x80
[   53.451863]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.451868] RIP: 0033:0x7fde09b4470d
[   53.451875] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 53 f7 0c 00 f7 d8 64 89 01 48
[   53.451880] RSP: 002b:00007ffd6abc1718 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   53.451890] RAX: ffffffffffffffda RBX: 000056444e528150 RCX: 00007fde09b4470d
[   53.451895] RDX: 0000000000000000 RSI: 00007fde09a21ded RDI: 000000000000000f
[   53.451899] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[   53.451904] R10: 000000000000000f R11: 0000000000000246 R12: 00007fde09a21ded
[   53.451909] R13: 0000000000000000 R14: 000056444e329200 R15: 000056444e528150

[   53.451957] Allocated by task 345:
[   53.451995]  kasan_save_stack+0x1b/0x40
[   53.452001]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   53.452006]  kmem_cache_alloc+0x1cd/0x8d0
[   53.452146]  i915_vma_instance+0x126/0xb70 [i915]
[   53.452304]  i915_gem_object_ggtt_pin_ww+0x222/0x3f0 [i915]
[   53.452446]  intel_dsb_prepare+0x14f/0x230 [i915]
[   53.452588]  intel_atomic_commit+0x183/0x690 [i915]
[   53.452730]  intel_initial_commit+0x2bc/0x2f0 [i915]
[   53.452871]  intel_modeset_init_nogem+0xa02/0x2af0 [i915]
[   53.452995]  i915_driver_probe+0x8af/0x1210 [i915]
[   53.453120]  i915_pci_probe+0xa6/0x2b0 [i915]
[   53.453125]  pci_device_probe+0xf9/0x190
[   53.453131]  really_probe+0x17f/0x5b0
[   53.453136]  driver_probe_device+0x13a/0x1c0
[   53.453142]  device_driver_attach+0x82/0x90
[   53.453148]  __driver_attach+0xab/0x190
[   53.453153]  bus_for_each_dev+0xe4/0x140
[   53.453158]  bus_add_driver+0x227/0x2e0
[   53.453164]  driver_register+0xd3/0x150
[   53.453286]  i915_init+0x92/0xac [i915]
[   53.453292]  do_one_initcall+0xb6/0x3b0
[   53.453297]  do_init_module+0xf8/0x350
[   53.453302]  load_module+0x43de/0x47f0
[   53.453307]  __do_sys_finit_module+0x10d/0x1a0
[   53.453312]  do_syscall_64+0x33/0x80
[   53.453318]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   53.453345] Freed by task 82:
[   53.453379]  kasan_save_stack+0x1b/0x40
[   53.453384]  kasan_set_track+0x1c/0x30
[   53.453389]  kasan_set_free_info+0x1b/0x30
[   53.453394]  __kasan_slab_free+0x112/0x160
[   53.453399]  kmem_cache_free+0xb2/0x3f0
[   53.453536]  i915_gem_flush_free_objects+0x31a/0x3b0 [i915]
[   53.453542]  process_one_work+0x519/0x9f0
[   53.453547]  worker_thread+0x75/0x5c0
[   53.453552]  kthread+0x1da/0x230
[   53.453557]  ret_from_fork+0x22/0x30

[   53.453584] The buggy address belongs to the object at ffff88811b1e8040
                which belongs to the cache i915_vma of size 968
[   53.453692] The buggy address is located 48 bytes inside of
                968-byte region [ffff88811b1e8040, ffff88811b1e8408)
[   53.453792] The buggy address belongs to the page:
[   53.453842] page:00000000b35f7048 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88811b1ef940 pfn:0x11b1e8
[   53.453847] head:00000000b35f7048 order:3 compound_mapcount:0 compound_pincount:0
[   53.453853] flags: 0x8000000000010200(slab|head)
[   53.453860] raw: 8000000000010200 ffff888115596248 ffff888115596248 ffff8881155b6340
[   53.453866] raw: ffff88811b1ef940 0000000000170001 00000001ffffffff 0000000000000000
[   53.453870] page dumped because: kasan: bad access detected

[   53.453895] Memory state around the buggy address:
[   53.453944]  ffff88811b1e7f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   53.454011]  ffff88811b1e7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   53.454079] >ffff88811b1e8000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[   53.454146]                                                              ^
[   53.454211]  ffff88811b1e8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   53.454279]  ffff88811b1e8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   53.454347] ==================================================================
[   53.454414] Disabling lock debugging due to kernel taint
[   53.454434] general protection fault, probably for non-canonical address 0xdead0000000000d0: 0000 [#1] PREEMPT SMP KASAN PTI
[   53.454446] CPU: 1 PID: 345 Comm: systemd-udevd Tainted: G    B   W         5.10.0-rc5+ #12
[   53.454592] RIP: 0010:i915_init_ggtt+0x26f/0x9e0 [i915]
[   53.454602] Code: 89 8d 48 ff ff ff 4c 8d 60 d0 49 39 c7 0f 84 37 02 00 00 4c 89 b5 40 ff ff ff 4d 8d bc 24 90 00 00 00 4c 89 ff e8 c1 97 f8 e0 <49> 83 bc 24 90 00 00 00 00 0f 84 0f 02 00 00 49 8d 7c 24 08 e8 a8
[   53.454618] RSP: 0018:ffff88812247f430 EFLAGS: 00010286
[   53.454625] RAX: 0000000000000000 RBX: ffff888136440000 RCX: ffffffffa03fb78f
[   53.454633] RDX: 0000000000000000 RSI: 0000000000000008 RDI: dead000000000160
[   53.454641] RBP: ffff88812247f500 R08: ffffffff8113589f R09: 0000000000000000
[   53.454648] R10: ffffffff83063843 R11: fffffbfff060c708 R12: dead0000000000d0
[   53.454656] R13: ffff888136449ba0 R14: 0000000000002000 R15: dead000000000160
[   53.454664] FS:  00007fde095c4880(0000) GS:ffff88840c880000(0000) knlGS:0000000000000000
[   53.454672] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.454679] CR2: 00007fef132b4f28 CR3: 000000012245c002 CR4: 00000000003706e0
[   53.454686] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   53.454693] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   53.454700] Call Trace:
[   53.454833]  ? i915_ggtt_suspend+0x1f0/0x1f0 [i915]

Reported-by: Matthew Auld <[email protected]>
Fixes: afeda4f ("drm/i915/dsb: Pre allocate and late cleanup of cmd buffer")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Tested-by: Matthew Auld <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit b3bf99d)
Signed-off-by: Rodrigo Vivi <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Dec 15, 2020
Ido Schimmel says:

====================
mlxsw: Add support for Q-in-VNI

This patch set adds support for Q-in-VNI over Spectrum-{2,3} ASICs.
Q-in-VNI is like regular VxLAN encapsulation with the sole difference
that overlay packets can contain a VLAN tag. In Linux, this is achieved
by adding the VxLAN device to a 802.1ad bridge instead of a 802.1q
bridge.

From mlxsw perspective, Q-in-VNI support entails two main changes:

1. An outer VLAN tag should always be pushed to the overlay packet
during decapsulation

2. The EtherType used during decapsulation should be 802.1ad (0x88a8)
instead of the default 802.1q (0x8100)

Patch set overview:

Patches #1-#3 add required device registers and fields

Patch #4 performs small refactoring to allow code re-use

Patches #5-#7 make the EtherType used during decapsulation a property of
the tunnel port (i.e., VxLAN). This leads to the driver vetoing
configurations in which VxLAN devices are member in both 802.1ad and
802.1q/802.1d bridges. Will be handled in the future by determining the
overlay EtherType on the egress port instead

Patch #8 adds support for Q-in-VNI for Spectrum-2 and newer ASICs

Patches #9-#10 veto Q-in-VNI for Spectrum-1 ASICs due to some hardware
limitations. Can be worked around, but decided not to support it for now

Patch #11 adjusts mlxsw to stop vetoing addition of VXLAN devices to
802.1ad bridges

Patch #12 adds a generic forwarding test that can be used with both veth
pairs and physical ports with a loopback

Patch #13 adds a test to make sure mlxsw vetoes unsupported Q-in-VNI
configurations
====================

Signed-off-by: David S. Miller <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Mar 2, 2021
lanai_dev_open() can fail. When it fail, lanai->base is unmapped and the
pci device is disabled. The caller, lanai_init_one(), then tries to run
atm_dev_deregister(). This will subsequently call lanai_dev_close() and
use the already released MMIO area.

To fix this issue, set the lanai->base to NULL if open fail,
and test the flag in lanai_dev_close().

[    8.324153] lanai: lanai_start() failed, err=19
[    8.324819] lanai(itf 0): shutting down interface
[    8.325211] BUG: unable to handle page fault for address: ffffc90000180024
[    8.325781] #PF: supervisor write access in kernel mode
[    8.326215] #PF: error_code(0x0002) - not-present page
[    8.326641] PGD 100000067 P4D 100000067 PUD 100139067 PMD 10013a067 PTE 0
[    8.327206] Oops: 0002 [#1] SMP KASAN NOPTI
[    8.327557] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b49040c7 #12
[    8.328229] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-4
[    8.329145] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
[    8.329587] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
[    8.330917] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246
[    8.331196] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a
[    8.331572] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000
[    8.331948] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa
[    8.332326] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4
[    8.332701] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88
[    8.333077] FS:  00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
[    8.333502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.333806] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0
[    8.334182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.334557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.334932] Call Trace:
[    8.335066]  atm_dev_deregister+0x161/0x1a0 [atm]
[    8.335324]  lanai_init_one.cold+0x20c/0x96d [lanai]
[    8.335594]  ? lanai_send+0x2a0/0x2a0 [lanai]
[    8.335831]  local_pci_probe+0x6f/0xb0
[    8.336039]  pci_device_probe+0x171/0x240
[    8.336255]  ? pci_device_remove+0xe0/0xe0
[    8.336475]  ? kernfs_create_link+0xb6/0x110
[    8.336704]  ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
[    8.336983]  really_probe+0x161/0x420
[    8.337181]  driver_probe_device+0x6d/0xd0
[    8.337401]  device_driver_attach+0x82/0x90
[    8.337626]  ? device_driver_attach+0x90/0x90
[    8.337859]  __driver_attach+0x60/0x100
[    8.338065]  ? device_driver_attach+0x90/0x90
[    8.338298]  bus_for_each_dev+0xe1/0x140
[    8.338511]  ? subsys_dev_iter_exit+0x10/0x10
[    8.338745]  ? klist_node_init+0x61/0x80
[    8.338956]  bus_add_driver+0x254/0x2a0
[    8.339164]  driver_register+0xd3/0x150
[    8.339370]  ? 0xffffffffc0028000
[    8.339550]  do_one_initcall+0x84/0x250
[    8.339755]  ? trace_event_raw_event_initcall_finish+0x150/0x150
[    8.340076]  ? free_vmap_area_noflush+0x1a5/0x5c0
[    8.340329]  ? unpoison_range+0xf/0x30
[    8.340532]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    8.340806]  ? unpoison_range+0xf/0x30
[    8.341014]  ? unpoison_range+0xf/0x30
[    8.341217]  do_init_module+0xf8/0x350
[    8.341419]  load_module+0x3fe6/0x4340
[    8.341621]  ? vm_unmap_ram+0x1d0/0x1d0
[    8.341826]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
[    8.342101]  ? module_frob_arch_sections+0x20/0x20
[    8.342358]  ? __do_sys_finit_module+0x108/0x170
[    8.342604]  __do_sys_finit_module+0x108/0x170
[    8.342841]  ? __ia32_sys_init_module+0x40/0x40
[    8.343083]  ? file_open_root+0x200/0x200
[    8.343298]  ? do_sys_open+0x85/0xe0
[    8.343491]  ? filp_open+0x50/0x50
[    8.343675]  ? exit_to_user_mode_prepare+0xfc/0x130
[    8.343935]  do_syscall_64+0x33/0x40
[    8.344132]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    8.344401] RIP: 0033:0x7f08eb887cf7
[    8.344594] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 41
[    8.345565] RSP: 002b:00007ffcd5c98ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.345962] RAX: ffffffffffffffda RBX: 00000000008fea70 RCX: 00007f08eb887cf7
[    8.346336] RDX: 0000000000000000 RSI: 00000000008fd9e0 RDI: 0000000000000003
[    8.346711] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
[    8.347085] R10: 00007f08eb8eb300 R11: 0000000000000246 R12: 00000000008fd9e0
[    8.347460] R13: 0000000000000000 R14: 00000000008fddd0 R15: 0000000000000001
[    8.347836] Modules linked in: lanai(+) atm
[    8.348065] CR2: ffffc90000180024
[    8.348244] ---[ end trace 7fdc1c668f2003e5 ]---
[    8.348490] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
[    8.348772] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
[    8.349745] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246
[    8.350022] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a
[    8.350397] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000
[    8.350772] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa
[    8.351151] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4
[    8.351525] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88
[    8.351918] FS:  00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
[    8.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.352647] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0
[    8.353022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.353397] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.353958] modprobe (95) used greatest stack depth: 26216 bytes left

Signed-off-by: Tong Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Mar 10, 2021
Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Mar 10, 2021
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

  # perf test -v 4
   4: Read samples using the mmap interface      :
  --- start ---
  test child forked, pid 139782
  mmap size 528384B

  =================================================================
  ==139782==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f1f76daee8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x564ba21a0fea in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x564ba21a1a0f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x564ba21a21cf in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x564ba21a21cf in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x564ba1e48298 in test__basic_mmap tests/mmap-basic.c:55
    #6 0x564ba1e278fb in run_test tests/builtin-test.c:428
    #7 0x564ba1e278fb in test_and_print tests/builtin-test.c:458
    #8 0x564ba1e29a53 in __cmd_test tests/builtin-test.c:679
    #9 0x564ba1e29a53 in cmd_test tests/builtin-test.c:825
    #10 0x564ba1e95cb4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x564ba1d1fa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x564ba1d1fa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x564ba1d1fa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f1f768e4d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Read samples using the mmap interface: FAILED!
  failed to open shell test directory: /home/namhyung/libexec/perf-core/tests/shell

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jan 8, 2025
syzbot presented an use-after-free report [0] regarding ipvlan and
linkwatch.

ipvlan does not hold a refcnt of the lower device unlike vlan and
macvlan.

If the linkwatch work is triggered for the ipvlan dev, the lower dev
might have already been freed, resulting in UAF of ipvlan->phy_dev in
ipvlan_get_iflink().

We can delay the lower dev unregistration like vlan and macvlan by
holding the lower dev's refcnt in dev->netdev_ops->ndo_init() and
releasing it in dev->priv_destructor().

Jakub pointed out calling .ndo_XXX after unregister_netdevice() has
returned is error prone and suggested [1] addressing this UAF in the
core by taking commit 750e516 ("net: avoid potential UAF in
default_operstate()") further.

Let's assume unregistering devices DOWN and use RCU protection in
default_operstate() not to race with the device unregistration.

[0]:
BUG: KASAN: slab-use-after-free in ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
Read of size 4 at addr ffff0000d768c0e0 by task kworker/u8:35/6944

CPU: 0 UID: 0 PID: 6944 Comm: kworker/u8:35 Not tainted 6.13.0-rc2-g9bc5c9515b48 kernel-patches#12 4c3cb9e8b4565456f6a355f312ff91f4f29b3c47
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound linkwatch_event
Call trace:
 show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:484 (C)
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x16c/0x6f0 mm/kasan/report.c:489
 kasan_report+0xc0/0x120 mm/kasan/report.c:602
 __asan_report_load4_noabort+0x20/0x30 mm/kasan/report_generic.c:380
 ipvlan_get_iflink+0x84/0x88 drivers/net/ipvlan/ipvlan_main.c:353
 dev_get_iflink+0x7c/0xd8 net/core/dev.c:674
 default_operstate net/core/link_watch.c:45 [inline]
 rfc2863_policy+0x144/0x360 net/core/link_watch.c:72
 linkwatch_do_dev+0x60/0x228 net/core/link_watch.c:175
 __linkwatch_run_queue+0x2f4/0x5b8 net/core/link_watch.c:239
 linkwatch_event+0x64/0xa8 net/core/link_watch.c:282
 process_one_work+0x700/0x1398 kernel/workqueue.c:3229
 process_scheduled_works kernel/workqueue.c:3310 [inline]
 worker_thread+0x8c4/0xe10 kernel/workqueue.c:3391
 kthread+0x2b0/0x360 kernel/kthread.c:389
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862

Allocated by task 9303:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x30/0x68 mm/kasan/common.c:68
 kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568
 poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
 __kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394
 kasan_kmalloc include/linux/kasan.h:260 [inline]
 __do_kmalloc_node mm/slub.c:4283 [inline]
 __kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4289
 __kvmalloc_node_noprof+0x9c/0x230 mm/util.c:650
 alloc_netdev_mqs+0xb4/0x1118 net/core/dev.c:11209
 rtnl_create_link+0x2b8/0xb60 net/core/rtnetlink.c:3595
 rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3771
 __rtnl_newlink net/core/rtnetlink.c:3896 [inline]
 rtnl_newlink+0x122c/0x15c0 net/core/rtnetlink.c:4011
 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg net/socket.c:726 [inline]
 __sys_sendto+0x2ec/0x438 net/socket.c:2197
 __do_sys_sendto net/socket.c:2204 [inline]
 __se_sys_sendto net/socket.c:2200 [inline]
 __arm64_sys_sendto+0xe4/0x110 net/socket.c:2200
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600

Freed by task 10200:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x30/0x68 mm/kasan/common.c:68
 kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x48/0x68 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2338 [inline]
 slab_free mm/slub.c:4598 [inline]
 kfree+0x140/0x420 mm/slub.c:4746
 kvfree+0x4c/0x68 mm/util.c:693
 netdev_release+0x94/0xc8 net/core/net-sysfs.c:2034
 device_release+0x98/0x1c0
 kobject_cleanup lib/kobject.c:689 [inline]
 kobject_release lib/kobject.c:720 [inline]
 kref_put include/linux/kref.h:65 [inline]
 kobject_put+0x2b0/0x438 lib/kobject.c:737
 netdev_run_todo+0xdd8/0xf48 net/core/dev.c:10924
 rtnl_unlock net/core/rtnetlink.c:152 [inline]
 rtnl_net_unlock net/core/rtnetlink.c:209 [inline]
 rtnl_dellink+0x484/0x680 net/core/rtnetlink.c:3526
 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6901
 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6928
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg net/socket.c:726 [inline]
 ____sys_sendmsg+0x410/0x708 net/socket.c:2583
 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2637
 __sys_sendmsg net/socket.c:2669 [inline]
 __do_sys_sendmsg net/socket.c:2674 [inline]
 __se_sys_sendmsg net/socket.c:2672 [inline]
 __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2672
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151
 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762
 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600

The buggy address belongs to the object at ffff0000d768c000
 which belongs to the cache kmalloc-cg-4k of size 4096
The buggy address is located 224 bytes inside of
 freed 4096-byte region [ffff0000d768c000, ffff0000d768d000)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x117688
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff0000c77ef981
flags: 0xbfffe0000000040(head|node=0|zone=2|lastcpupid=0x1ffff)
page_type: f5(slab)
raw: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000040 ffff0000c000f500 dead000000000100 dead000000000122
head: 0000000000000000 0000000000040004 00000001f5000000 ffff0000c77ef981
head: 0bfffe0000000003 fffffdffc35da201 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff0000d768bf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff0000d768c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff0000d768c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff0000d768c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff0000d768c180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 8c55fac ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
Reported-by: syzkaller <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/ [1]
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jan 9, 2025
Hou Tao says:

====================
The use of migrate_{disable|enable} pair in BPF is mainly due to the
introduction of bpf memory allocator and the use of per-CPU data struct
in its internal implementation. The caller needs to disable migration
before invoking the alloc or free APIs of bpf memory allocator, and
enable migration after the invocation.

The main users of bpf memory allocator are various kind of bpf maps in
which the map values or the special fields in the map values are
allocated by using bpf memory allocator.

At present, the running context for bpf program has already disabled
migration explictly or implictly, therefore, when these maps are
manipulated in bpf program, it is OK to not invoke migrate_disable()
and migrate_enable() pair. Howevers, it is not always the case when
these maps are manipulated through bpf syscall, therefore many
migrate_{disable|enable} pairs are added when the map can either be
manipulated by BPF program or BPF syscall.

The initial idea of reducing the use of migrate_{disable|enable} comes
from Alexei [1]. I turned it into a patch set that archives the goals
through the following three methods:

1. remove unnecessary migrate_{disable|enable} pair
when the BPF syscall path also disables migration, it is OK to remove
the pair. Patch #1~#3 fall into this category, while patch #4~#5 are
partially included.

2. move the migrate_{disable|enable} pair from inner callee to outer
   caller
Instead of invoking migrate_disable() in the inner callee, invoking
migrate_disable() in the outer caller to simplify reasoning about when
migrate_disable() is needed. Patch #4~#5 and patch #6~#19 belongs to
this category.

3. add cant_migrate() check in the inner callee
Add cant_migrate() check in the inner callee to ensure the guarantee
that migration is disabled is not broken. Patch #1~#5, #13, #16~#19 also
belong to this category.

Please check the individual patches for more details. Comments are
always welcome.

Change Log:
v2:
  * sqaush the ->map_free related patches (#10~#12, #15) into one patch
  * remove unnecessary cant_migrate() checks.

v1: https://lore.kernel.org/bpf/[email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Feb 3, 2025
libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 10, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 11, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 12, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Feb 12, 2025
An ioctl caller of SIOCETHTOOL ETHTOOL_GSET can provoke the legacy
ethtool codepath on a non-present device, leading to kernel panic:

     [exception RIP: qed_get_current_link+0x11]
  kernel-patches#8 [ffffa2021d70f948] qede_get_link_ksettings at ffffffffc07bfa9a [qede]
  kernel-patches#9 [ffffa2021d70f9d0] __rh_call_get_link_ksettings at ffffffff9bad2723
 kernel-patches#10 [ffffa2021d70fa30] ethtool_get_settings at ffffffff9bad29d0
 kernel-patches#11 [ffffa2021d70fb18] __dev_ethtool at ffffffff9bad442b
 kernel-patches#12 [ffffa2021d70fc28] dev_ethtool at ffffffff9bad6db8
 kernel-patches#13 [ffffa2021d70fc60] dev_ioctl at ffffffff9ba7a55c
 kernel-patches#14 [ffffa2021d70fc98] sock_do_ioctl at ffffffff9ba22a44
 kernel-patches#15 [ffffa2021d70fd08] sock_ioctl at ffffffff9ba22d1c
 kernel-patches#16 [ffffa2021d70fd78] do_vfs_ioctl at ffffffff9b584cf4

Device is not present with no state bits set:

crash> net_device.state ffff8fff95240000
  state = 0x0,

Existing patch commit a699781 ("ethtool: check device is present
when getting link settings") fixes this in the modern sysfs reader's
ksettings path.

Fix this in the legacy ioctl path by checking for device presence as
well.

Fixes: 4bc71cb ("net: consolidate and fix ethtool_ops->get_settings calling")
Fixes: 3f1ac7a ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Fixes: 1da177e ("Linux-2.6.12-rc2")
Tested-by: John J Coleman <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: John J Coleman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Mar 17, 2025
Chia-Yu Chang says:

====================
AccECN protocol preparation patch series

Please find the v7

v7 (03-Mar-2025)
- Move 2 new patches added in v6 to the next AccECN patch series

v6 (27-Dec-2024)
- Avoid removing removing the potential CA_ACK_WIN_UPDATE in ack_ev_flags of patch kernel-patches#1 (Eric Dumazet <[email protected]>)
- Add reviewed-by tag in patches kernel-patches#2, kernel-patches#3, kernel-patches#4, kernel-patches#5, kernel-patches#6, kernel-patches#7, kernel-patches#8, kernel-patches#12, kernel-patches#14
- Foloiwng 2 new pathces are added after patch kernel-patches#9 (Patch that adds SKB_GSO_TCP_ACCECN)
  * New patch kernel-patches#10 to replace exisiting SKB_GSO_TCP_ECN with SKB_GSO_TCP_ACCECN in the driver to avoid CWR flag corruption
  * New patch kernel-patches#11 adds AccECN for virtio by adding new negotiation flag (VIRTIO_NET_F_HOST/GUEST_ACCECN) in feature handshake and translating Accurate ECN GSO flag between virtio_net_hdr (VIRTIO_NET_HDR_GSO_ACCECN) and skb header (SKB_GSO_TCP_ACCECN)
- Add detailed changelog and comments in kernel-patches#13 (Eric Dumazet <[email protected]>)
- Move patch kernel-patches#14 to the next AccECN patch series (Eric Dumazet <[email protected]>)

v5 (5-Nov-2024)
- Add helper function "tcp_flags_ntohs" to preserve last 2 bytes of TCP flags of patch kernel-patches#4 (Paolo Abeni <[email protected]>)
- Fix reverse X-max tree order of patches kernel-patches#4, kernel-patches#11 (Paolo Abeni <[email protected]>)
- Rename variable "delta" as "timestamp_delta" of patch kernel-patches#2 fo clariety
- Remove patch kernel-patches#14 in this series (Paolo Abeni <[email protected]>, Joel Granados <[email protected]>)

v4 (21-Oct-2024)
- Fix line length warning of patches kernel-patches#2, kernel-patches#4, kernel-patches#8, kernel-patches#10, kernel-patches#11, kernel-patches#14
- Fix spaces preferred around '|' (ctx:VxV) warning of patch kernel-patches#7
- Add missing CC'ed of patches kernel-patches#4, kernel-patches#12, kernel-patches#14

v3 (19-Oct-2024)
- Fix build error in v2

v2 (18-Oct-2024)
- Fix warning caused by NETIF_F_GSO_ACCECN_BIT in patch kernel-patches#9 (Jakub Kicinski <[email protected]>)

The full patch series can be found in
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
====================

Signed-off-by: David S. Miller <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Apr 1, 2025
perf test 11 hwmon fails on s390 with this error

 # ./perf test -Fv 11
 --- start ---
 ---- end ----
 11.1: Basic parsing test             : Ok
 --- start ---
 Testing 'temp_test_hwmon_event1'
 Using CPUID IBM,3931,704,A01,3.7,002f
 temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/
 FAILED tests/hwmon_pmu.c:189 Unexpected config for
    'temp_test_hwmon_event1', 292470092988416 != 655361
 ---- end ----
 11.2: Parsing without PMU name       : FAILED!
 --- start ---
 Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/'
 FAILED tests/hwmon_pmu.c:189 Unexpected config for
    'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/',
    292470092988416 != 655361
 ---- end ----
 11.3: Parsing with PMU name          : FAILED!
 #

The root cause is in member test_event::config which is initialized
to 0xA0001 or 655361. During event parsing a long list event parsing
functions are called and end up with this gdb call stack:

 #0  hwmon_pmu__config_term (hwm=0x168dfd0, attr=0x3ffffff5ee8,
	term=0x168db60, err=0x3ffffff81c8) at util/hwmon_pmu.c:623
 #1  hwmon_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8,
	terms=0x3ffffff5ea8, err=0x3ffffff81c8) at util/hwmon_pmu.c:662
 #2  0x00000000012f870c in perf_pmu__config_terms (pmu=0x168dfd0,
	attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, zero=false,
	apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1519
 #3  0x00000000012f88a4 in perf_pmu__config (pmu=0x168dfd0, attr=0x3ffffff5ee8,
	head_terms=0x3ffffff5ea8, apply_hardcoded=false, err=0x3ffffff81c8)
	at util/pmu.c:1545
 #4  0x00000000012680c4 in parse_events_add_pmu (parse_state=0x3ffffff7fb8,
	list=0x168dc00, pmu=0x168dfd0, const_parsed_terms=0x3ffffff6090,
	auto_merge_stats=true, alternate_hw_config=10)
	at util/parse-events.c:1508
 #5  0x00000000012684c6 in parse_events_multi_pmu_add (parse_state=0x3ffffff7fb8,
	event_name=0x168ec10 "temp_test_hwmon_event1", hw_config=10,
	const_parsed_terms=0x0, listp=0x3ffffff6230, loc_=0x3ffffff70e0)
	at util/parse-events.c:1592
 #6  0x00000000012f0e4e in parse_events_parse (_parse_state=0x3ffffff7fb8,
	scanner=0x16878c0) at util/parse-events.y:293
 #7  0x00000000012695a0 in parse_events__scanner (str=0x3ffffff81d8
	"temp_test_hwmon_event1", input=0x0, parse_state=0x3ffffff7fb8)
	at util/parse-events.c:1867
 #8  0x000000000126a1e8 in __parse_events (evlist=0x168b580,
	str=0x3ffffff81d8 "temp_test_hwmon_event1", pmu_filter=0x0,
	err=0x3ffffff81c8, fake_pmu=false, warn_if_reordered=true,
	fake_tp=false) at util/parse-events.c:2136
 #9  0x00000000011e36aa in parse_events (evlist=0x168b580,
	str=0x3ffffff81d8 "temp_test_hwmon_event1", err=0x3ffffff81c8)
	at /root/linux/tools/perf/util/parse-events.h:41
 #10 0x00000000011e3e64 in do_test (i=0, with_pmu=false, with_alias=false)
	at tests/hwmon_pmu.c:164
 #11 0x00000000011e422c in test__hwmon_pmu (with_pmu=false)
	at tests/hwmon_pmu.c:219
 #12 0x00000000011e431c in test__hwmon_pmu_without_pmu (test=0x1610368
	<suite.hwmon_pmu>, subtest=1) at tests/hwmon_pmu.c:23

where the attr::config is set to value 292470092988416 or 0x10a0000000000
in line 625 of file ./util/hwmon_pmu.c:

   attr->config = key.type_and_num;

However member key::type_and_num is defined as union and bit field:

   union hwmon_pmu_event_key {
        long type_and_num;
        struct {
                int num :16;
                enum hwmon_type type :8;
        };
   };

s390 is big endian and Intel is little endian architecture.
The events for the hwmon dummy pmu have num = 1 or num = 2 and
type is set to HWMON_TYPE_TEMP (which is 10).
On s390 this assignes member key::type_and_num the value of
0x10a0000000000 (which is 292470092988416) as shown in above
trace output.

Fix this and export the structure/union hwmon_pmu_event_key
so the test shares the same implementation as the event parsing
functions for union and bit fields. This should avoid
endianess issues on all platforms.

Output after:
 # ./perf test -F 11
 11.1: Basic parsing test         : Ok
 11.2: Parsing without PMU name   : Ok
 11.3: Parsing with PMU name      : Ok
 #

Fixes: 531ee0f ("perf test: Add hwmon "PMU" test")
Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Apr 1, 2025
Ian told me that there are many memory leaks in the hierarchy mode.  I
can easily reproduce it with the follwing command.

  $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak

  $ perf record --latency -g -- ./perf test -w thloop

  $ perf report -H --stdio
  ...
  Indirect leak of 168 byte(s) in 21 object(s) allocated from:
      #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
      #1 0x55ed3602346e in map__get util/map.h:189
      #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476
      #3 0x55ed36025208 in hist_entry__new util/hist.c:588
      #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587
      #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638
      #6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685
      #7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776
      #8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735
      #9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119
      #10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867
      #11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351
      #12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404
      #13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448
      #14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556
      #15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
  ...

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  93

I found that hist_entry__delete() missed to release child entries in the
hierarchy tree (hroot_{in,out}).  It needs to iterate the child entries
and call hist_entry__delete() recursively.

After this change:

  $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak'
  0

Reported-by: Ian Rogers <[email protected]>
Tested-by Thomas Falcon <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Apr 1, 2025
The env.pmu_mapping can be leaked when it reads data from a pipe on AMD.
For a pipe data, it reads the header data including pmu_mapping from
PERF_RECORD_HEADER_FEATURE runtime.  But it's already set in:

  perf_session__new()
    __perf_session__new()
      evlist__init_trace_event_sample_raw()
        evlist__has_amd_ibs()
          perf_env__nr_pmu_mappings()

Then it'll overwrite that when it processes the HEADER_FEATURE record.
Here's a report from address sanitizer.

  Direct leak of 2689 byte(s) in 1 object(s) allocated from:
    #0 0x7fed8f814596 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
    #1 0x5595a7d416b1 in strbuf_grow util/strbuf.c:64
    #2 0x5595a7d414ef in strbuf_init util/strbuf.c:25
    #3 0x5595a7d0f4b7 in perf_env__read_pmu_mappings util/env.c:362
    #4 0x5595a7d12ab7 in perf_env__nr_pmu_mappings util/env.c:517
    #5 0x5595a7d89d2f in evlist__has_amd_ibs util/amd-sample-raw.c:315
    #6 0x5595a7d87fb2 in evlist__init_trace_event_sample_raw util/sample-raw.c:23
    #7 0x5595a7d7f893 in __perf_session__new util/session.c:179
    #8 0x5595a7b79572 in perf_session__new util/session.h:115
    #9 0x5595a7b7e9dc in cmd_report builtin-report.c:1603
    #10 0x5595a7c019eb in run_builtin perf.c:351
    #11 0x5595a7c01c92 in handle_internal_command perf.c:404
    #12 0x5595a7c01deb in run_argv perf.c:448
    #13 0x5595a7c02134 in main perf.c:556
    #14 0x7fed85833d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Let's free the existing pmu_mapping data if any.

Cc: Ravi Bangoria <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Apr 1, 2025
…ge_order()

Patch series "mm: MM owner tracking for large folios (!hugetlb) +
CONFIG_NO_PAGE_MAPCOUNT", v3.

Let's add an "easy" way to decide -- without false positives, without
page-mapcounts and without page table/rmap scanning -- whether a large
folio is "certainly mapped exclusively" into a single MM, or whether it
"maybe mapped shared" into multiple MMs.

Use that information to implement Copy-on-Write reuse, to convert
folio_likely_mapped_shared() to folio_maybe_mapped_share(), and to
introduce a kernel config option that lets us not use+maintain per-page
mapcounts in large folios anymore.

The bigger picture was presented at LSF/MM [1].

This series is effectively a follow-up on my early work [2], which
implemented a more precise, but also more complicated, way to identify
whether a large folio is "mapped shared" into multiple MMs or "mapped
exclusively" into a single MM.


1 Patch Organization
====================

Patch #1 -> #6: make more room in order-1 folios, so we have two
                "unsigned long" available for our purposes

Patch #7 -> #11: preparations

Patch #12: MM owner tracking for large folios

Patch #13: COW reuse for PTE-mapped anon THP

Patch #14: folio_maybe_mapped_shared()

Patch #15 -> #20: introduce and implement CONFIG_NO_PAGE_MAPCOUNT


2 MM owner tracking
===================

We assign each MM a unique ID ("MM ID"), to be able to squeeze more
information in our folios.  On 32bit we use 15-bit IDs, on 64bit we use
31-bit IDs.

For each large folios, we now store two MM-ID+mapcount ("slot")
combinations:
* mm0_id + mm0_mapcount
* mm1_id + mm1_mapcount

On 32bit, we use a 16-bit per-MM mapcount, on 64bit an ordinary 32bit
mapcount.  This way, we require 2x "unsigned long" on 32bit and 64bit for
both slots.

Paired with the large mapcount, we can reliably identify whether one of
these MMs is the current owner (-> owns all mappings) or even holds all
folio references (-> owns all mappings, and all references are from
mappings).

As long as only two MMs map folio pages at a time, we can reliably and
precisely identify whether a large folio is "mapped shared" or "mapped
exclusively".

Any additional MM that starts mapping the folio while there are no free
slots becomes an "untracked MM".  If one such "untracked MM" is the last
one mapping a folio exclusively, we will not detect the folio as "mapped
exclusively" but instead as "maybe mapped shared".  (exception: only a
single mapping remains)

So that's where the approach gets imprecise.

For now, we use a bit-spinlock to sync the large mapcount + slots, and
make sure we do keep the machinery fast, to not degrade (un)map
performance drastically: for example, we make sure to only use a single
atomic (when grabbing the bit-spinlock), like we would already perform
when updating the large mapcount.


3 CONFIG_NO_PAGE_MAPCOUNT
=========================

patch #15 -> #20 spell out and document what exactly is affected when not
maintaining the per-page mapcounts in large folios anymore.

Most importantly, as we cannot maintain folio->_nr_pages_mapped anymore
when (un)mapping pages, we'll account a complete folio as mapped if a
single page is mapped.  In addition, we'll not detect partially mapped
anonymous folios as such in all cases yet.

Likely less relevant changes include that we might now under-estimate the
USS (Unique Set Size) of a process, but never over-estimate it.

The goal is to make CONFIG_NO_PAGE_MAPCOUNT the default at some point, to
then slowly make it the only option, as we learn about real-life impacts
and possible ways to mitigate them.


4 Performance
=============

Detailed performance numbers were included in v1 [3], and not that much
changed between v1 and v2.

I did plenty of measurements on different systems in the meantime, that
all revealed slightly different results.

The pte-mapped-folio micro-benchmarks [4] are fairly sensitive to code
layout changes on some systems.  Especially the fork() benchmark started
being more-shaky-than-before on recent kernels for some reason.

In summary, with my micro-benchmarks:

* Small folios are not impacted.

* CoW performance seems to be mostly unchanged across all folios sizes.

* CoW reuse performance of large folios now matches CoW reuse
  performance of small folios, because we now actually implement the CoW
  reuse optimization.  On an Intel Xeon Silver 4210R I measured a ~65%
  reduction in runtime, on an arm64 system I measured ~54% reduction.

* munmap() performance improves with CONFIG_NO_PAGE_MAPCOUNT.  I saw
  double-digit % reduction (up to ~30% on an Intel Xeon Silver 4210R and
  up to ~70% on an AmpereOne A192-32X) with larger folios.  The larger the
  folios, the larger the performance improvement.

* munmao() performance very slightly (couple percent) degrades without
  CONFIG_NO_PAGE_MAPCOUNT for smaller folios.  For larger folios, there
  seems to be no change at all.

* fork() performance improves with CONFIG_NO_PAGE_MAPCOUNT.  I saw
  double-digit % reduction (up to ~20% on an Intel Xeon Silver 4210R and
  up to ~10% on an AmpereOne A192-32X) with larger folios.  The larger the
  folios, the larger the performance improvement.

* While fork() performance without CONFIG_NO_PAGE_MAPCOUNT seems to be
  almost unchanged on some systems, I saw some degradation for smaller
  folios on the AmpereOne A192-32X.  I did not investigate the details
  yet, but I suspect code layout changes or suboptimal code placement /
  inlining.

I'm not to worried about the fork() micro-benchmarks for smaller folios
given how shaky the results are lately and by how much we improved fork()
performance recently.

I also ran case-anon-cow-rand and case-anon-cow-seq part of
vm-scalability, to assess the scalability and the impact of the
bit-spinlock.  My measurements on a two 2-socket 10-core Intel Xeon Silver
4210R CPU revealed no significant changes.

Similarly, running these benchmarks with 2 MiB THPs enabled on the
AmpereOne A192-32X with 192 cores, I got < 1% difference with < 1% stdev,
which is nice.

So far, I did not get my hands on a similarly large system with multiple
sockets.

I found no other fitting scalability benchmarks that seem to really hammer
on concurrent mapping/unmapping of large folio pages like
case-anon-cow-seq does.


5 Concerns
==========

5.1 Bit spinlock
----------------

I'm not quite happy about the bit-spinlock, but so far it does not seem to
affect scalability in my measurements.

If it ever becomes a problem we could either investigate improving the
locking, or simply stopping the MM tracking once there are "too many
mappings" and simply assume that the folio is "mapped shared" until it was
freed.

This would be similar (but slightly different) to the "0,1,2,stopped"
counting idea Willy had at some point.  Adding that logic to "stop
tracking" adds more code to the hot path, so I avoided that for now.


5.2 folio_maybe_mapped_shared()
-------------------------------

I documented the change from folio_likely_mapped_shared() to
folio_maybe_mapped_shared() quite extensively.  If we run into surprises,
I have some ideas on how to resolve them.  For now, I think we should be
fine.


5.3 Added code to map/unmap hot path
------------------------------------

So far, it looks like the added code on the rmap hot path does not really
seem to matter much in the bigger picture.  I'd like to further reduce it
(and possibly improve fork() performance further), but I don't easily see
how right now.  Well, and I am out of puff 🙂

Having that said, alternatives I considered (e.g., per-MM per-folio
mapcount) would add a lot more overhead to these hot paths.


6 Future Work
=============

6.1 Large mapcount
------------------

It would be very handy if the large mapcount would count how often folio
pages are actually mapped into page tables: a PMD on x86-64 would count
512 times.  Calculating the average per-page mapcount will be easy, and
remapping (PMD->PTE) folios would get even faster.

That would also remove the need for the entire mapcount (except for
PMD-sized folios for memory statistics reasons ...), and allow for mapping
folios larger than PMDs (e.g., 4 MiB) easily.

We likely would also have to take the same number of folio references to
make our folio_mapcount() == folio_ref_count() work, and we'd want to be
able to avoid mapcount+refcount overflows: this could already become an
issue with pte-mapped PUD-sized folios (fsdax).

One approach we discussed in the THP cabal meeting is (1) extending the
mapcount for large folios to 64bit (at least on 64bit systems) and (2)
keeping the refcount at 32bit, but (3) having exactly one reference if the
the mapcount != 0.

It should be doable, but there are some corner cases to consider on the
unmap path; it is something that I will be looking into next.


6.2 hugetlb
-----------

I'd love to make use of the same tracking also for hugetlb.

The real problem is PMD table sharing: getting a page mapped by MM X and
unmapped by MM Y will not work.  With mshare, that problem should not
exist (all mapping/unmapping will be routed through the mshare MM).

[1] https://lwn.net/Articles/974223/
[2] https://lore.kernel.org/linux-mm/[email protected]/T/
[3] https://lkml.kernel.org/r/[email protected]
[4] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c


This patch (of 20):

Let's factor it out into a simple helper function.  This helper will also
come in handy when working with code where we know that our folio is
large.

Maybe in the future we'll have the order readily available for small and
large folios; in that case, folio_large_order() would simply translate to
folio_order().

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Lance Yang <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirks^H^Hski <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Matthew Wilcow (Oracle) <[email protected]>
Cc: Michal Koutn <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: tejun heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zefan Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Apr 3, 2025
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
olsajiri pushed a commit to olsajiri/bpf that referenced this pull request Apr 18, 2025
Switch from the atomic_long_add_return() to its relaxed version.

We do not need a full memory barrier or any memory ordering during
increasing the "vmap_lazy_nr" variable.  What we only need is to do it
atomically.  This is what atomic_long_add_return_relaxed() guarantees.

AARCH64:

<snip>
Default:
    40ec:       d34cfe94        lsr     x20, x20, kernel-patches#12
    40f0:       14000044        b       4200 <free_vmap_area_noflush+0x19c>
    40f4:       94000000        bl      0 <__sanitizer_cov_trace_pc>
    40f8:       90000000        adrp    x0, 0 <__traceiter_alloc_vmap_area>
    40fc:       91000000        add     x0, x0, #0x0
    4100:       f8f40016        ldaddal x20, x22, [x0]
    4104:       8b160296        add     x22, x20, x22

Relaxed:
    40ec:       d34cfe94        lsr     x20, x20, kernel-patches#12
    40f0:       14000044        b       4200 <free_vmap_area_noflush+0x19c>
    40f4:       94000000        bl      0 <__sanitizer_cov_trace_pc>
    40f8:       90000000        adrp    x0, 0 <__traceiter_alloc_vmap_area>
    40fc:       91000000        add     x0, x0, #0x0
    4100:       f8340016        ldadd   x20, x22, [x0]
    4104:       8b160296        add     x22, x20, x22
<snip>

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Christop Hellwig <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Apr 22, 2025
Ido Schimmel says:

====================
vxlan: Convert FDB table to rhashtable

The VXLAN driver currently stores FDB entries in a hash table with a
fixed number of buckets (256), resulting in reduced performance as the
number of entries grows. This patchset solves the issue by converting
the driver to use rhashtable which maintains a more or less constant
performance regardless of the number of entries.

Measured transmitted packets per second using a single pktgen thread
with varying number of entries when the transmitted packet always hits
the default entry (worst case):

Number of entries | Improvement
------------------|------------
1k                | +1.12%
4k                | +9.22%
16k               | +55%
64k               | +585%
256k              | +2460%

The first patches are preparations for the conversion in the last patch.
Specifically, the series is structured as follows:

Patch kernel-patches#1 adds RCU read-side critical sections in the Tx path when
accessing FDB entries. Targeting at net-next as I am not aware of any
issues due to this omission despite the code being structured that way
for a long time. Without it, traces will be generated when converting
FDB lookup to rhashtable_lookup().

Patch kernel-patches#2-kernel-patches#5 simplify the creation of the default FDB entry (all-zeroes).
Current code assumes that insertion into the hash table cannot fail,
which will no longer be true with rhashtable.

Patches kernel-patches#6-kernel-patches#10 add FDB entries to a linked list for entry traversal
instead of traversing over them using the fixed size hash table which is
removed in the last patch.

Patches kernel-patches#11-kernel-patches#12 add wrappers for FDB lookup that make it clear when each
should be used along with lockdep annotations. Needed as a preparation
for rhashtable_lookup() that must be called from an RCU read-side
critical section.

Patch kernel-patches#13 treats dst cache initialization errors as non-fatal. See more
info in the commit message. The current code happens to work because
insertion into the fixed size hash table is slow enough for the per-CPU
allocator to be able to create new chunks of per-CPU memory.

Patch kernel-patches#14 adds an FDB key structure that includes the MAC address and
source VNI. To be used as rhashtable key.

Patch kernel-patches#15 does the conversion to rhashtable.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
olsajiri pushed a commit to olsajiri/bpf that referenced this pull request May 15, 2025
… prevent wrong idmap generation

The PTE_MAYBE_NG macro sets the nG page table bit according to the value
of "arm64_use_ng_mappings". This variable is currently placed in the
.bss section. create_init_idmap() is called before the .bss section
initialisation which is done in early_map_kernel(). Therefore,
data/test_prot in create_init_idmap() could be set incorrectly through
the PAGE_KERNEL -> PROT_DEFAULT -> PTE_MAYBE_NG macros.

   # llvm-objdump-21 --syms vmlinux-gcc | grep arm64_use_ng_mappings
     ffff800082f242a8 g     O .bss    0000000000000001 arm64_use_ng_mappings

The create_init_idmap() function disassembly compiled with llvm-21:

  // create_init_idmap()
  ffff80008255c058: d10103ff     	sub	sp, sp, #0x40
  ffff80008255c05c: a9017bfd     	stp	x29, x30, [sp, #0x10]
  ffff80008255c060: a90257f6     	stp	x22, x21, [sp, #0x20]
  ffff80008255c064: a9034ff4     	stp	x20, x19, [sp, #0x30]
  ffff80008255c068: 910043fd     	add	x29, sp, #0x10
  ffff80008255c06c: 90003fc8     	adrp	x8, 0xffff800082d54000
  ffff80008255c070: d280e06a     	mov	x10, #0x703     // =1795
  ffff80008255c074: 91400409     	add	x9, x0, #0x1, lsl kernel-patches#12 // =0x1000
  ffff80008255c078: 394a4108     	ldrb	w8, [x8, #0x290] ------------- (1)
  ffff80008255c07c: f2e00d0a     	movk	x10, #0x68, lsl kernel-patches#48
  ffff80008255c080: f90007e9     	str	x9, [sp, #0x8]
  ffff80008255c084: aa0103f3     	mov	x19, x1
  ffff80008255c088: aa0003f4     	mov	x20, x0
  ffff80008255c08c: 14000000     	b	0xffff80008255c08c <__pi_create_init_idmap+0x34>
  ffff80008255c090: aa082d56     	orr	x22, x10, x8, lsl kernel-patches#11 -------- (2)

Note (1) is loading the arm64_use_ng_mappings value in w8 and (2) is set
the text or data prot with the w8 value to set PTE_NG bit. If the .bss
section isn't initialized, x8 could include a garbage value and generate
an incorrect mapping.

Annotate arm64_use_ng_mappings as __read_mostly so that it is placed in
the .data section.

Fixes: 84b04d3 ("arm64: kernel: Create initial ID map from C code")
Cc: [email protected] # 6.9.x
Tested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Yeoreum Yun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[[email protected]: use __read_mostly instead of __ro_after_init]
[[email protected]: slight tweaking of the code comment]
Signed-off-by: Catalin Marinas <[email protected]>
olsajiri pushed a commit to olsajiri/bpf that referenced this pull request May 15, 2025
Switch from the atomic_long_add_return() to its relaxed version.

We do not need a full memory barrier or any memory ordering during
increasing the "vmap_lazy_nr" variable.  What we only need is to do it
atomically.  This is what atomic_long_add_return_relaxed() guarantees.

AARCH64:

<snip>
Default:
    40ec:       d34cfe94        lsr     x20, x20, kernel-patches#12
    40f0:       14000044        b       4200 <free_vmap_area_noflush+0x19c>
    40f4:       94000000        bl      0 <__sanitizer_cov_trace_pc>
    40f8:       90000000        adrp    x0, 0 <__traceiter_alloc_vmap_area>
    40fc:       91000000        add     x0, x0, #0x0
    4100:       f8f40016        ldaddal x20, x22, [x0]
    4104:       8b160296        add     x22, x20, x22

Relaxed:
    40ec:       d34cfe94        lsr     x20, x20, kernel-patches#12
    40f0:       14000044        b       4200 <free_vmap_area_noflush+0x19c>
    40f4:       94000000        bl      0 <__sanitizer_cov_trace_pc>
    40f8:       90000000        adrp    x0, 0 <__traceiter_alloc_vmap_area>
    40fc:       91000000        add     x0, x0, #0x0
    4100:       f8340016        ldadd   x20, x22, [x0]
    4104:       8b160296        add     x22, x20, x22
<snip>

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Christop Hellwig <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants