[LTS 8.6] perf: Disallow mis-matched inherited group reads #475
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[LTS 8.6]
CVE-2023-5717
VULN-4127
Problem
https://www.cve.org/CVERecord?id=CVE-2023-5717
Applicability: yes
The perf component is included with the
CONFIG_PERF_EVENTS
option, which is enabled in mostciqlts8_6
configsThe commit "flipping the order of child_list and sibling_list" which introduced the bug - fa8c269 - is present in
ciqlts8_6
's history. The fixing commit 32671e3 is missing and wasn't backported.Solution
The mainline fix 32671e3 adds a new
group_generation
field to theperf_event
struct. This breaks LTS 8.6 kABI.Investigation
The
check-kabi
program reports 57 symbols changed:__lock_page
,__module_get
,__page_file_index
,__pagevec_release
,__put_page
,__put_task_struct
,__scsi_iterate_devices
,__task_pid_nr_ns
,alloc_pages_current
,blk_alloc_queue
,blk_cleanup_queue
,blk_get_request
,blk_put_request
,blk_queue_flag_clear
,blk_queue_flag_set
,filemap_fault
,find_pid_ns
,find_vma
,flush_signals
,force_sig
,generic_make_request
,get_task_mm
,kernel_recvmsg
,kernel_sendmsg
,kernel_setsockopt
,kmem_cache_create
,kmem_cache_create_usercopy
,kmem_cache_destroy
,kmem_cache_shrink
,mark_page_accessed
,mmput
,module_put
,module_refcount
,pagevec_lookup_range
,pagevec_lookup_range_tag
,pid_task
,read_cache_pages
,sched_setscheduler
,scsi_change_queue_depth
,scsi_device_get
,scsi_device_lookup
,scsi_device_put
,scsi_get_vpd_page
,send_sig
,set_cpus_allowed_ptr
,set_page_dirty
,set_page_dirty_lock
,set_user_nice
,sock_create_kern
,sock_release
,starget_for_each_device
,submit_bio
,try_module_get
,unlock_page
,wait_on_page_bit
,wake_up_process
,write_cache_pages
.A useful tool exists developed by a RH engineer to assist in debugging the kABI breakage issues specifically:
kabi-dw
(https://github.com/skozina/kabi-dw). It was used to establish the relation between the changed whitelisted symbols with the modifiedperf_event
structure to asses whether a workaround may be devised.The tool requires the kernel and the modules to be compiled with
CONFIG_DEBUG_INFO
option set (it's set by default in all configs ofciqlts8_6
). After the binaries are generated their.debug_info
section is read and all the symbols are dumped into separate text files, for examplefunc--kernel_setsockopt.txt
:Full output: kabi-ciqlts8_6-CVE-2023-5717.tar.gz
As is visible in the example above in the
line, the elementary symbols of the compound types are referenced in the form of
@"‹symbol-file›"
. This defines a directed graph with symbol files as vertices. Searching for paths from each of the changed whitelisted symbol's file (eg.func--scsi_device_get.txt
) to the patch-modified symbolstruct--perf_event.txt
would show how the CVE-2023-5717 patch breaks the kABI. This was easily achieved using python's implicit graph librarynographs
(https://nographs.readthedocs.io/en/latest/). While the full picture would include all paths, only the shortest ones were explored for simplicity:perf_event-paths.txt
The file contains sections like
This exhibits that the
alloc_pages_current
symbol is a function, which is defined using thepage
struct (in this case as a return value, though it can't be known from this output, see below), which defines some field usingmem_cgroup
struct, which defines some field using thetask_struct
struct, which defines some field using thethread_struct
struct, which defines some field using theperf_event
struct that changed its definition in the patch.While the format above gives a good overview, for the actual understanding required to properly asess the impact of
perf_event
change it's important to knowhow the symbols are used (eg. as a pointer vs as a static field). This can be quite easily shown by
grep -C
-ing the symbol files for the next symbol in the usage chain:perf_event-paths-context.txt
The entries take the following form (note that the chain order is reversed):
This shows that:
ptrace_bps
array inthread_struct
doesn't containperf_event
elements but the pointers toperf_event
elements, so the memory layout changes don't proliferate to thethread_struct
or any further.perf_event
struct is behind a chain of 3 pointer dereferences from the object returned by the whitelisted symbolalloc_pages_current
(page
→mem_cgroup
→task_struct
→perf_event
).Analysis
Analyzing the files obtained in the previous steps results in the following observations:
All modified whitelisted symbols are functions.
In 42 out of 57 cases the
perf_event
struct is used as a pointer in thethread_struct
(see example above).In the rest 15 out of 57 cases the
perf_event
struct is used in thetrace_event_call
struct as a pointer argument to the function pointer field:The number of pointer dereferences to get to the
perf_event
struct at least 2.Points 2 and 3 stem from
perf_event
struct having the property of being strictly allocated from kernel's heap through theperf_event_alloc(…)
function, which means that it's always dynamically allocated and referenced indirectly through a pointer.Following "A Kernel Developer's Guide to kABI" (https://gitlab.com/redhat/centos-stream/src/kernel/documentation/-/tree/main/content/docs/kABI?ref_type=heads):
While the kABI formally must be broken, it can effectively be preserved given the conditions above. This effective kABI preservation can be signaled to the kabi-checking tools through the
RH_KABI_EXTEND
macro (include/linux/rh_kabi.h
):All three conditions mentioned are met. While it's said that the list is not exhaustive, the full picture of
perf_event
's impact on whitelisted symbols provided before strenghtens the argument that the structure can be safely extended.Regarding the second condition, while it can't be known for sure that no driver will use the
perf_event
struct on the stack, for example, this situation would be outside of scope. Following the guide again:Solution
The
group_generation
field added in the mainline fix 32671e3 was preserved, but moved to the end of the struct and wrapped in theRH_KABI_EXTEND
macro.Additionally, a fix-of-the-fix on the mainlie was commited in a71ef31 which was also included in this backport.
kABI check: passed
Boot test: passed
boot-test.log
Kselftests: passed relative
Coverage
android
,bpf
(excepttest_sockmap
,test_maps
,test_progs-no_alu32
,test_progs
,test_xsk.sh
,test_kmod.sh
),breakpoints
(exceptstep_after_suspend_test
),capabilities
,core
,cpu-hotplug
,cpufreq
,exec
,firmware
,fpu
,ftrace
,futex
,gpio
,intel_pstate
,ipc
,kcmp
,kexec
,kvm
,lib
,livepatch
,membarrier
,memfd
,memory-hotplug
,mount
,net/forwarding
(exceptsch_tbf_prio.sh
,ipip_hier_gre_keys.sh
,mirror_gre_vlan_bridge_1q.sh
,sch_ets.sh
,tc_actions.sh
,mirror_gre_bridge_1d_vlan.sh
,sch_tbf_root.sh
,sch_tbf_ets.sh
),net/mptcp
(exceptsimult_flows.sh
,mptcp_join.sh
),net
(exceptip_defrag.sh
,reuseport_addr_any.sh
,udpgro_fwd.sh
,gro.sh
,txtimestamp.sh
,xfrm_policy.sh
,udpgso_bench.sh
),netfilter
(exceptnft_trans_stress.sh
),nsfs
,proc
,pstore
,ptrace
,rseq
,sgx
,sigaltstack
,size
,splice
,static_keys
,tc-testing
,timens
,timers
(exceptraw_skew
),tpm2
,vm
,x86
,zram
Reference
kselftests–ciqlts8_6–run1.log
kselftests–ciqlts8_6–run2.log
Patch
kselftests–ciqlts8_6-CVE-2023-5717–run1.log
kselftests–ciqlts8_6-CVE-2023-5717–run2.log
kselftests–ciqlts8_6-CVE-2023-5717–run3.log
Comparison
The patch and reference results are the same.
Specific tests: passed
While not strictly testing the provided patch, a very basic sanity check of the perf_events module was done to see if it remains functional.
Reference
Patch