Skip to content

Conversation

SandeepKUMARII
Copy link

need apply a patch

WuHao270 and others added 30 commits October 3, 2018 09:24
In some cases, other DFL driver modules may need to access some port
operations, e.g. disable / enable port for partial reconfiguration in
FME module. In order to avoid dependency between port and FME modules,
this patch introduces the dfl_fpga_port_ops support in DFL framework.
A global dfl_fpga_port_ops list is added in the DFL framework, and
it allows other DFL modules to use these port operations registered
to this list, even in virtualization case, the port platform device
is turned into VF / guest VM and hidden in host, the registered
port_ops is still usable. It resolves the dependency issues between
modules, but once get port ops API returns a valid port ops, that
means related port driver module has been module_get to prevent from
unexpected unload, and put port ops API must be invoked after use.

These APIs introduced by this patch is listed below:
 * dfl_fpga_port_ops_add
   add one port ops to the global list.

 * dfl_fpga_port_ops_del
   del one port ops from the global list.

 * dfl_fpga_port_ops_get / dfl_fpga_port_ops_put
   get/put the port ops before/after use.

Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 6e8fd6e)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch adds one common function in DFL framework. It uses
port_ops get_id callback to get port id and compare it with given
value. This function could be used as match function of the
dfl_fpga_cdev_find_port function.

Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit d06b004)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch implements the basic framework of the driver for FPGA PCIe
device which implements the Device Feature List (DFL) in its MMIO space.
This driver is verified on Intel(R) PCIe-based FPGA DFL devices, including
both integrated (e.g. Intel Server Platform with In-package FPGA) and
discrete (e.g. Intel FPGA PCIe Acceleration Cards) solutions.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 72ddd9f)
Signed-off-by: Geert Uytterhoeven <[email protected]>
The Device Feature List (DFL) is implemented in MMIO and features
are linked via the DFLs. This patch enables pcie driver to prepare
enumeration information (e.g. locations of all device feature lists
in MMIO) and use common APIs provided by the Device Feature List
framework to enumerate each feature device linked.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 968b819)
Signed-off-by: Geert Uytterhoeven <[email protected]>
The FPGA Management Engine (FME) provides power, thermal management,
performance counters, partial reconfiguration and other functions. For each
function, it is packaged into a private feature linked to the FME feature
device in the 'Device Feature List'. It's a platform device created by
DFL framework.

This patch adds the basic framework of FME platform driver. It defines
sub feature drivers to handle the different sub features, including init,
uinit and ioctl. It also registers the file operations for the device file.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Kang Luwei <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 322ddeb)
Signed-off-by: Geert Uytterhoeven <[email protected]>
The Header Register set is always present for FPGA Management Engine (FME),
this patch implements init and uinit function for header sub feature and
introduces several read-only sysfs interfaces for the capability and
status.

Sysfs interfaces:
* /sys/class/fpga_region/<regionX>/<dfl-fme.x>/ports_num
  Read-only. Number of ports implemented

* /sys/class/fpga_region/<regionX>/<dfl-fme.x>/bitstream_id
  Read-only. Bitstream (static FPGA region) identifier number. It contains
  the detailed version and other information of this static FPGA region.

* /sys/class/fpga_region/<regionX>/<dfl-fme.x>/bitstream_metadata
  Read-only. Bitstream (static FPGA region) meta data. It contains the
  synthesis date, seed and other information of this static FPGA region.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Kang Luwei <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 0a27ff2)
Signed-off-by: Geert Uytterhoeven <[email protected]>
…upport

DFL_FPGA_GET_API_VERSION and DFL_FPGA_CHECK_EXTENSION ioctls are common
ones which need to be supported by all feature devices drivers including
FME and AFU. Userspace application can use these ioctl interfaces to get
the API info and check if specific extension is supported or not in
current driver.

This patch implements above 2 ioctls in FPGA Management Engine (FME)
driver.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 620e190)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Partial Reconfiguration (PR) is the most important function for FME. It
allows reconfiguration for given Port/Accelerated Function Unit (AFU).

It creates platform devices for fpga-mgr, fpga-regions and fpga-bridges,
and invokes fpga-region's interface (fpga_region_program_fpga) for PR
operation once PR request received via ioctl. Below user space interface
is exposed by this sub feature.

Ioctl interface:
* DFL_FPGA_FME_PORT_PR
  Do partial reconfiguration per information from userspace, including
  target port(AFU), buffer size and address info. It returns error code
  to userspace if failed. For detailed PR error information, user needs
  to read fpga-mgr's status sysfs interface.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Kang Luwei <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 29de762)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch adds fpga manager driver for FPGA Management Engine (FME). It
implements fpga_manager_ops for FPGA Partial Reconfiguration function.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Kang Luwei <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit af275ec)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch adds compat_id support to fme manager driver, it
reads the ID from the hardware register. And it could be used
for compatibility check before partial reconfiguration.

Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 5ebae80)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch adds fpga bridge platform driver for FPGA Management Engine.
It implements the enable_set callback for fpga bridge.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit de892df)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch adds fpga region platform driver for FPGA Management Engine.
It register an fpga region with given fpga manager / bridge device.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit bb61b9b)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch adds compat_id support, it reuses fme manager's
compat id, as the per region compat id is actually from the
fme manager's register.

Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 7514a42)
Signed-off-by: Geert Uytterhoeven <[email protected]>
On DFL FPGA devices, the Accelerated Function Unit (AFU), can be
reprogrammed for different functions. It connects to the FPGA
infrastructure (static FPGA region) via a Port. Port CSRs are
implemented separately from the AFU CSRs to provide control and
status of the Port. Once valid PR bitstream is programmed into
the AFU, it allows access to the AFU CSRs in the AFU MMIO space.

This patch only implements basic driver framework for AFU, including
device file operation framework.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 1a1527c)
Signed-off-by: Geert Uytterhoeven <[email protected]>
This patch registers the port ops into the global list in the DFL
framework, and it allows other modules to use the port ops. And
This patch includes the implementation of the get_id and enable_set
ops too.

Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 47c1b19)
Signed-off-by: Geert Uytterhoeven <[email protected]>
The port header register set is always present for port, it is mainly
for capability, control and status of the ports that AFU connected to.

This patch implements header sub feature support. Below user interfaces
are created by this patch.

Sysfs interface:
* /sys/class/fpga_region/<regionX>/<dfl-port.x>/id
  Read-only. Port ID.

Ioctl interface:
* DFL_FPGA_PORT_RESET
  Reset the FPGA Port and its AFU.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit e4664c0)
Signed-off-by: Geert Uytterhoeven <[email protected]>
…upport

DFL_FPGA_GET_API_VERSION and DFL_FPGA_CHECK_EXTENSION ioctls are common
ones which need to be supported by all feature devices drivers including
FME and AFU. This patch implements above 2 ioctls in FPGA Accelerated
Function Unit (AFU) driver.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 6fd893c)
Signed-off-by: Geert Uytterhoeven <[email protected]>
User Accelerated Function Unit sub feature exposes the MMIO region of
the AFU. After valid PR bitstream is programmed and the port is enabled,
then this MMIO region could be accessed.

This patch adds support to enumerate the AFU MMIO region and expose it
to userspace via mmap file operation. Below interfaces are exposed to user:

Sysfs interface:
* /sys/class/fpga_region/<regionX>/<dfl-port.x>/afu_id
  Read-only. Indicate which PR bitstream is programmed to this AFU.

Ioctl interfaces:
* DFL_FPGA_PORT_GET_INFO
  Provide info to userspace on the number of supported region.
  Only UAFU region is supported now.

* DFL_FPGA_PORT_GET_REGION_INFO
  Provide region information, including access permission, region size,
  offset from the start of device fd.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 857a262)
Signed-off-by: Geert Uytterhoeven <[email protected]>
DMA memory regions are required for Accelerated Function Unit (AFU) usage.
These two ioctls allow user space applications to map user memory regions
for dma, and unmap them after use. Iova is returned from driver to user
space application via DFL_FPGA_PORT_DMA_MAP ioctl. Application needs to
unmap it after use, otherwise, driver will unmap them in device file
release operation.

Each AFU has its own rb tree to keep track of its mapped DMA regions.

Ioctl interfaces:
* DFL_FPGA_PORT_DMA_MAP
  Do the dma mapping per user_addr and length provided by user.
  Return iova in provided struct dfl_fpga_port_dma_map.

* DFL_FPGA_PORT_DMA_UNMAP
  Unmap the dma region per iova provided by user.

Signed-off-by: Tim Whisonant <[email protected]>
Signed-off-by: Enno Luebbers <[email protected]>
Signed-off-by: Shiva Rao <[email protected]>
Signed-off-by: Christopher Rauer <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit fa8dda1)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Add entry for FPGA Device Feature List (DFL) drivers.

Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 5d6bd30)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Add dt binding documentation details for Lattice MachXO2 FPGA configuration
over Slave SPI interface.

Signed-off-by: Paolo Pisati <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Alan Tull <[email protected]>
Signed-off-by: Moritz Fischer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit d549ac0)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Make this const as it is only passed to a const argument of the function
fpga_bridge_register.

Signed-off-by: Bhumika Goyal <[email protected]>
Acked-by: Michal Simek <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Signed-off-by: Alan Tull <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit d8d9d93)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Blank help texts are probably either a typo, a Kconfig misunderstanding,
or some kind of half-committing to adding a help text (in which case a
TODO comment would be clearer, if the help text really can't be added
right away).

Best to remove them, IMO.

Signed-off-by: Ulf Magnusson <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Acked-by: Ley Foon Tan <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
(cherry picked from commit e191603)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Improve the DTS files by removing all the leading "0x" and zeros to fix the
following dtc warnings:

Warning (unit_address_format): Node /XXX unit name should not have leading "0x"

and

Warning (unit_address_format): Node /XXX unit name should not have leading 0s

Converted using the following command:

find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +

For simplicity, two sed expressions were used to solve each warnings separately.

To make the regex expression more robust a few other issues were resolved,
namely setting unit-address to lower case, and adding a whitespace before the
the opening curly brace:

https://elinux.org/Device_Tree_Linux#Linux_conventions

This is a follow up to commit 4c9847b ("dt-bindings: Remove leading 0x from bindings notation")

Reported-by: David Daney <[email protected]>
Suggested-by: Rob Herring <[email protected]>
Signed-off-by: Mathieu Malaterre <[email protected]>
Acked-by: Ley Foon Tan <[email protected]>
(cherry picked from commit 5d13c73)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Remove old, dead Kconfig option INET_LRO. It is gone since
commit 7bbf3ca ("ipv4: Remove inet_lro library").

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Ley Foon Tan <[email protected]>
(cherry picked from commit e0691eb)
Signed-off-by: Geert Uytterhoeven <[email protected]>
A commit for the nds32 architecture bootstrap("asm-generic/io.h:
move ioremap_nocache/ioremap_uc/ioremap_wc/ioremap_wt out of ifndef
CONFIG_MMU") will move the ioremap_nocache out of the CONFIG_MMU ifdef.
This means that in order to suppress re-definition errors we need to
setup #define's before importing asm-generic/io.h.

Signed-off-by: Greentime Hu <[email protected]>
Reviewed-by: Tobias Klauser <[email protected]>
(cherry picked from commit c601a89)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Since struct timespec is not y2038 safe on 32bit machines, this patch
converts read_persistent_clock() to read_persistent_clock64() using
struct timespec64, as well as converting mktime() to mktime64().

Signed-off-by: Baolin Wang <[email protected]>
Signed-off-by: Ley Foon Tan <[email protected]>
(cherry picked from commit 3d9644e)
Signed-off-by: Geert Uytterhoeven <[email protected]>
The dwc2 USB controller in Stratix10 has an additional ECC reset bit that
needs to get de-asserted in order for the controller to work properly.

Acked-by: John Youn <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
(cherry picked from commit f2830ad)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Enable the reset driver to get built for the Stratix10 platform.

Signed-off-by: Dinh Nguyen <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>
(cherry picked from commit db21f9c)
Signed-off-by: Geert Uytterhoeven <[email protected]>
Use BIT macro to avoid shift-31-problem, indent a little more and use
GENMASK to make it easier to add new flags.

Signed-off-by: Wolfram Sang <[email protected]>
Reviewed-by: Ulrich Hecht <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
(cherry picked from commit b07531a)
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
rgongatintel pushed a commit that referenced this pull request Feb 6, 2020
[ Upstream commit 7ac6269 ]

The LCD timing definitions between Linux DRM vs Allwinner are different,
below diagram shows this clear differences.

           Active                 Front           Sync           Back
           Region                 Porch                          Porch
<-----------------------><----------------><--------------><-------------->
  //////////////////////|
 ////////////////////// |
//////////////////////  |..................                ................
                                           ________________
<----- [hv]display ----->
<------------- [hv]sync_start ------------>
<--------------------- [hv]sync_end ---------------------->
<-------------------------------- [hv]total ------------------------------>

<----- lcd_[xy] -------->		  <- lcd_[hv]spw ->
					  <---------- lcd_[hv]bp --------->
<-------------------------------- lcd_[hv]t ------------------------------>

The DSI driver misinterpreted the hbp term from the BSP code to refer
only to the backporch, when in fact it was backporch + sync. Thus the
driver incorrectly used the horizontal front porch plus sync in its
calculation of the DRQ set bit value, when it should not have included
the sync timing.

Including additional sync timings leads to flip_done timed out as:

WARNING: CPU: 0 PID: 31 at drivers/gpu/drm/drm_atomic_helper.c:1429 drm_atomic_helper_wait_for_vblanks.part.1+0x298/0x2a0
[CRTC:46:crtc-0] vblank wait timed out
Modules linked in:
CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 5.1.0-next-20190514-00026-g01f0c75b902d-dirty #13
Hardware name: Allwinner sun8i Family
Workqueue: events deferred_probe_work_func
[<c010ed54>] (unwind_backtrace) from [<c010b76c>] (show_stack+0x10/0x14)
[<c010b76c>] (show_stack) from [<c0688c70>] (dump_stack+0x84/0x98)
[<c0688c70>] (dump_stack) from [<c011d9e4>] (__warn+0xfc/0x114)
[<c011d9e4>] (__warn) from [<c011da40>] (warn_slowpath_fmt+0x44/0x68)
[<c011da40>] (warn_slowpath_fmt) from [<c040cd50>] (drm_atomic_helper_wait_for_vblanks.part.1+0x298/0x2a0)
[<c040cd50>] (drm_atomic_helper_wait_for_vblanks.part.1) from [<c040e694>] (drm_atomic_helper_commit_tail_rpm+0x5c/0x6c)
[<c040e694>] (drm_atomic_helper_commit_tail_rpm) from [<c040e4dc>] (commit_tail+0x40/0x6c)
[<c040e4dc>] (commit_tail) from [<c040e5cc>] (drm_atomic_helper_commit+0xbc/0x128)
[<c040e5cc>] (drm_atomic_helper_commit) from [<c0411b64>] (restore_fbdev_mode_atomic+0x1cc/0x1dc)
[<c0411b64>] (restore_fbdev_mode_atomic) from [<c04156f8>] (drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xa0)
[<c04156f8>] (drm_fb_helper_restore_fbdev_mode_unlocked) from [<c0415774>] (drm_fb_helper_set_par+0x30/0x54)
[<c0415774>] (drm_fb_helper_set_par) from [<c03ad450>] (fbcon_init+0x560/0x5ac)
[<c03ad450>] (fbcon_init) from [<c03eb8a0>] (visual_init+0xbc/0x104)
[<c03eb8a0>] (visual_init) from [<c03ed1b8>] (do_bind_con_driver+0x1b0/0x390)
[<c03ed1b8>] (do_bind_con_driver) from [<c03ed780>] (do_take_over_console+0x13c/0x1c4)
[<c03ed780>] (do_take_over_console) from [<c03ad800>] (do_fbcon_takeover+0x74/0xcc)
[<c03ad800>] (do_fbcon_takeover) from [<c013c9c8>] (notifier_call_chain+0x44/0x84)
[<c013c9c8>] (notifier_call_chain) from [<c013cd20>] (__blocking_notifier_call_chain+0x48/0x60)
[<c013cd20>] (__blocking_notifier_call_chain) from [<c013cd50>] (blocking_notifier_call_chain+0x18/0x20)
[<c013cd50>] (blocking_notifier_call_chain) from [<c03a6e44>] (register_framebuffer+0x1e0/0x2f8)
[<c03a6e44>] (register_framebuffer) from [<c04153c0>] (__drm_fb_helper_initial_config_and_unlock+0x2fc/0x50c)
[<c04153c0>] (__drm_fb_helper_initial_config_and_unlock) from [<c04158c8>] (drm_fbdev_client_hotplug+0xe8/0x1b8)
[<c04158c8>] (drm_fbdev_client_hotplug) from [<c0415a20>] (drm_fbdev_generic_setup+0x88/0x118)
[<c0415a20>] (drm_fbdev_generic_setup) from [<c043f060>] (sun4i_drv_bind+0x128/0x160)
[<c043f060>] (sun4i_drv_bind) from [<c044b598>] (try_to_bring_up_master+0x164/0x1a0)
[<c044b598>] (try_to_bring_up_master) from [<c044b668>] (__component_add+0x94/0x140)
[<c044b668>] (__component_add) from [<c0445e1c>] (sun6i_dsi_probe+0x144/0x234)
[<c0445e1c>] (sun6i_dsi_probe) from [<c0452ef4>] (platform_drv_probe+0x48/0x9c)
[<c0452ef4>] (platform_drv_probe) from [<c04512cc>] (really_probe+0x1dc/0x2c8)
[<c04512cc>] (really_probe) from [<c0451518>] (driver_probe_device+0x60/0x160)
[<c0451518>] (driver_probe_device) from [<c044f7a4>] (bus_for_each_drv+0x74/0xb8)
[<c044f7a4>] (bus_for_each_drv) from [<c045107c>] (__device_attach+0xd0/0x13c)
[<c045107c>] (__device_attach) from [<c0450474>] (bus_probe_device+0x84/0x8c)
[<c0450474>] (bus_probe_device) from [<c0450900>] (deferred_probe_work_func+0x64/0x90)
[<c0450900>] (deferred_probe_work_func) from [<c0135970>] (process_one_work+0x204/0x420)
[<c0135970>] (process_one_work) from [<c013690c>] (worker_thread+0x274/0x5a0)
[<c013690c>] (worker_thread) from [<c013b3d8>] (kthread+0x11c/0x14c)
[<c013b3d8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xde539fb0 to 0xde539ff8)
9fa0:                                     00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace b57eb1e5c64c6b8b ]---
random: fast init done
[drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:46:crtc-0] flip_done timed out
[drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CONNECTOR:48:DSI-1] flip_done timed out
[drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:30:plane-0] flip_done timed out

With the terms(as described in above diagram) fixed, the panel
displays correctly without any timeouts.

Tested-by: Merlijn Wajer <[email protected]>
Signed-off-by: Jagan Teki <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Feb 6, 2020
[ Upstream commit 238191d ]

If a faulty initiator fails to bind the socket to the iSCSI connection
before emitting a command, for instance, a subsequent send_pdu, it will
crash the kernel due to a null pointer dereference in sock_sendmsg(), as
shown in the log below.  This patch makes sure the bind succeeded before
trying to use the socket.

BUG: kernel NULL pointer dereference, address: 0000000000000018
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 7 Comm: kworker/u8:0 Not tainted 5.4.0-rc2.iscsi+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   24.158246] Workqueue: iscsi_q_0 iscsi_xmitworker
[   24.158883] RIP: 0010:apparmor_socket_sendmsg+0x5/0x20
[...]
[   24.161739] RSP: 0018:ffffab6440043ca0 EFLAGS: 00010282
[   24.162400] RAX: ffffffff891c1c00 RBX: ffffffff89d53968 RCX: 0000000000000001
[   24.163253] RDX: 0000000000000030 RSI: ffffab6440043d00 RDI: 0000000000000000
[   24.164104] RBP: 0000000000000030 R08: 0000000000000030 R09: 0000000000000030
[   24.165166] R10: ffffffff893e66a0 R11: 0000000000000018 R12: ffffab6440043d00
[   24.166038] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9d5575a62e90
[   24.166919] FS:  0000000000000000(0000) GS:ffff9d557db80000(0000) knlGS:0000000000000000
[   24.167890] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.168587] CR2: 0000000000000018 CR3: 000000007a838000 CR4: 00000000000006e0
[   24.169451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   24.170320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   24.171214] Call Trace:
[   24.171537]  security_socket_sendmsg+0x3a/0x50
[   24.172079]  sock_sendmsg+0x16/0x60
[   24.172506]  iscsi_sw_tcp_xmit_segment+0x77/0x120
[   24.173076]  iscsi_sw_tcp_pdu_xmit+0x58/0x170
[   24.173604]  ? iscsi_dbg_trace+0x63/0x80
[   24.174087]  iscsi_tcp_task_xmit+0x101/0x280
[   24.174666]  iscsi_xmit_task+0x83/0x110
[   24.175206]  iscsi_xmitworker+0x57/0x380
[   24.175757]  ? __schedule+0x2a2/0x700
[   24.176273]  process_one_work+0x1b5/0x360
[   24.176837]  worker_thread+0x50/0x3c0
[   24.177353]  kthread+0xf9/0x130
[   24.177799]  ? process_one_work+0x360/0x360
[   24.178401]  ? kthread_park+0x90/0x90
[   24.178915]  ret_from_fork+0x35/0x40
[   24.179421] Modules linked in:
[   24.179856] CR2: 0000000000000018
[   24.180327] ---[ end trace b4b7674b6df5f480 ]---

Signed-off-by: Anatol Pomazau <[email protected]>
Co-developed-by: Frank Mayhar <[email protected]>
Signed-off-by: Frank Mayhar <[email protected]>
Co-developed-by: Bharath Ravi <[email protected]>
Signed-off-by: Bharath Ravi <[email protected]>
Co-developed-by: Khazhimsel Kumykov <[email protected]>
Signed-off-by: Khazhimsel Kumykov <[email protected]>
Co-developed-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Feb 6, 2020
[ Upstream commit 8354d88 ]

Ensure we grab an active reference in cifs superblock while doing
failover to prevent automounts (DFS links) of expiring and then
destroying the superblock pointer.

This patch fixes the following KASAN report:

[  464.301462] BUG: KASAN: use-after-free in
cifs_reconnect+0x6ab/0x1350
[  464.303052] Read of size 8 at addr ffff888155e580d0 by task
cifsd/1107

[  464.304682] CPU: 3 PID: 1107 Comm: cifsd Not tainted 5.4.0-rc4+ #13
[  464.305552] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.12.1-0-ga5cab58-rebuilt.opensuse.org 04/01/2014
[  464.307146] Call Trace:
[  464.307875]  dump_stack+0x5b/0x90
[  464.308631]  print_address_description.constprop.0+0x16/0x200
[  464.309478]  ? cifs_reconnect+0x6ab/0x1350
[  464.310253]  ? cifs_reconnect+0x6ab/0x1350
[  464.311040]  __kasan_report.cold+0x1a/0x41
[  464.311811]  ? cifs_reconnect+0x6ab/0x1350
[  464.312563]  kasan_report+0xe/0x20
[  464.313300]  cifs_reconnect+0x6ab/0x1350
[  464.314062]  ? extract_hostname.part.0+0x90/0x90
[  464.314829]  ? printk+0xad/0xde
[  464.315525]  ? _raw_spin_lock+0x7c/0xd0
[  464.316252]  ? _raw_read_lock_irq+0x40/0x40
[  464.316961]  ? ___ratelimit+0xed/0x182
[  464.317655]  cifs_readv_from_socket+0x289/0x3b0
[  464.318386]  cifs_read_from_socket+0x98/0xd0
[  464.319078]  ? cifs_readv_from_socket+0x3b0/0x3b0
[  464.319782]  ? try_to_wake_up+0x43c/0xa90
[  464.320463]  ? cifs_small_buf_get+0x4b/0x60
[  464.321173]  ? allocate_buffers+0x98/0x1a0
[  464.321856]  cifs_demultiplex_thread+0x218/0x14a0
[  464.322558]  ? cifs_handle_standard+0x270/0x270
[  464.323237]  ? __switch_to_asm+0x40/0x70
[  464.323893]  ? __switch_to_asm+0x34/0x70
[  464.324554]  ? __switch_to_asm+0x40/0x70
[  464.325226]  ? __switch_to_asm+0x40/0x70
[  464.325863]  ? __switch_to_asm+0x34/0x70
[  464.326505]  ? __switch_to_asm+0x40/0x70
[  464.327161]  ? __switch_to_asm+0x34/0x70
[  464.327784]  ? finish_task_switch+0xa1/0x330
[  464.328414]  ? __switch_to+0x363/0x640
[  464.329044]  ? __schedule+0x575/0xaf0
[  464.329655]  ? _raw_spin_lock_irqsave+0x82/0xe0
[  464.330301]  kthread+0x1a3/0x1f0
[  464.330884]  ? cifs_handle_standard+0x270/0x270
[  464.331624]  ? kthread_create_on_node+0xd0/0xd0
[  464.332347]  ret_from_fork+0x35/0x40

[  464.333577] Allocated by task 1110:
[  464.334381]  save_stack+0x1b/0x80
[  464.335123]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  464.335848]  cifs_smb3_do_mount+0xd4/0xb00
[  464.336619]  legacy_get_tree+0x6b/0xa0
[  464.337235]  vfs_get_tree+0x41/0x110
[  464.337975]  fc_mount+0xa/0x40
[  464.338557]  vfs_kern_mount.part.0+0x6c/0x80
[  464.339227]  cifs_dfs_d_automount+0x336/0xd29
[  464.339846]  follow_managed+0x1b1/0x450
[  464.340449]  lookup_fast+0x231/0x4a0
[  464.341039]  path_openat+0x240/0x1fd0
[  464.341634]  do_filp_open+0x126/0x1c0
[  464.342277]  do_sys_open+0x1eb/0x2c0
[  464.342957]  do_syscall_64+0x5e/0x190
[  464.343555]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  464.344772] Freed by task 0:
[  464.345347]  save_stack+0x1b/0x80
[  464.345966]  __kasan_slab_free+0x12c/0x170
[  464.346576]  kfree+0xa6/0x270
[  464.347211]  rcu_core+0x39c/0xc80
[  464.347800]  __do_softirq+0x10d/0x3da

[  464.348919] The buggy address belongs to the object at
ffff888155e58000
                which belongs to the cache kmalloc-256 of size 256
[  464.350222] The buggy address is located 208 bytes inside of
                256-byte region [ffff888155e58000, ffff888155e58100)
[  464.351575] The buggy address belongs to the page:
[  464.352333] page:ffffea0005579600 refcount:1 mapcount:0
mapping:ffff88815a803400 index:0x0 compound_mapcount: 0
[  464.353583] flags: 0x200000000010200(slab|head)
[  464.354209] raw: 0200000000010200 ffffea0005576200 0000000400000004
ffff88815a803400
[  464.355353] raw: 0000000000000000 0000000080100010 00000001ffffffff
0000000000000000
[  464.356458] page dumped because: kasan: bad access detected

[  464.367005] Memory state around the buggy address:
[  464.367787]  ffff888155e57f80: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[  464.368877]  ffff888155e58000: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  464.369967] >ffff888155e58080: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  464.371111]                                                  ^
[  464.371775]  ffff888155e58100: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[  464.372893]  ffff888155e58180: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[  464.373983] ==================================================================

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Reviewed-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Feb 6, 2020
[ Upstream commit 32546a9 ]

This patch moves the final part of the cifsFileInfo_put() logic where we
need a write lock on lock_sem to be processed in a separate thread that
holds no other locks.
This is to prevent deadlocks like the one below:

> there are 6 processes looping to while trying to down_write
> cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
> cifs_new_fileinfo
>
> and there are 5 other processes which are blocked, several of them
> waiting on either PG_writeback or PG_locked (which are both set), all
> for the same page of the file
>
> 2 inode_lock() (inode->i_rwsem) for the file
> 1 wait_on_page_writeback() for the page
> 1 down_read(inode->i_rwsem) for the inode of the directory
> 1 inode_lock()(inode->i_rwsem) for the inode of the directory
> 1 __lock_page
>
>
> so processes are blocked waiting on:
>   page flags PG_locked and PG_writeback for one specific page
>   inode->i_rwsem for the directory
>   inode->i_rwsem for the file
>   cifsInodeInflock_sem
>
>
>
> here are the more gory details (let me know if I need to provide
> anything more/better):
>
> [0 00:48:22.765] [UN]  PID: 8863   TASK: ffff8c691547c5c0  CPU: 3
> COMMAND: "reopen_file"
>  #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
>  #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
>  #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
>  #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
>  #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
>  #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
>  #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
> * (I think legitimize_path is bogus)
>
> in path_openat
>         } else {
>                 const char *s = path_init(nd, flags);
>                 while (!(error = link_path_walk(s, nd)) &&
>                         (error = do_last(nd, file, op)) > 0) {  <<<<
>
> do_last:
>         if (open_flag & O_CREAT)
>                 inode_lock(dir->d_inode);  <<<<
>         else
> so it's trying to take inode->i_rwsem for the directory
>
>      DENTRY           INODE           SUPERBLK     TYPE PATH
> ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR  /mnt/vm1_smb/
> inode.i_rwsem is ffff8c691158efc0
>
> <struct rw_semaphore 0xffff8c691158efc0>:
>         owner: <struct task_struct 0xffff8c6914275d00> (UN -   8856 -
> reopen_file), counter: 0x0000000000000003
>         waitlist: 2
>         0xffff9965007e3c90     8863   reopen_file      UN 0  1:29:22.926
>   RWSEM_WAITING_FOR_WRITE
>         0xffff996500393e00     9802   ls               UN 0  1:17:26.700
>   RWSEM_WAITING_FOR_READ
>
>
> the owner of the inode.i_rwsem of the directory is:
>
> [0 00:00:00.109] [UN]  PID: 8856   TASK: ffff8c6914275d00  CPU: 3
> COMMAND: "reopen_file"
>  #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
>  #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
>  #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff99650065b940] msleep at ffffffff9af573a9
>  #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
>  #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
>  #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
>  #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
>  #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
>  #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
> #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
> #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
> #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
> #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
> #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
>
> cifs_launder_page is for page 0xffffd1e2c07d2480
>
> crash> page.index,mapping,flags 0xffffd1e2c07d2480
>       index = 0x8
>       mapping = 0xffff8c68f3cd0db0
>   flags = 0xfffffc0008095
>
>   PAGE-FLAG       BIT  VALUE
>   PG_locked         0  0000001
>   PG_uptodate       2  0000004
>   PG_lru            4  0000010
>   PG_waiters        7  0000080
>   PG_writeback     15  0008000
>
>
> inode is ffff8c68f3cd0c40
> inode.i_rwsem is ffff8c68f3cd0ce0
>      DENTRY           INODE           SUPERBLK     TYPE PATH
> ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
> /mnt/vm1_smb/testfile.8853
>
>
> this process holds the inode->i_rwsem for the parent directory, is
> laundering a page attached to the inode of the file it's opening, and in
> _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem
> for the file itself.
>
>
> <struct rw_semaphore 0xffff8c68f3cd0ce0>:
>         owner: <struct task_struct 0xffff8c6914272e80> (UN -   8854 -
> reopen_file), counter: 0x0000000000000003
>         waitlist: 1
>         0xffff9965005dfd80     8855   reopen_file      UN 0  1:29:22.912
>   RWSEM_WAITING_FOR_WRITE
>
> this is the inode.i_rwsem for the file
>
> the owner:
>
> [0 00:48:22.739] [UN]  PID: 8854   TASK: ffff8c6914272e80  CPU: 2
> COMMAND: "reopen_file"
>  #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
>  #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
>  #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
>  #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
>  #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
>  #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
>  #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
>  #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
>  #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
>  #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
> #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
> #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
> #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
> #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
>
> the process holds the inode->i_rwsem for the file to which it's writing,
> and is trying to __lock_page for the same page as in the other processes
>
>
> the other tasks:
> [0 00:00:00.028] [UN]  PID: 8859   TASK: ffff8c6915479740  CPU: 2
> COMMAND: "reopen_file"
>  #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
>  #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
>  #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
>  #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
>  #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
>  #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
>  #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
>  #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
> #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
>
> this is opening the file, and is trying to down_write cinode->lock_sem
>
>
> [0 00:00:00.041] [UN]  PID: 8860   TASK: ffff8c691547ae80  CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.057] [UN]  PID: 8861   TASK: ffff8c6915478000  CPU: 3
> COMMAND: "reopen_file"
> [0 00:00:00.059] [UN]  PID: 8858   TASK: ffff8c6914271740  CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.109] [UN]  PID: 8862   TASK: ffff8c691547dd00  CPU: 6
> COMMAND: "reopen_file"
>  #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
>  #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
>  #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
>  #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
>  #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
>  #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
>  #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
>  #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
>
> closing the file, and trying to down_write cifsi->lock_sem
>
>
> [0 00:48:22.839] [UN]  PID: 8857   TASK: ffff8c6914270000  CPU: 7
> COMMAND: "reopen_file"
>  #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
>  #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
>  #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
>  #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
>  #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
>  #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
>  #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
>  #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
>  #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
>
> in __filemap_fdatawait_range
>                         wait_on_page_writeback(page);
> for the same page of the file
>
>
>
> [0 00:48:22.718] [UN]  PID: 8855   TASK: ffff8c69142745c0  CPU: 7
> COMMAND: "reopen_file"
>  #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
>  #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
>  #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
>  #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
>  #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
>  #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
>  #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
>  #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
>
>         inode_lock(inode);
>
>
> and one 'ls' later on, to see whether the rest of the mount is available
> (the test file is in the root, so we get blocked up on the directory
> ->i_rwsem), so the entire mount is unavailable
>
> [0 00:36:26.473] [UN]  PID: 9802   TASK: ffff8c691436ae80  CPU: 4
> COMMAND: "ls"
>  #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
>  #1 [ffff996500393db8] schedule at ffffffff9b6e64df
>  #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
>  #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
>  #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
>  #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
>  #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
>  #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
>
> in iterate_dir:
>         if (shared)
>                 res = down_read_killable(&inode->i_rwsem);  <<<<
>         else
>                 res = down_write_killable(&inode->i_rwsem);
>

Reported-by: Frank Sorenson <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Feb 6, 2020
[ Upstream commit 6abf572 ]

Running stress-test test_2 in mtd-utils on ubi device, sometimes we can
get following oops message:

  BUG: unable to handle page fault for address: ffffffff00000140
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 280a067 P4D 280a067 PUD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 60 Comm: kworker/u16:1 Kdump: loaded Not tainted 5.2.0 #13
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0
  -0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  Workqueue: writeback wb_workfn (flush-ubifs_0_0)
  RIP: 0010:rb_next_postorder+0x2e/0xb0
  Code: 80 db 03 01 48 85 ff 0f 84 97 00 00 00 48 8b 17 48 83 05 bc 80 db
  03 01 48 83 e2 fc 0f 84 82 00 00 00 48 83 05 b2 80 db 03 01 <48> 3b 7a
  10 48 89 d0 74 02 f3 c3 48 8b 52 08 48 83 05 a3 80 db 03
  RSP: 0018:ffffc90000887758 EFLAGS: 00010202
  RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: 0000000080800001
  RDX: ffffffff00000130 RSI: 0000000080800024 RDI: ffff888138b08400
  RBP: ffff888138b08400 R08: ffffea0004a6b920 R09: 0000000000000000
  R10: ffffc90000887740 R11: 0000000000000001 R12: ffff888128d48000
  R13: 0000000000000800 R14: 000000000000011e R15: 00000000000007c8
  FS:  0000000000000000(0000) GS:ffff88813ba00000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffff00000140 CR3: 000000013789d000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    destroy_old_idx+0x5d/0xa0 [ubifs]
    ubifs_tnc_start_commit+0x4fe/0x1380 [ubifs]
    do_commit+0x3eb/0x830 [ubifs]
    ubifs_run_commit+0xdc/0x1c0 [ubifs]

Above Oops are due to the slab-out-of-bounds happened in do-while of
function layout_in_gaps indirectly called by ubifs_tnc_start_commit. In
function layout_in_gaps, there is a do-while loop placing index nodes
into the gaps created by obsolete index nodes in non-empty index LEBs
until rest index nodes can totally be placed into pre-allocated empty
LEBs. @c->gap_lebs points to a memory area(integer array) which records
LEB numbers used by 'in-the-gaps' method. Whenever a fitable index LEB
is found, corresponding lnum will be incrementally written into the
memory area pointed by @c->gap_lebs. The size
((@c->lst.idx_lebs + 1) * sizeof(int)) of memory area is allocated before
do-while loop and can not be changed in the loop. But @c->lst.idx_lebs
could be increased by function ubifs_change_lp (called by
layout_leb_in_gaps->ubifs_find_dirty_idx_leb->get_idx_gc_leb) during the
loop. So, sometimes oob happens when number of cycles in do-while loop
exceeds the original value of @c->lst.idx_lebs. See detail in
https://bugzilla.kernel.org/show_bug.cgi?id=204229.
This patch fixes oob in layout_in_gaps.

Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
fmhess pushed a commit to fmhess/linux-socfpga that referenced this pull request Mar 8, 2020
It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 altera-fpga#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 altera-fpga#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 altera-fpga#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 altera-fpga#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 altera-fpga#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 altera-fpga#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 altera-fpga#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
altera-fpga#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
altera-fpga#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
altera-fpga#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
altera-fpga#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David Howells <[email protected]>
tthayer-intel pushed a commit that referenced this pull request May 14, 2020
commit 7cf64b1 upstream.

vt_in_use() dereferences console_driver->ttys[i] without proper locking.
This is broken because the tty can be closed and freed concurrently.

We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL'
and skipping the check of tty_struct::count.  But, looking at
console_driver->ttys[i] isn't really appropriate anyway because even if
it is NULL the tty can still be in the process of being closed.

Instead, fix it by making vt_in_use() require console_lock() and check
whether the vt is allocated and has port refcount > 1.  This works since
following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use
virtual console" the port refcount is incremented while the vt is open.

Reproducer (very unreliable, but it worked for me after a few minutes):

	#include <fcntl.h>
	#include <linux/vt.h>

	int main()
	{
		int fd, nproc;
		struct vt_stat state;
		char ttyname[16];

		fd = open("/dev/tty10", O_RDONLY);
		for (nproc = 1; nproc < 8; nproc *= 2)
			fork();
		for (;;) {
			sprintf(ttyname, "/dev/tty%d", rand() % 8);
			close(open(ttyname, O_RDONLY));
			ioctl(fd, VT_GETSTATE, &state);
		}
	}

KASAN report:

	BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
	BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
	Read of size 4 at addr ffff888065722468 by task syz-vt2/132

	CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
	Call Trace:
	 [...]
	 vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
	 vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
	 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
	 [...]

	Allocated by task 136:
	 [...]
	 kzalloc include/linux/slab.h:669 [inline]
	 alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982
	 tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334
	 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
	 tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
	 [...]

	Freed by task 41:
	 [...]
	 kfree+0xbf/0x200 mm/slab.c:3757
	 free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177
	 release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468
	 process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264
	 worker_thread+0x8b/0xc80 kernel/workqueue.c:2410
	 [...]

Fixes: 4001d7b ("vt: push down the tty lock so we can see what is left to tackle")
Cc: <[email protected]> # v3.4+
Acked-by: Jiri Slaby <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
tthayer-intel pushed a commit that referenced this pull request May 14, 2020
[ Upstream commit 1bc7896 ]

When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <pthread.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
#     --- a/tools/trace.py
#     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -> #1 (&p->pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&rq->lock);
  [   32.856671]                                lock(&p->pi_lock);
  [   32.857332]                                lock(&rq->lock);
  [   32.857999]   lock(&(&sighand->siglock)->rlock);

  Deadlock happens on CPU0 when it tries to acquire &sighand->siglock
  but it has been held by CPU1 and CPU1 tries to grab &rq->lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
rgongatintel pushed a commit that referenced this pull request Jul 23, 2020
[ Upstream commit 7b01b72 ]

When the nvmem framework is enabled, a nvmem device is created per mtd
device/partition.

It is not uncommon that a device can have multiple mtd devices with
partitions that have the same name. Eg, when there DT overlay is allowed
and the same device with mtd is attached twice.

Under that circumstances, the mtd fails to register due to a name
duplication on the nvmem framework.

With this patch we use the mtdX name instead of the partition name,
which is unique.

[    8.948991] sysfs: cannot create duplicate filename '/bus/nvmem/devices/Production Data'
[    8.948992] CPU: 7 PID: 246 Comm: systemd-udevd Not tainted 5.5.0-qtec-standard #13
[    8.948993] Hardware name: AMD Dibbler/Dibbler, BIOS 05.22.04.0019 10/26/2019
[    8.948994] Call Trace:
[    8.948996]  dump_stack+0x50/0x70
[    8.948998]  sysfs_warn_dup.cold+0x17/0x2d
[    8.949000]  sysfs_do_create_link_sd.isra.0+0xc2/0xd0
[    8.949002]  bus_add_device+0x74/0x140
[    8.949004]  device_add+0x34b/0x850
[    8.949006]  nvmem_register.part.0+0x1bf/0x640
...
[    8.948926] mtd mtd8: Failed to register NVMEM device

Fixes: c4dfa25 ("mtd: add support for reading MTD devices via the nvmem API")
Signed-off-by: Ricardo Ribalda Delgado <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Sep 3, 2020
[ Upstream commit c2e929b ]

Booting a power9 server with hash MMU could trigger an undefined
behaviour because pud_offset(p4d, 0) will do,

0 >> (PAGE_SHIFT:16 + PTE_INDEX_SIZE:8 + H_PMD_INDEX_SIZE:10)

Fix it by converting pud_index() and friends to static inline
functions.

UBSAN: shift-out-of-bounds in arch/powerpc/mm/ptdump/ptdump.c:282:15
shift exponent 34 is too large for 32-bit type 'int'
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200303+ #13
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
ubsan_epilogue+0x18/0x78
__ubsan_handle_shift_out_of_bounds+0x160/0x21c
walk_pagetables+0x2cc/0x700
walk_pud at arch/powerpc/mm/ptdump/ptdump.c:282
(inlined by) walk_pagetables at arch/powerpc/mm/ptdump/ptdump.c:311
ptdump_check_wx+0x8c/0xf0
mark_rodata_ro+0x48/0x80
kernel_init+0x74/0x194
ret_from_kernel_thread+0x5c/0x74

Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Sep 3, 2020
[ Upstream commit 8dbe4c5 ]

of_find_node_by_name() will do an of_node_put() on the "from" argument.
With CONFIG_OF_DYNAMIC enabled which checks for device_node reference
counts, we would be getting a warning like this:

[    6.347230] refcount_t: increment on 0; use-after-free.
[    6.352498] WARNING: CPU: 3 PID: 77 at lib/refcount.c:156
refcount_inc_checked+0x38/0x44
[    6.360601] Modules linked in:
[    6.363661] CPU: 3 PID: 77 Comm: kworker/3:1 Tainted: G        W
5.4.46-gb78b3e9956e6 #13
[    6.372546] Hardware name: BCM97278SV (DT)
[    6.376649] Workqueue: events deferred_probe_work_func
[    6.381796] pstate: 60000005 (nZCv daif -PAN -UAO)
[    6.386595] pc : refcount_inc_checked+0x38/0x44
[    6.391133] lr : refcount_inc_checked+0x38/0x44
...
[    6.478791] Call trace:
[    6.481243]  refcount_inc_checked+0x38/0x44
[    6.485433]  kobject_get+0x3c/0x4c
[    6.488840]  of_node_get+0x24/0x34
[    6.492247]  of_irq_find_parent+0x3c/0xe0
[    6.496263]  of_irq_parse_one+0xe4/0x1d0
[    6.500191]  irq_of_parse_and_map+0x44/0x84
[    6.504381]  bcm_sf2_sw_probe+0x22c/0x844
[    6.508397]  platform_drv_probe+0x58/0xa8
[    6.512413]  really_probe+0x238/0x3fc
[    6.516081]  driver_probe_device+0x11c/0x12c
[    6.520358]  __device_attach_driver+0xa8/0x100
[    6.524808]  bus_for_each_drv+0xb4/0xd0
[    6.528650]  __device_attach+0xd0/0x164
[    6.532493]  device_initial_probe+0x24/0x30
[    6.536682]  bus_probe_device+0x38/0x98
[    6.540524]  deferred_probe_work_func+0xa8/0xd4
[    6.545061]  process_one_work+0x178/0x288
[    6.549078]  process_scheduled_works+0x44/0x48
[    6.553529]  worker_thread+0x218/0x270
[    6.557285]  kthread+0xdc/0xe4
[    6.560344]  ret_from_fork+0x10/0x18
[    6.563925] ---[ end trace 68f65caf69bb152a ]---

Fix this by adding a of_node_get() to increment the reference count
prior to the call.

Fixes: afa3b59 ("net: dsa: bcm_sf2: Ensure correct sub-node is parsed")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Oct 15, 2020
[ Upstream commit e24c644 ]

I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Link: https://lore.kernel.org/linux-trace-devel/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Oct 15, 2020
[ Upstream commit 7fd5b25 ]

After commit 92cc68e ("drm/vblank: Use
spin_(un)lock_irq() in drm_crtc_vblank_on()") omapdrm locking is broken:

WARNING: inconsistent lock state
5.8.0-rc2-00483-g92cc68e35863 #13 Tainted: G        W
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
ea98222c (&dev->event_lock#2){?.+.}-{2:2}, at: drm_handle_vblank+0x4c/0x520 [drm]
{HARDIRQ-ON-W} state was registered at:
  trace_hardirqs_on+0x9c/0x1ec
  _raw_spin_unlock_irq+0x20/0x58
  omap_crtc_atomic_enable+0x54/0xa0 [omapdrm]
  drm_atomic_helper_commit_modeset_enables+0x218/0x270 [drm_kms_helper]
  omap_atomic_commit_tail+0x48/0xc4 [omapdrm]
  commit_tail+0x9c/0x190 [drm_kms_helper]
  drm_atomic_helper_commit+0x154/0x188 [drm_kms_helper]
  drm_client_modeset_commit_atomic+0x228/0x268 [drm]
  drm_client_modeset_commit_locked+0x60/0x1d0 [drm]
  drm_client_modeset_commit+0x24/0x40 [drm]
  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xa8 [drm_kms_helper]
  drm_fb_helper_set_par+0x2c/0x5c [drm_kms_helper]
  drm_fb_helper_hotplug_event.part.0+0xa0/0xbc [drm_kms_helper]
  drm_kms_helper_hotplug_event+0x24/0x30 [drm_kms_helper]
  output_poll_execute+0x1a8/0x1c0 [drm_kms_helper]
  process_one_work+0x268/0x800
  worker_thread+0x30/0x4e0
  kthread+0x164/0x190
  ret_from_fork+0x14/0x20

The reason for this is that omapdrm calls drm_crtc_vblank_on() while
holding event_lock taken with spin_lock_irq().

It is not clear why drm_crtc_vblank_on() and drm_crtc_vblank_get() are
called while holding event_lock. I don't see any problem with moving
those calls outside the lock, which is what this patch does.

Signed-off-by: Tomi Valkeinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Laurent Pinchart <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Dec 17, 2020
[ Upstream commit b12eea5 ]

The evsel->unit borrows a pointer of pmu event or alias instead of
owns a string.  But tool event (duration_time) passes a result of
strdup() caused a leak.

It was found by ASAN during metric test:

  Direct leak of 210 byte(s) in 70 object(s) allocated from:
    #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
    #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
    #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
    #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
    #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
    #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
    #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
    #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
    #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
    #10 0x559fbbc0109b in run_test tests/builtin-test.c:410
    #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
    #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
    #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
    #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Dec 17, 2020
[ Upstream commit d26383d ]

The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request Dec 17, 2020
[ Upstream commit ca2ef4f ]

A kernel built with KASAN && FTRACE_WITH_REGS && !MODULES, produces a
boot-time splat in the bowels of ftrace:

| [    0.000000] ftrace: allocating 32281 entries in 127 pages
| [    0.000000] ------------[ cut here ]------------
| [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2019 ftrace_bug+0x27c/0x328
| [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc3-00008-g7f08ae53a7e3 #13
| [    0.000000] Hardware name: linux,dummy-virt (DT)
| [    0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
| [    0.000000] pc : ftrace_bug+0x27c/0x328
| [    0.000000] lr : ftrace_init+0x640/0x6cc
| [    0.000000] sp : ffffa000120e7e00
| [    0.000000] x29: ffffa000120e7e00 x28: ffff00006ac01b10
| [    0.000000] x27: ffff00006ac898c0 x26: dfffa00000000000
| [    0.000000] x25: ffffa000120ef290 x24: ffffa0001216df40
| [    0.000000] x23: 000000000000018d x22: ffffa0001244c700
| [    0.000000] x21: ffffa00011bf393c x20: ffff00006ac898c0
| [    0.000000] x19: 00000000ffffffff x18: 0000000000001584
| [    0.000000] x17: 0000000000001540 x16: 0000000000000007
| [    0.000000] x15: 0000000000000000 x14: ffffa00010432770
| [    0.000000] x13: ffff940002483519 x12: 1ffff40002483518
| [    0.000000] x11: 1ffff40002483518 x10: ffff940002483518
| [    0.000000] x9 : dfffa00000000000 x8 : 0000000000000001
| [    0.000000] x7 : ffff940002483519 x6 : ffffa0001241a8c0
| [    0.000000] x5 : ffff940002483519 x4 : ffff940002483519
| [    0.000000] x3 : ffffa00011780870 x2 : 0000000000000001
| [    0.000000] x1 : 1fffe0000d591318 x0 : 0000000000000000
| [    0.000000] Call trace:
| [    0.000000]  ftrace_bug+0x27c/0x328
| [    0.000000]  ftrace_init+0x640/0x6cc
| [    0.000000]  start_kernel+0x27c/0x654
| [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x30/0x60 with crng_init=0
| [    0.000000] ---[ end trace 0000000000000000 ]---
| [    0.000000] ftrace faulted on writing
| [    0.000000] [<ffffa00011bf393c>] _GLOBAL__sub_D_65535_0___tracepoint_initcall_level+0x4/0x28
| [    0.000000] Initializing ftrace call sites
| [    0.000000] ftrace record flags: 0
| [    0.000000]  (0)
| [    0.000000]  expected tramp: ffffa000100b3344

This is due to an unfortunate combination of several factors.

Building with KASAN results in the compiler generating anonymous
functions to register/unregister global variables against the shadow
memory. These functions are placed in .text.startup/.text.exit, and
given mangled names like _GLOBAL__sub_{I,D}_65535_0_$OTHER_SYMBOL. The
kernel linker script places these in .init.text and .exit.text
respectively, which are both discarded at runtime as part of initmem.

Building with FTRACE_WITH_REGS uses -fpatchable-function-entry=2, which
also instruments KASAN's anonymous functions. When these are discarded
with the rest of initmem, ftrace removes dangling references to these
call sites.

Building without MODULES implicitly disables STRICT_MODULE_RWX, and
causes arm64's patch_map() function to treat any !core_kernel_text()
symbol as something that can be modified in-place. As core_kernel_text()
is only true for .text and .init.text, with the latter depending on
system_state < SYSTEM_RUNNING, we'll treat .exit.text as something that
can be patched in-place. However, .exit.text is mapped read-only.

Hence in this configuration the ftrace init code blows up while trying
to patch one of the functions generated by KASAN.

We could try to filter out the call sites in .exit.text rather than
initializing them, but this would be inconsistent with how we handle
.init.text, and requires hooking into core bits of ftrace. The behaviour
of patch_map() is also inconsistent today, so instead let's clean that
up and have it consistently handle .exit.text.

This patch teaches patch_map() to handle .exit.text at init time,
preventing the boot-time splat above. The flow of patch_map() is
reworked to make the logic clearer and minimize redundant
conditionality.

Fixes: 3b23e49 ("arm64: implement ftrace with regs")
Signed-off-by: Mark Rutland <[email protected]>
Cc: Amit Daniel Kachhap <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Torsten Duwe <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rgongatintel pushed a commit that referenced this pull request May 27, 2021
commit b12de52 upstream.

[BUG]
When running btrfs/072 with only one online CPU, it has a pretty high
chance to fail:

#  btrfs/072 12s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//btrfs/072.dmesg)
#  - output mismatch (see xfstests-dev/results//btrfs/072.out.bad)
#      --- tests/btrfs/072.out     2019-10-22 15:18:14.008965340 +0800
#      +++ /xfstests-dev/results//btrfs/072.out.bad      2019-11-14 15:56:45.877152240 +0800
#      @@ -1,2 +1,3 @@
#       QA output created by 072
#       Silence is golden
#      +Scrub find errors in "-m dup -d single" test
#      ...

And with the following call trace:

  BTRFS info (device dm-5): scrub: started on devid 1
  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -27)
  WARNING: CPU: 0 PID: 55087 at fs/btrfs/block-group.c:1890 btrfs_create_pending_block_groups+0x3e6/0x470 [btrfs]
  CPU: 0 PID: 55087 Comm: btrfs Tainted: G        W  O      5.4.0-rc1-custom+ #13
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:btrfs_create_pending_block_groups+0x3e6/0x470 [btrfs]
  Call Trace:
   __btrfs_end_transaction+0xdb/0x310 [btrfs]
   btrfs_end_transaction+0x10/0x20 [btrfs]
   btrfs_inc_block_group_ro+0x1c9/0x210 [btrfs]
   scrub_enumerate_chunks+0x264/0x940 [btrfs]
   btrfs_scrub_dev+0x45c/0x8f0 [btrfs]
   btrfs_ioctl+0x31a1/0x3fb0 [btrfs]
   do_vfs_ioctl+0x636/0xaa0
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x43/0x50
   do_syscall_64+0x79/0xe0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  ---[ end trace 166c865cec7688e7 ]---

[CAUSE]
The error number -27 is -EFBIG, returned from the following call chain:
btrfs_end_transaction()
|- __btrfs_end_transaction()
   |- btrfs_create_pending_block_groups()
      |- btrfs_finish_chunk_alloc()
         |- btrfs_add_system_chunk()

This happens because we have used up all space of
btrfs_super_block::sys_chunk_array.

The root cause is, we have the following bad loop of creating tons of
system chunks:

1. The only SYSTEM chunk is being scrubbed
   It's very common to have only one SYSTEM chunk.
2. New SYSTEM bg will be allocated
   As btrfs_inc_block_group_ro() will check if we have enough space
   after marking current bg RO. If not, then allocate a new chunk.
3. New SYSTEM bg is still empty, will be reclaimed
   During the reclaim, we will mark it RO again.
4. That newly allocated empty SYSTEM bg get scrubbed
   We go back to step 2, as the bg is already mark RO but still not
   cleaned up yet.

If the cleaner kthread doesn't get executed fast enough (e.g. only one
CPU), then we will get more and more empty SYSTEM chunks, using up all
the space of btrfs_super_block::sys_chunk_array.

[FIX]
Since scrub/dev-replace doesn't always need to allocate new extent,
especially chunk tree extent, so we don't really need to do chunk
pre-allocation.

To break above spiral, here we introduce a new parameter to
btrfs_inc_block_group(), @do_chunk_alloc, which indicates whether we
need extra chunk pre-allocation.

For relocation, we pass @do_chunk_alloc=true, while for scrub, we pass
@do_chunk_alloc=false.
This should keep unnecessary empty chunks from popping up for scrub.

Also, since there are two parameters for btrfs_inc_block_group_ro(),
add more comment for it.

Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
rebeccasf pushed a commit that referenced this pull request Sep 30, 2021
[ Upstream commit 5796015 ]

When TEE target mirrors traffic to another interface, sk_buff may
not have enough headroom to be processed correctly.
ip_finish_output2() detect this situation for ipv4 and allocates
new skb with enogh headroom. However ipv6 lacks this logic in
ip_finish_output2 and it leads to skb_under_panic:

 skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24
 head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0
 ------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:110!
 invalid opcode: 0000 [#1] SMP PTI
 CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G           OE     5.13.0 #13
 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:skb_panic+0x48/0x4a
 Call Trace:
  skb_push.cold.111+0x10/0x10
  ipgre_header+0x24/0xf0 [ip_gre]
  neigh_connected_output+0xae/0xf0
  ip6_finish_output2+0x1a8/0x5a0
  ip6_output+0x5c/0x110
  nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6]
  tee_tg6+0x2e/0x40 [xt_TEE]
  ip6t_do_table+0x294/0x470 [ip6_tables]
  nf_hook_slow+0x44/0xc0
  nf_hook.constprop.34+0x72/0xe0
  ndisc_send_skb+0x20d/0x2e0
  ndisc_send_ns+0xd1/0x210
  addrconf_dad_work+0x3c8/0x540
  process_one_work+0x1d1/0x370
  worker_thread+0x30/0x390
  kthread+0x116/0x130
  ret_from_fork+0x22/0x30

Signed-off-by: Vasily Averin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
rebeccasf pushed a commit that referenced this pull request Sep 30, 2021
[ Upstream commit c4eb1f4 ]

In __htab_map_lookup_and_delete_batch(), hash buckets are iterated
over to count the number of elements in each bucket (bucket_size).
If bucket_size is large enough, the multiplication to calculate
kvmalloc() size could overflow, resulting in out-of-bounds write
as reported by KASAN:

  [...]
  [  104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112
  [  104.986889]
  [  104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13
  [  104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [  104.988104] Call Trace:
  [  104.988410]  dump_stack_lvl+0x34/0x44
  [  104.988706]  print_address_description.constprop.0+0x21/0x140
  [  104.988991]  ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.989327]  ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.989622]  kasan_report.cold+0x7f/0x11b
  [  104.989881]  ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.990239]  kasan_check_range+0x17c/0x1e0
  [  104.990467]  memcpy+0x39/0x60
  [  104.990670]  __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.990982]  ? __wake_up_common+0x4d/0x230
  [  104.991256]  ? htab_of_map_free+0x130/0x130
  [  104.991541]  bpf_map_do_batch+0x1fb/0x220
  [...]

In hashtable, if the elements' keys have the same jhash() value, the
elements will be put into the same bucket. By putting a lot of elements
into a single bucket, the value of bucket_size can be increased to
trigger the integer overflow.

Triggering the overflow is possible for both callers with CAP_SYS_ADMIN
and callers without CAP_SYS_ADMIN.

It will be trivial for a caller with CAP_SYS_ADMIN to intentionally
reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set
the random seed passed to jhash() to 0, it will be easy for the caller
to prepare keys which will be hashed into the same value, and thus put
all the elements into the same bucket.

If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be
used. However, it will be still technically possible to trigger the
overflow, by guessing the random seed value passed to jhash() (32bit)
and repeating the attempt to trigger the overflow. In this case,
the probability to trigger the overflow will be low and will take
a very long time.

Fix the integer overflow by calling kvmalloc_array() instead of
kvmalloc() to allocate memory.

Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Tatsuhiko Yasumatsu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
rebeccasf pushed a commit that referenced this pull request Oct 14, 2021
[upstream commit c4eb1f4]

In __htab_map_lookup_and_delete_batch(), hash buckets are iterated
over to count the number of elements in each bucket (bucket_size).
If bucket_size is large enough, the multiplication to calculate
kvmalloc() size could overflow, resulting in out-of-bounds write
as reported by KASAN:

  [...]
  [  104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112
  [  104.986889]
  [  104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13
  [  104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [  104.988104] Call Trace:
  [  104.988410]  dump_stack_lvl+0x34/0x44
  [  104.988706]  print_address_description.constprop.0+0x21/0x140
  [  104.988991]  ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.989327]  ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.989622]  kasan_report.cold+0x7f/0x11b
  [  104.989881]  ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.990239]  kasan_check_range+0x17c/0x1e0
  [  104.990467]  memcpy+0x39/0x60
  [  104.990670]  __htab_map_lookup_and_delete_batch+0x5ce/0xb60
  [  104.990982]  ? __wake_up_common+0x4d/0x230
  [  104.991256]  ? htab_of_map_free+0x130/0x130
  [  104.991541]  bpf_map_do_batch+0x1fb/0x220
  [...]

In hashtable, if the elements' keys have the same jhash() value, the
elements will be put into the same bucket. By putting a lot of elements
into a single bucket, the value of bucket_size can be increased to
trigger the integer overflow.

Triggering the overflow is possible for both callers with CAP_SYS_ADMIN
and callers without CAP_SYS_ADMIN.

It will be trivial for a caller with CAP_SYS_ADMIN to intentionally
reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set
the random seed passed to jhash() to 0, it will be easy for the caller
to prepare keys which will be hashed into the same value, and thus put
all the elements into the same bucket.

If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be
used. However, it will be still technically possible to trigger the
overflow, by guessing the random seed value passed to jhash() (32bit)
and repeating the attempt to trigger the overflow. In this case,
the probability to trigger the overflow will be low and will take
a very long time.

Fix the integer overflow by calling kvmalloc_array() instead of
kvmalloc() to allocate memory.

Fixes: 0579963 ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Tatsuhiko Yasumatsu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
johanbhussin pushed a commit that referenced this pull request Oct 28, 2021
commit 41d5854 upstream.

I got several memory leak reports from Asan with a simple command.  It
was because VDSO is not released due to the refcount.  Like in
__dsos_addnew_id(), it should put the refcount after adding to the list.

  $ perf record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]

  =================================================================
  ==692599==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
    #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
    #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
    #4 0x559bce50826c in map__new util/map.c:175
    #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #11 0x559bce519bea in perf_session__process_events util/session.c:2297
    #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
    #2 0x559bce50821b in map__new util/map.c:168
    #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #9 0x559bce519bea in perf_session__process_events util/session.c:2297
    #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Hanjun Guo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johanbhussin pushed a commit that referenced this pull request Oct 28, 2021
commit 57f0ff0 upstream.

It's later supposed to be either a correct address or NULL. Without the
initialization, it may contain an undefined value which results in the
following segmentation fault:

  # perf top --sort comm -g --ignore-callees=do_idle

terminates with:

  #0  0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6
  #1  0x00007ffff55e3802 in strdup () from /lib64/libc.so.6
  #2  0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489
  #3  hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564
  #4  0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420,
      sample_self=sample_self@entry=true) at util/hist.c:657
  #5  0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0,
      sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288
  #6  0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38)
      at util/hist.c:1056
  #7  iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056
  #8  0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0)
      at util/hist.c:1231
  #9  0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0)
      at builtin-top.c:842
  #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202
  #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244
  #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323
  #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339
  #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341
  #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339
  #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114
  #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0
  #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6

If you look at the frame #2, the code is:

488	 if (he->srcline) {
489          he->srcline = strdup(he->srcline);
490          if (he->srcline == NULL)
491              goto err_rawdata;
492	 }

If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish),
it gets strdupped and strdupping a rubbish random string causes the problem.

Also, if you look at the commit 1fb7d06, it adds the srcline property
into the struct, but not initializing it everywhere needed.

Committer notes:

Now I see, when using --ignore-callees=do_idle we end up here at line
2189 in add_callchain_ip():

2181         if (al.sym != NULL) {
2182                 if (perf_hpp_list.parent && !*parent &&
2183                     symbol__match_regex(al.sym, &parent_regex))
2184                         *parent = al.sym;
2185                 else if (have_ignore_callees && root_al &&
2186                   symbol__match_regex(al.sym, &ignore_callees_regex)) {
2187                         /* Treat this symbol as the root,
2188                            forgetting its callees. */
2189                         *root_al = al;
2190                         callchain_cursor_reset(cursor);
2191                 }
2192         }

And the al that doesn't have the ->srcline field initialized will be
copied to the root_al, so then, back to:

1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
1212                          int max_stack_depth, void *arg)
1213 {
1214         int err, err2;
1215         struct map *alm = NULL;
1216
1217         if (al)
1218                 alm = map__get(al->map);
1219
1220         err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent,
1221                                         iter->evsel, al, max_stack_depth);
1222         if (err) {
1223                 map__put(alm);
1224                 return err;
1225         }
1226
1227         err = iter->ops->prepare_entry(iter, al);
1228         if (err)
1229                 goto out;
1230
1231         err = iter->ops->add_single_entry(iter, al);
1232         if (err)
1233                 goto out;
1234

That al at line 1221 is what hist_entry_iter__add() (called from
sample__resolve_callchain()) saw as 'root_al', and then:

        iter->ops->add_single_entry(iter, al);

will go on with al->srcline with a bogus value, I'll add the above
sequence to the cset and apply, thanks!

Signed-off-by: Michael Petlan <[email protected]>
CC: Milian Wolff <[email protected]>
Cc: Jiri Olsa <[email protected]>
Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries")
Link: https //lore.kernel.org/r/[email protected]
Reported-by: Juri Lelli <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johanbhussin pushed a commit that referenced this pull request Jan 20, 2022
commit df909df upstream.

ARM: kasan: Fix __get_user_check failure with kasan

In macro __get_user_check defined in arch/arm/include/asm/uaccess.h,
error code is store in register int __e(r0). When kasan is
enabled, assigning value to kernel address might trigger kasan check,
which unexpectedly overwrites r0 and causes undefined behavior on arm
kasan images.

One example is failure in do_futex and results in process soft lockup.
Log:
watchdog: BUG: soft lockup - CPU#0 stuck for 62946ms! [rs:main
Q:Reg:1151]
...
(__asan_store4) from (futex_wait_setup+0xf8/0x2b4)
(futex_wait_setup) from (futex_wait+0x138/0x394)
(futex_wait) from (do_futex+0x164/0xe40)
(do_futex) from (sys_futex_time32+0x178/0x230)
(sys_futex_time32) from (ret_fast_syscall+0x0/0x50)

The soft lockup happens in function futex_wait_setup. The reason is
function get_futex_value_locked always return EINVAL, thus pc jump
back to retry label and causes looping.

This line in function get_futex_value_locked
	ret = __get_user(*dest, from);
is expanded to
	*dest = (typeof(*(p))) __r2; ,
in macro __get_user_check. Writing to pointer dest triggers kasan check
and overwrites the return value of __get_user_x function.
The assembly code of get_futex_value_locked in kernel/futex.c:
...
c01f6dc8:       eb0b020e        bl      c04b7608 <__get_user_4>
// "x = (typeof(*(p))) __r2;" triggers kasan check and r0 is overwritten
c01f6dCc:       e1a00007        mov     r0, r7
c01f6dd0:       e1a05002        mov     r5, r2
c01f6dd4:       eb04f1e6        bl      c0333574 <__asan_store4>
c01f6dd8:       e5875000        str     r5, [r7]
// save ret value of __get_user(*dest, from), which is dest address now
c01f6ddc:       e1a05000        mov     r5, r0
...
// checking return value of __get_user failed
c01f6e00:       e3550000        cmp     r5, #0
...
c01f6e0c:       01a00005        moveq   r0, r5
// assign return value to EINVAL
c01f6e10:       13e0000d        mvnne   r0, #13

Return value is the destination address of get_user thus certainly
non-zero, so get_futex_value_locked always return EINVAL.

Fix it by using a tmp vairable to store the error code before the
assignment. This fix has no effects to non-kasan images thanks to compiler
optimization. It only affects cases that overwrite r0 due to kasan check.

This should fix bug discussed in Link:
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/

Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: Lexi Shao <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johanbhussin pushed a commit that referenced this pull request Jan 20, 2022
[ Upstream commit b93c6a9 ]

When I do fuzz test for bonding device interface, I got the following
use-after-free Calltrace:

==================================================================
BUG: KASAN: use-after-free in bond_enslave+0x1521/0x24f0
Read of size 8 at addr ffff88825bc11c00 by task ifenslave/7365

CPU: 5 PID: 7365 Comm: ifenslave Tainted: G            E     5.15.0-rc1+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
Call Trace:
 dump_stack_lvl+0x6c/0x8b
 print_address_description.constprop.0+0x48/0x70
 kasan_report.cold+0x82/0xdb
 __asan_load8+0x69/0x90
 bond_enslave+0x1521/0x24f0
 bond_do_ioctl+0x3e0/0x450
 dev_ifsioc+0x2ba/0x970
 dev_ioctl+0x112/0x710
 sock_do_ioctl+0x118/0x1b0
 sock_ioctl+0x2e0/0x490
 __x64_sys_ioctl+0x118/0x150
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f19159cf577
Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 78
RSP: 002b:00007ffeb3083c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffeb3084bca RCX: 00007f19159cf577
RDX: 00007ffeb3083ce0 RSI: 0000000000008990 RDI: 0000000000000003
RBP: 00007ffeb3084bc4 R08: 0000000000000040 R09: 0000000000000000
R10: 00007ffeb3084bc0 R11: 0000000000000246 R12: 00007ffeb3083ce0
R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeb3083cb0

Allocated by task 7365:
 kasan_save_stack+0x23/0x50
 __kasan_kmalloc+0x83/0xa0
 kmem_cache_alloc_trace+0x22e/0x470
 bond_enslave+0x2e1/0x24f0
 bond_do_ioctl+0x3e0/0x450
 dev_ifsioc+0x2ba/0x970
 dev_ioctl+0x112/0x710
 sock_do_ioctl+0x118/0x1b0
 sock_ioctl+0x2e0/0x490
 __x64_sys_ioctl+0x118/0x150
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 7365:
 kasan_save_stack+0x23/0x50
 kasan_set_track+0x20/0x30
 kasan_set_free_info+0x24/0x40
 __kasan_slab_free+0xf2/0x130
 kfree+0xd1/0x5c0
 slave_kobj_release+0x61/0x90
 kobject_put+0x102/0x180
 bond_sysfs_slave_add+0x7a/0xa0
 bond_enslave+0x11b6/0x24f0
 bond_do_ioctl+0x3e0/0x450
 dev_ifsioc+0x2ba/0x970
 dev_ioctl+0x112/0x710
 sock_do_ioctl+0x118/0x1b0
 sock_ioctl+0x2e0/0x490
 __x64_sys_ioctl+0x118/0x150
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
 kasan_save_stack+0x23/0x50
 kasan_record_aux_stack+0xb7/0xd0
 insert_work+0x43/0x190
 __queue_work+0x2e3/0x970
 delayed_work_timer_fn+0x3e/0x50
 call_timer_fn+0x148/0x470
 run_timer_softirq+0x8a8/0xc50
 __do_softirq+0x107/0x55f

Second to last potentially related work creation:
 kasan_save_stack+0x23/0x50
 kasan_record_aux_stack+0xb7/0xd0
 insert_work+0x43/0x190
 __queue_work+0x2e3/0x970
 __queue_delayed_work+0x130/0x180
 queue_delayed_work_on+0xa7/0xb0
 bond_enslave+0xe25/0x24f0
 bond_do_ioctl+0x3e0/0x450
 dev_ifsioc+0x2ba/0x970
 dev_ioctl+0x112/0x710
 sock_do_ioctl+0x118/0x1b0
 sock_ioctl+0x2e0/0x490
 __x64_sys_ioctl+0x118/0x150
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88825bc11c00
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 0 bytes inside of
 1024-byte region [ffff88825bc11c00, ffff88825bc12000)
The buggy address belongs to the page:
page:ffffea00096f0400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25bc10
head:ffffea00096f0400 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x57ff00000010200(slab|head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff00000010200 ffffea0009a71c08 ffff888240001968 ffff88810004dbc0
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88825bc11b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88825bc11b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88825bc11c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff88825bc11c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88825bc11d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Put new_slave in bond_sysfs_slave_add() will cause use-after-free problems
when new_slave is accessed in the subsequent error handling process. Since
new_slave will be put in the subsequent error handling process, remove the
unnecessary put to fix it.
In addition, when sysfs_create_file() fails, if some files have been crea-
ted successfully, we need to call sysfs_remove_file() to remove them.
Since there are sysfs_create_files() & sysfs_remove_files() can be used,
use these two functions instead.

Fixes: 7afcaec (bonding: use kobject_put instead of _del after kobject_add)
Signed-off-by: Huang Guobin <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johanbhussin pushed a commit that referenced this pull request Apr 28, 2022
[ Upstream commit 4224cfd ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
drosdi pushed a commit that referenced this pull request Feb 28, 2023
[ Upstream commit 93c660c ]

ASAN reports an use-after-free in btf_dump_name_dups:

ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
    #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
    #1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
    #2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
    #3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
    #4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
    #5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
    #6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
    #7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
    #8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
    #9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
    #10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
    #11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #15 0xaaaab5d65990  (test_progs+0x185990)

0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
    #7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
    #8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #12 0xaaaab5d65990  (test_progs+0x185990)

previously allocated by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
    #7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
    #8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
    #9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #13 0xaaaab5d65990  (test_progs+0x185990)

The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.

Fix it by storing duplicated string address in name_map.

Fixes: 919d2b1 ("libbpf: Allow modification of BTF and add btf__add_str API")
Signed-off-by: Xu Kuohai <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
drosdi pushed a commit that referenced this pull request Feb 28, 2023
…g the sock

[ Upstream commit 3cf7203 ]

There is a race condition in vxlan that when deleting a vxlan device
during receiving packets, there is a possibility that the sock is
released after getting vxlan_sock vs from sk_user_data. Then in
later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got
NULL pointer dereference. e.g.

   #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
   #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
   #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
   #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
   #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
   #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
   #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
      [exception RIP: vxlan_ecn_decapsulate+0x3b]
      RIP: ffffffffc1014e7b  RSP: ffffa25ec6978cb0  RFLAGS: 00010246
      RAX: 0000000000000008  RBX: ffff8aa000888000  RCX: 0000000000000000
      RDX: 000000000000000e  RSI: ffff8a9fc7ab803e  RDI: ffff8a9fd1168700
      RBP: ffff8a9fc7ab803e   R8: 0000000000700000   R9: 00000000000010ae
      R10: ffff8a9fcb748980  R11: 0000000000000000  R12: ffff8a9fd1168700
      R13: ffff8aa000888000  R14: 00000000002a0000  R15: 00000000000010ae
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
   #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
   #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
  #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
  #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
  #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
  #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
  #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
  #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
  #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
  #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
  #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3

Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh

Fix this by waiting for all sk_user_data reader to finish before
releasing the sock.

Reported-by: Jianlin Shi <[email protected]>
Suggested-by: Jakub Sitnicki <[email protected]>
Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
drosdi pushed a commit that referenced this pull request Mar 30, 2023
ASAN reports an use-after-free in btf_dump_name_dups:

ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
    #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
    #1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
    #2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
    #3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
    #4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
    #5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
    #6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
    #7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
    #8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
    #9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
    #10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
    #11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #15 0xaaaab5d65990  (test_progs+0x185990)

0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
    #7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
    #8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #12 0xaaaab5d65990  (test_progs+0x185990)

previously allocated by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    #6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
    #7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
    #8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
    #9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    #10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    #11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    #12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    #13 0xaaaab5d65990  (test_progs+0x185990)

The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.

Fix it by storing duplicated string address in name_map.

Fixes: 919d2b1 ("libbpf: Allow modification of BTF and add btf__add_str API")
Signed-off-by: Xu Kuohai <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
drosdi pushed a commit that referenced this pull request Mar 30, 2023
[ Upstream commit 5c30e81 ]

Skip the WRMSR fastpath in SVM's VM-Exit handler if the next RIP isn't
valid, e.g. because KVM is running with nrips=false.  SVM must decode and
emulate to skip the WRMSR if the CPU doesn't provide the next RIP.
Getting the instruction bytes to decode the WRMSR requires reading guest
memory, which in turn means dereferencing memslots, and that isn't safe
because KVM doesn't hold SRCU when the fastpath runs.

Don't bother trying to enable the fastpath for this case, e.g. by doing
only the WRMSR and leaving the "skip" until later.  NRIPS is supported on
all modern CPUs (KVM has considered making it mandatory), and the next
RIP will be valid the vast, vast majority of the time.

  =============================
  WARNING: suspicious RCU usage
  6.0.0-smp--4e557fcd3d80-skip #13 Tainted: G           O
  -----------------------------
  include/linux/kvm_host.h:954 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by stable/206475:
   #0: ffff9d9dfebcc0f0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8b/0x620 [kvm]

  stack backtrace:
  CPU: 152 PID: 206475 Comm: stable Tainted: G           O       6.0.0-smp--4e557fcd3d80-skip #13
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
  Call Trace:
   <TASK>
   dump_stack_lvl+0x69/0xaa
   dump_stack+0x10/0x12
   lockdep_rcu_suspicious+0x11e/0x130
   kvm_vcpu_gfn_to_memslot+0x155/0x190 [kvm]
   kvm_vcpu_gfn_to_hva_prot+0x18/0x80 [kvm]
   paging64_walk_addr_generic+0x183/0x450 [kvm]
   paging64_gva_to_gpa+0x63/0xd0 [kvm]
   kvm_fetch_guest_virt+0x53/0xc0 [kvm]
   __do_insn_fetch_bytes+0x18b/0x1c0 [kvm]
   x86_decode_insn+0xf0/0xef0 [kvm]
   x86_emulate_instruction+0xba/0x790 [kvm]
   kvm_emulate_instruction+0x17/0x20 [kvm]
   __svm_skip_emulated_instruction+0x85/0x100 [kvm_amd]
   svm_skip_emulated_instruction+0x13/0x20 [kvm_amd]
   handle_fastpath_set_msr_irqoff+0xae/0x180 [kvm]
   svm_vcpu_run+0x4b8/0x5a0 [kvm_amd]
   vcpu_enter_guest+0x16ca/0x22f0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x39d/0x900 [kvm]
   kvm_vcpu_ioctl+0x538/0x620 [kvm]
   __se_sys_ioctl+0x77/0xc0
   __x64_sys_ioctl+0x1d/0x20
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 404d5d7 ("KVM: X86: Introduce more exit_fastpath_completion enum values")
Signed-off-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
abaumfalk pushed a commit to ASTRO-Strobel/linux-socfpga that referenced this pull request Jul 7, 2025
commit 63de35a8fcfca59ae8750d469a7eb220c7557baf upstream.

An issue was identified in the dcn21_link_encoder_create function where
an out-of-bounds access could occur when the hpd_source index was used
to reference the link_enc_hpd_regs array. This array has a fixed size
and the index was not being checked against the array's bounds before
accessing it.

This fix adds a conditional check to ensure that the hpd_source index is
within the valid range of the link_enc_hpd_regs array. If the index is
out of bounds, the function now returns NULL to prevent undefined
behavior.

References:

[   65.920507] ------------[ cut here ]------------
[   65.920510] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn21/dcn21_resource.c:1312:29
[   65.920519] index 7 is out of range for type 'dcn10_link_enc_hpd_registers [5]'
[   65.920523] CPU: 3 PID: 1178 Comm: modprobe Tainted: G           OE      6.8.0-cleanershaderfeatureresetasdntipmi200nv2132 altera-fpga#13
[   65.920525] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS WMJ0429N_Weekly_20_04_2 04/29/2020
[   65.920527] Call Trace:
[   65.920529]  <TASK>
[   65.920532]  dump_stack_lvl+0x48/0x70
[   65.920541]  dump_stack+0x10/0x20
[   65.920543]  __ubsan_handle_out_of_bounds+0xa2/0xe0
[   65.920549]  dcn21_link_encoder_create+0xd9/0x140 [amdgpu]
[   65.921009]  link_create+0x6d3/0xed0 [amdgpu]
[   65.921355]  create_links+0x18a/0x4e0 [amdgpu]
[   65.921679]  dc_create+0x360/0x720 [amdgpu]
[   65.921999]  ? dmi_matches+0xa0/0x220
[   65.922004]  amdgpu_dm_init+0x2b6/0x2c90 [amdgpu]
[   65.922342]  ? console_unlock+0x77/0x120
[   65.922348]  ? dev_printk_emit+0x86/0xb0
[   65.922354]  dm_hw_init+0x15/0x40 [amdgpu]
[   65.922686]  amdgpu_device_init+0x26a8/0x33a0 [amdgpu]
[   65.922921]  amdgpu_driver_load_kms+0x1b/0xa0 [amdgpu]
[   65.923087]  amdgpu_pci_probe+0x1b7/0x630 [amdgpu]
[   65.923087]  local_pci_probe+0x4b/0xb0
[   65.923087]  pci_device_probe+0xc8/0x280
[   65.923087]  really_probe+0x187/0x300
[   65.923087]  __driver_probe_device+0x85/0x130
[   65.923087]  driver_probe_device+0x24/0x110
[   65.923087]  __driver_attach+0xac/0x1d0
[   65.923087]  ? __pfx___driver_attach+0x10/0x10
[   65.923087]  bus_for_each_dev+0x7d/0xd0
[   65.923087]  driver_attach+0x1e/0x30
[   65.923087]  bus_add_driver+0xf2/0x200
[   65.923087]  driver_register+0x64/0x130
[   65.923087]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   65.923087]  __pci_register_driver+0x61/0x70
[   65.923087]  amdgpu_init+0x7d/0xff0 [amdgpu]
[   65.923087]  do_one_initcall+0x49/0x310
[   65.923087]  ? kmalloc_trace+0x136/0x360
[   65.923087]  do_init_module+0x6a/0x270
[   65.923087]  load_module+0x1fce/0x23a0
[   65.923087]  init_module_from_file+0x9c/0xe0
[   65.923087]  ? init_module_from_file+0x9c/0xe0
[   65.923087]  idempotent_init_module+0x179/0x230
[   65.923087]  __x64_sys_finit_module+0x5d/0xa0
[   65.923087]  do_syscall_64+0x76/0x120
[   65.923087]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   65.923087] RIP: 0033:0x7f2d80f1e88d
[   65.923087] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[   65.923087] RSP: 002b:00007ffc7bc1aa78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   65.923087] RAX: ffffffffffffffda RBX: 0000564c9c1db130 RCX: 00007f2d80f1e88d
[   65.923087] RDX: 0000000000000000 RSI: 0000564c9c1e5480 RDI: 000000000000000f
[   65.923087] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
[   65.923087] R10: 000000000000000f R11: 0000000000000246 R12: 0000564c9c1e5480
[   65.923087] R13: 0000564c9c1db260 R14: 0000000000000000 R15: 0000564c9c1e54b0
[   65.923087]  </TASK>
[   65.923927] ---[ end trace ]---

Cc: Tom Chung <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Roman Li <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Bin Lan <[email protected]>
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.