Skip to content
This repository was archived by the owner on Oct 21, 2022. It is now read-only.

Conversation

ghost
Copy link

@ghost ghost commented Jul 31, 2017

No description provided.

tndave and others added 30 commits April 21, 2017 15:45
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.

One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.

e.g.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000001525
tsk->{mm,active_mm}->pgd = fff800130ff9a000
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
kworker/48:1(475): Oops [#1]
CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G           OE
4.11.0-rc3-davem-net+ #7
Workqueue: events queue_process
task: fff80013113299c0 task.stack: fff800131132c000
TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
00000000    Tainted: G           OE
TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
0000000000000001
g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
00000000000000c0
o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
0000000000000003
o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
000000000049ed94
RPC: <set_next_entity+0x34/0xb80>
l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
0000000000000000
l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
fff8001fa7605028
i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
0000000000000000
i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
00000000103fa4b0
I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
Call Trace:
 [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
 [0000000000998c74] netpoll_start_xmit+0xf4/0x200
 [0000000000998e10] queue_process+0x90/0x160
 [0000000000485fa8] process_one_work+0x188/0x480
 [0000000000486410] worker_thread+0x170/0x4c0
 [000000000048c6b8] kthread+0xd8/0x120
 [0000000000406064] ret_from_fork+0x1c/0x2c
 [0000000000000000]           (null)
Disabling lock debugging due to kernel taint
Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
Caller[0000000000998e10]: queue_process+0x90/0x160
Caller[0000000000485fa8]: process_one_work+0x188/0x480
Caller[0000000000486410]: worker_thread+0x170/0x4c0
Caller[000000000048c6b8]: kthread+0xd8/0x120
Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
Caller[0000000000000000]:           (null)

Signed-off-by: Tushar Dave <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Pull networking fixes from David Miller:

 1) Don't race in IPSEC dumps, from Yuejie Shi.

 2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu.

 3) Fix out of bounds access in ipv6 segment routing code, from David
    Lebrun.

 4) Don't write into the header of cloned SKBs in smsc95xx driver, from
    James Hughes.

 5) Several other drivers have this bug too, fix them. From Eric
    Dumazet.

 6) Fix access to uninitialized data in TC action cookie code, from
    Wolfgang Bumiller.

 7) Fix double free in IPV6 segment routing, again from David Lebrun.

 8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.

 9) Fix use after free in qrtr code, from Dan Carpenter.

10) Don't double-destroy devices in ip6mr code, from Nikolay
    Aleksandrov.

11) Don't pass out-of-range TX queue indices into drivers, from Tushar
    Dave.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
  netpoll: Check for skb->queue_mapping
  ip6mr: fix notification device destruction
  bpf, doc: update bpf maintainers entry
  net: qrtr: potential use after free in qrtr_sendmsg()
  bpf: Fix values type used in test_maps
  net: ipv6: RTF_PCPU should not be settable from userspace
  gso: Validate assumption of frag_list segementation
  kaweth: use skb_cow_head() to deal with cloned skbs
  ch9200: use skb_cow_head() to deal with cloned skbs
  lan78xx: use skb_cow_head() to deal with cloned skbs
  sr9700: use skb_cow_head() to deal with cloned skbs
  cx82310_eth: use skb_cow_head() to deal with cloned skbs
  smsc75xx: use skb_cow_head() to deal with cloned skbs
  ipv6: sr: fix double free of skb after handling invalid SRH
  MAINTAINERS: Add "B:" field for networking.
  net sched actions: allocate act cookie early
  qed: Fix issue in populating the PFC config paramters.
  qed: Fix possible system hang in the dcbnl-getdcbx() path.
  qed: Fix sending an invalid PFC error mask to MFW.
  qed: Fix possible error in populating max_tc field.
  ...
Pull nfsd bugfix from Bruce Fields:
 "Fix a 4.11 regression that triggers a BUG() on an attempt to use an
  unsupported NFSv4 compound op"

* tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix oops on unsupported operation
If FW is stuck in initializing state we will skip the driver load, but
current error handling flow doesn't clean previously allocated command
interface resources.

Fixes: e329724 ('net/mlx5_core: Wait for FW readiness on startup')
Signed-off-by: Mohamad Haj Yahia <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
On ConnectX5 the wqe inline mode is "none" and hence the FW
reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED.

Fix our devlink callbacks to deal with that on get and set.

Also fix the tc flow parsing code not to fail anything when
inline isn't required.

Fixes: bffaa91 ('net/mlx5: E-Switch, Add control for inline mode')
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Otherwise the code that fills the ipv4 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: a54e20b ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Otherwise the code that fills the ipv6 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: ce99f6b ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
When UAR is released, we deallocate the device resource, but
don't unmmap the UAR mapping memory.
Fix the leak by unmapping this memory.

Fixes: a6d51b6 ('net/mlx5: Introduce blue flame register allocator)
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
RX packet headers are meant to be contained in SKB linear part,
and chose a threshold of 128.
It turns out this is not enough, i.e. for IPv6 packet over VxLAN.
In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes,
and 86 bytes for TCP/IPv6. In total 136 bytes that is more than
current 128 bytes. In this case expand header flow is reached.
The warning in skb_try_coalesce() caused by a wrong truesize
was already fixed here:
commit 158f323 ("net: adjust skb->truesize in pskb_expand_head()").
Still, we prefer to totally avoid the expand header flow for performance reasons.
Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found.

Fixes: 461017c ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size
of the table, regardless of the amount of entries in it.
Existing code does not do that, and this breaks all usage of ethtool -N
or -n without explicit location, with this error:
rmgr: Invalid RX class rules table size: Success

Set info->data to the table size.

Tested:
ethtool -n ens8
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55
ethtool -n ens8
ethtool -N ens8 delete 1023
ethtool -N ens8 delete 55

Fixes: f913a72 ("net/mlx5e: Add support to get ethtool flow rules")
Signed-off-by: Ilan Tayari <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
…inux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "The (hopefully) final fix for the irq affinity spreading logic"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/affinity: Fix calculating vectors to assign
…inux/kernel/git/tip/tip

Pull RAS fix from Thomas Gleixner:
 "The MCE atomic notifier callchain invokes callbacks which might sleep.

  Convert it to a blocking notifier and prevent calls from atomic
  context"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Make the MCE notifier a blocking one
Pull UBI/UBIFS fixes from Richard Weinberger:
 "This contains fixes for issues in both UBI and UBIFS:

   - more O_TMPFILE fallout

   - RENAME_WHITEOUT regression due to a mis-merge

   - memory leak in ubifs_mknod()

   - power-cut problem in UBI's update volume feature"

* tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix O_TMPFILE corner case in ubifs_link()
  ubifs: Fix RENAME_WHITEOUT support
  ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode
  ubifs: Fix debug messages for an invalid filename in ubifs_dump_node
  ubifs: Remove filename from debug messages in ubifs_readdir
  ubifs: Fix memory leak in error path in ubifs_mknod
  ubi/upd: Always flush after prepared for an update
This lets us enable KPROBE_EVENTS.

Signed-off-by: David S. Miller <[email protected]>
Hook up statx.

Ignore pkeys system calls, we don't have protection keeys
on SPARC.

Signed-off-by: David S. Miller <[email protected]>
We have observed a sudden spike in rx/tx_packets and rx/tx_bytes
reported under /proc/net/dev.  There is a race in mlx5e_update_stats()
and some of the get-stats functions (the one that we hit is the
mlx5e_get_stats() which is called by ndo_get_stats64()).

In particular, the very first thing mlx5e_update_sw_counters()
does is 'memset(s, 0, sizeof(*s))'.  For example, if mlx5e_get_stats()
is unlucky at one point, rx_bytes and rx_packets could be 0.  One second
later, a normal (and much bigger than 0) value will be reported.

This patch is to use a 'struct mlx5e_sw_stats temp' to avoid
a direct memset zero on priv->stats.sw.

mlx5e_update_vport_counters() has a similar race.  Hence, addressed
together.  However, memset zero is removed instead because
it is not needed.

I am lucky enough to catch this 0-reset in rx multicast:
eth0: 41457665   76804   70    0    0    70          0     47085 15586634   87502    3    0    0     0       3          0
eth0: 41459860   76815   70    0    0    70          0     47094 15588376   87516    3    0    0     0       3          0
eth0: 41460577   76822   70    0    0    70          0         0 15589083   87521    3    0    0     0       3          0
eth0: 41463293   76838   70    0    0    70          0     47108 15595872   87538    3    0    0     0       3          0
eth0: 41463379   76839   70    0    0    70          0     47116 15596138   87539    3    0    0     0       3          0

v2: Remove memset zero from mlx5e_update_vport_counters()
v1: Use temp and memcpy

Fixes: 9218b44 ("net/mlx5e: Statistics handling refactoring")
Suggested-by: Eric Dumazet <[email protected]>
Suggested-by: Saeed Mahameed <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
We dereference "skb" to get "skb->len" so we should probably do that
step before freeing the skb.

Fixes: eea221c ("tc35815 driver update (take 2)")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
David reported that doing the following:

    ip li add red type vrf table 10
    ip link set dev eth1 vrf red
    ip addr add 127.0.0.1/8 dev red
    ip link set dev eth1 up
    ip li set red up
    ping -c1 -w1 -I red 127.0.0.1
    ip li del red

when either policy routing IP rules are present or the local table
lookup ip rule is before the l3mdev lookup results in a hang with
these messages:

    unregister_netdevice: waiting for red to become free. Usage count = 1

The problem is caused by caching the dst used for sending the packet
out of the specified interface on a local route with a different
nexthop interface. Thus the dst could stay around until the route in
the table the lookup was done is deleted which may be never.

Address the problem by not forcing output device to be the l3mdev in
the flow's output interface if the lookup didn't use the l3mdev. This
then results in the dst using the right device according to the route.

Changes in v2:
 - make the dev_out passed in by __ip_route_output_key_hash correct
   instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
   suggested by David.

Fixes: 5f02ce2 ("net: l3mdev: Allow the l3mdev to be a loopback")
Reported-by: David Ahern <[email protected]>
Suggested-by: David Ahern <[email protected]>
Signed-off-by: Robert Shearman <[email protected]>
Acked-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
While this may appear as a humdrum one line change, it's actually quite
important. An sk_buff stores data in three places:

1. A linear chunk of allocated memory in skb->data. This is the easiest
   one to work with, but it precludes using scatterdata since the memory
   must be linear.
2. The array skb_shinfo(skb)->frags, which is of maximum length
   MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
   can point to different pages.
3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
   which in turn can have data in either (1) or (2).

The first two are rather easy to deal with, since they're of a fixed
maximum length, while the third one is not, since there can be
potentially limitless chains of fragments. Fortunately dealing with
frag_list is opt-in for drivers, so drivers don't actually have to deal
with this mess. For whatever reason, macsec decided it wanted pain, and
so it explicitly specified NETIF_F_FRAGLIST.

Because dealing with (1), (2), and (3) is insane, most users of sk_buff
doing any sort of crypto or paging operation calls a convenient function
called skb_to_sgvec (which happens to be recursive if (3) is in use!).
This takes a sk_buff as input, and writes into its output pointer an
array of scattergather list items. Sometimes people like to declare a
fixed size scattergather list on the stack; othertimes people like to
allocate a fixed size scattergather list on the heap. However, if you're
doing it in a fixed-size fashion, you really shouldn't be using
NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
frag_list children arent't shared and then you check the number of
fragments in total required.)

Macsec specifically does this:

        size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
        tmp = kmalloc(size, GFP_ATOMIC);
        *sg = (struct scatterlist *)(tmp + sg_offset);
	...
        sg_init_table(sg, MAX_SKB_FRAGS + 1);
        skb_to_sgvec(skb, sg, 0, skb->len);

Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
overflow the heap, and disaster ensues.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Otherwise, UDP checksum offloads could corrupt ESP packets by attempting
to calculate UDP checksum when this inner UDP packet is already protected
by IPsec.

One way to reproduce this bug is to have a VM with virtio_net driver (UFO
set to ON in the guest VM); and then encapsulate all guest's Ethernet
frames in Geneve; and then further encrypt Geneve with IPsec.  In this
case following symptoms are observed:
1. If using ixgbe NIC, then it will complain with following error message:
   ixgbe 0000:01:00.1: partial checksum but l4 proto=32!
2. Receiving IPsec stack will drop all the corrupted ESP packets and
   increase XfrmInStateProtoError counter in /proc/net/xfrm_stat.
3. iperf UDP test from the VM with packet sizes above MTU will not work at
   all.
4. iperf TCP test from the VM will get ridiculously low performance because.

Signed-off-by: Ansis Atteka <[email protected]>
Co-authored-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
If skb_put_padto() fails then it frees the skb.  I shifted that code
up a bit to make my error handling a little simpler.

Fixes: a0d2f20 ("Renesas Ethernet AVB PTP clock driver")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
When arp_notify is set to 1 for either a specific interface or for 'all'
interfaces, gratuitous arp requests are sent. Since ndisc_notify is the
ipv6 equivalent to arp_notify, it should follow the same semantics.
Commit 4a6e3c5 ("net: ipv6: send unsolicited NA on admin up") sends
the NA on admin up. The final piece is checking devconf_all->ndisc_notify
in addition to the per device setting. Add it.

Fixes: 5cb0443 ("ipv6: add knob to send unsolicited ND on link-layer address change")
Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
…ux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-04-22

This series contains some mlx5 fixes for net.

For your convenience, the series doesn't introduce any conflict with
the ongoing net-next pull request.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10
("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8
("net/mlx5e: Fix small packet threshold")       kernels >= 4.7
("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4
====================

Signed-off-by: David S. Miller <[email protected]>
…ream-linus

Pull MIPS fixes from Ralf Baechle:
 "Another round of 4.11 for the MIPS architecture. This time around it's
  mostly arch but little platforms-specific code.

   - PCI: Register controllers in the right order to aoid a PCI error
   - KGDB: Use kernel context for sleeping threads
   - smp-cps: Fix potentially uninitialised value of core
   - KASLR: Fix build
   - ELF: Fix BUG() warning in arch_check_elf
   - Fix modversioning of _mcount symbol
   - fix out-of-tree defconfig target builds
   - cevt-r4k: Fix out-of-bounds array access
   - perf: fix deadlock
   - Malta: Fix i8259 irqchip setup"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: PCI: add controllers before the specified head
  MIPS: KGDB: Use kernel context for sleeping threads
  MIPS: smp-cps: Fix potentially uninitialised value of core
  MIPS: KASLR: Add missing header files
  MIPS: Avoid BUG warning in arch_check_elf
  MIPS: Fix modversioning of _mcount symbol
  MIPS: generic: fix out-of-tree defconfig target builds
  MIPS: cevt-r4k: Fix out-of-bounds array access
  MIPS: perf: fix deadlock
  MIPS: Malta: Fix i8259 irqchip setup
…it/jejb/scsi

Pull SCSI fix from James Bottomley:
 "Our final fix before the 4.12 release (hopefully).

  It's an error leg again: the fix to not bug on empty DMA transfers is
  returning the wrong code and confusing the block layer"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: return correct blkprep status code in case scsi_init_io() fails.
Since Broadcom tags are not enabled in b53 (DSA_PROTO_TAG_NONE), we need
to make sure that the IMP/CPU port is included in the forwarding
decision.

Without this change, switching between non-management ports would work,
but not between management ports and non-management ports thus breaking
the default state in which DSA switch are brought up.

Fixes: 967dd82 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Reported-by: Eric Anholt <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Implement the correct software reset sequence for 58xx devices by
setting all 3 reset bits and polling for the SW_RST bit to clear itself
without a given timeout. We cannot use is58xx() here because that would
also include the 7445/7278 Starfighter 2 which have their own driver
doing the reset earlier on due to the HW specific integration.

Fixes: 991a36b ("net: dsa: b53: Add support for BCM585xx/586xx/88312 integrated switch")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The 58xx devices (Northstar Plus) do actually have their CPU port wired
at port 8, it was unfortunately set to port 5 (B53_CPU_PORT_25) which is
incorrect, since that is the second possible management port.

Fixes: 991a36b ("net: dsa: b53: Add support for BCM585xx/586xx/88312 integrated switch")
Reported-by: Eric Anholt <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Florian Fainelli says:

====================
net: dsa: b53: BCM58xx devices fixes

This patch series contains fixes for the 58xx devices (Broadcom Northstar
Plus), which were identified thanks to the help of Eric Anholt.
====================

Tested-by: Eric Anholt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.