Bfq v8 v4.11 #1

ghost · 2017-07-31T16:26:38Z

No description provided.

Reducing real_num_tx_queues needs to be in sync with skb queue_mapping, otherwise skbs with queue_mapping greater than real_num_tx_queues can be sent to the underlying driver and can result in a kernel panic.

One such event is running netconsole and enabling VF on the same device, or running netconsole and changing the number of tx queues via ethtool on the same device.

e.g.

    Unable to handle kernel NULL pointer dereference
    tsk->{mm,active_mm}->context = 0000000000001525
    tsk->{mm,active_mm}->pgd = fff800130ff9a000
                  \|/ ____ \|/
                  "@'/ .. \`@"
                  /_| \__/ |_\
                     \__U_/
    kworker/48:1(475): Oops [#1]
    CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE 4.11.0-rc3-davem-net+ #7
    Workqueue: events queue_process
    task: fff80013113299c0 task.stack: fff800131132c000
    TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000 Tainted: G OE
    TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
    g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001
    g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0
    o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003
    o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94
    RPC: <set_next_entity+0x34/0xb80>
    l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000
    l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028
    i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000
    i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0
    I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
    Call Trace:
     [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
     [0000000000998c74] netpoll_start_xmit+0xf4/0x200
     [0000000000998e10] queue_process+0x90/0x160
     [0000000000485fa8] process_one_work+0x188/0x480
     [0000000000486410] worker_thread+0x170/0x4c0
     [000000000048c6b8] kthread+0xd8/0x120
     [0000000000406064] ret_from_fork+0x1c/0x2c
     [0000000000000000] (null)
    Disabling lock debugging due to kernel taint
    Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
    Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
    Caller[0000000000998e10]: queue_process+0x90/0x160
    Caller[0000000000485fa8]: process_one_work+0x188/0x480
    Caller[0000000000486410]: worker_thread+0x170/0x4c0
    Caller[000000000048c6b8]: kthread+0xd8/0x120
    Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
    Caller[0000000000000000]: (null)

Signed-off-by: Tushar Dave <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
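The failure mode above comes down to indexing a TX queue array with a stale queue_mapping. A minimal Python model of the kind of guard that prevents it is sketched below. The helper name and the modulo fold-back policy are assumptions made for illustration, not a copy of the kernel patch.

```python
def clamp_queue_index(queue_mapping: int, real_num_tx_queues: int) -> int:
    """Return a TX queue index that is valid for the current queue count.

    Hypothetical helper: models checking skb->queue_mapping against
    real_num_tx_queues before handing the skb to a driver.
    """
    if real_num_tx_queues <= 0:
        raise ValueError("device has no active TX queues")
    if queue_mapping >= real_num_tx_queues:
        # Stale mapping recorded before the queue count was reduced:
        # fold it back into range instead of indexing past the end.
        return queue_mapping % real_num_tx_queues
    return queue_mapping
```

A mapping that is already in range passes through unchanged, so the check only costs anything when the queue count has shrunk underneath a queued skb.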

Pull networking fixes from David Miller:

 1) Don't race in IPSEC dumps, from Yuejie Shi.
 2) Verify lengths properly in IPSEC requests, from Herbert Xu.
 3) Fix out of bounds access in ipv6 segment routing code, from David Lebrun.
 4) Don't write into the header of cloned SKBs in smsc95xx driver, from James Hughes.
 5) Several other drivers have this bug too, fix them. From Eric Dumazet.
 6) Fix access to uninitialized data in TC action cookie code, from Wolfgang Bumiller.
 7) Fix double free in IPV6 segment routing, again from David Lebrun.
 8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
 9) Fix use after free in qrtr code, from Dan Carpenter.
 10) Don't double-destroy devices in ip6mr code, from Nikolay Aleksandrov.
 11) Don't pass out-of-range TX queue indices into drivers, from Tushar Dave.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
  netpoll: Check for skb->queue_mapping
  ip6mr: fix notification device destruction
  bpf, doc: update bpf maintainers entry
  net: qrtr: potential use after free in qrtr_sendmsg()
  bpf: Fix values type used in test_maps
  net: ipv6: RTF_PCPU should not be settable from userspace
  gso: Validate assumption of frag_list segementation
  kaweth: use skb_cow_head() to deal with cloned skbs
  ch9200: use skb_cow_head() to deal with cloned skbs
  lan78xx: use skb_cow_head() to deal with cloned skbs
  sr9700: use skb_cow_head() to deal with cloned skbs
  cx82310_eth: use skb_cow_head() to deal with cloned skbs
  smsc75xx: use skb_cow_head() to deal with cloned skbs
  ipv6: sr: fix double free of skb after handling invalid SRH
  MAINTAINERS: Add "B:" field for networking.
  net sched actions: allocate act cookie early
  qed: Fix issue in populating the PFC config paramters.
  qed: Fix possible system hang in the dcbnl-getdcbx() path.
  qed: Fix sending an invalid PFC error mask to MFW.
  qed: Fix possible error in populating max_tc field.
  ...

Pull nfsd bugfix from Bruce Fields: "Fix a 4.11 regression that triggers a BUG() on an attempt to use an unsupported NFSv4 compound op" * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux: nfsd: fix oops on unsupported operation

If FW is stuck in initializing state we will skip the driver load, but current error handling flow doesn't clean previously allocated command interface resources. Fixes: e329724 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Mohamad Haj Yahia <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>

On ConnectX5 the wqe inline mode is "none" and hence the FW reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED. Fix our devlink callbacks to deal with that on get and set. Also fix the tc flow parsing code not to fail anything when inline isn't required. Fixes: bffaa91 ('net/mlx5: E-Switch, Add control for inline mode') Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>

Otherwise the code that fills the ipv4 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: a54e20b ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>

Otherwise the code that fills the ipv6 encapsulation headers could be writing beyond the allocated headers buffer. Fixes: ce99f6b ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>

When UAR is released, we deallocate the device resource, but don't unmap the UAR mapping memory. Fix the leak by unmapping this memory. Fixes: a6d51b6 ('net/mlx5: Introduce blue flame register allocator') Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>

RX packet headers are meant to be contained in the SKB linear part, and a threshold of 128 bytes was chosen. It turns out this is not enough, e.g. for an IPv6 packet over VxLAN. In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes, and 86 bytes for TCP/IPv6. In total 136 bytes, which is more than the current 128 bytes. In this case the expand header flow is reached. The warning in skb_try_coalesce() caused by a wrong truesize was already fixed here: commit 158f323 ("net: adjust skb->truesize in pskb_expand_head()"). Still, we prefer to totally avoid the expand header flow for performance reasons. Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found. Fixes: 461017c ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Signed-off-by: Eugenia Emantayev <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
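The header arithmetic in this message can be checked with a few lines of Python. The three totals (42, 8, 86) and the 128-byte threshold come from the commit message; the per-protocol breakdown in the comments is an assumption about how those totals are composed.

```python
# Totals per layer, in bytes, as quoted in the commit message.
# The breakdowns in the comments are assumptions, not from the patch.
OUTER_UDP_IPV4 = 14 + 20 + 8    # Ethernet + IPv4 + UDP = 42
TUNNEL_HEADER = 8               # tunnel base header
INNER_TCP_IPV6 = 14 + 40 + 32   # Ethernet + IPv6 + TCP with 12 option bytes = 86
OLD_THRESHOLD = 128             # small-packet threshold being fixed


def total_header_bytes() -> int:
    """Bytes of headers that should sit in the skb linear part."""
    return OUTER_UDP_IPV4 + TUNNEL_HEADER + INNER_TCP_IPV6


def fits_in_linear_part(threshold: int) -> bool:
    """True if all headers fit without hitting the expand-header flow."""
    return total_header_bytes() <= threshold
```

With these numbers the headers exceed the old threshold by 8 bytes, which is why the expand header flow is reached for this traffic.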

Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size of the table, regardless of the amount of entries in it. Existing code does not do that, and this breaks all usage of ethtool -N or -n without explicit location, with this error:

    rmgr: Invalid RX class rules table size: Success

Set info->data to the table size.

Tested:

    ethtool -n ens8
    ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1
    ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55
    ethtool -n ens8
    ethtool -N ens8 delete 1023
    ethtool -N ens8 delete 55

Fixes: f913a72 ("net/mlx5e: Add support to get ethtool flow rules")
Signed-off-by: Ilan Tayari <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
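The contract described here can be modelled in a few lines of Python. This is only a sketch: the function name and the dictionary reply are hypothetical, with keys named after the data, rule_cnt and rule_locs fields of struct ethtool_rxnfc.

```python
def get_all_rule_locations(rules: dict, table_size: int) -> dict:
    """Model of an ETHTOOL_GRXCLSRLALL-style reply.

    `rules` maps a rule location to a rule. `data` must always carry
    the table size, even when the table holds no rules at all.
    """
    locations = sorted(rules)
    return {
        "data": table_size,          # size of the rule table, not the rule count
        "rule_cnt": len(locations),  # number of rules currently installed
        "rule_locs": locations,      # where those rules live
    }
```

The caller needs the table size to pick a free location when none is given explicitly, which is why a reply that omits it breaks `ethtool -N` without `loc`.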

…inux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "The (hopefully) final fix for the irq affinity spreading logic" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Fix calculating vectors to assign

…inux/kernel/git/tip/tip Pull RAS fix from Thomas Gleixner: "The MCE atomic notifier callchain invokes callbacks which might sleep. Convert it to a blocking notifier and prevent calls from atomic context" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make the MCE notifier a blocking one

Pull UBI/UBIFS fixes from Richard Weinberger:

"This contains fixes for issues in both UBI and UBIFS:

 - more O_TMPFILE fallout
 - RENAME_WHITEOUT regression due to a mis-merge
 - memory leak in ubifs_mknod()
 - power-cut problem in UBI's update volume feature"

* tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix O_TMPFILE corner case in ubifs_link()
  ubifs: Fix RENAME_WHITEOUT support
  ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode
  ubifs: Fix debug messages for an invalid filename in ubifs_dump_node
  ubifs: Remove filename from debug messages in ubifs_readdir
  ubifs: Fix memory leak in error path in ubifs_mknod
  ubi/upd: Always flush after prepared for an update

This lets us enable KPROBE_EVENTS. Signed-off-by: David S. Miller <[email protected]>

Hook up statx. Ignore pkeys system calls, we don't have protection keys on SPARC. Signed-off-by: David S. Miller <[email protected]>

We have observed a sudden spike in rx/tx_packets and rx/tx_bytes reported under /proc/net/dev. There is a race in mlx5e_update_stats() and some of the get-stats functions (the one that we hit is the mlx5e_get_stats() which is called by ndo_get_stats64()).

In particular, the very first thing mlx5e_update_sw_counters() does is 'memset(s, 0, sizeof(*s))'. For example, if mlx5e_get_stats() is unlucky at one point, rx_bytes and rx_packets could be 0. One second later, a normal (and much bigger than 0) value will be reported.

This patch is to use a 'struct mlx5e_sw_stats temp' to avoid a direct memset zero on priv->stats.sw.

mlx5e_update_vport_counters() has a similar race. Hence, addressed together. However, memset zero is removed instead because it is not needed.

I am lucky enough to catch this 0-reset in rx multicast:

    eth0: 41457665 76804 70 0 0 70 0 47085 15586634 87502 3 0 0 0 3 0
    eth0: 41459860 76815 70 0 0 70 0 47094 15588376 87516 3 0 0 0 3 0
    eth0: 41460577 76822 70 0 0 70 0 0 15589083 87521 3 0 0 0 3 0
    eth0: 41463293 76838 70 0 0 70 0 47108 15595872 87538 3 0 0 0 3 0
    eth0: 41463379 76839 70 0 0 70 0 47116 15596138 87539 3 0 0 0 3 0

v2: Remove memset zero from mlx5e_update_vport_counters()
v1: Use temp and memcpy

Fixes: 9218b44 ("net/mlx5e: Statistics handling refactoring")
Suggested-by: Eric Dumazet <[email protected]>
Suggested-by: Saeed Mahameed <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
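The temp-then-publish idea can be illustrated with a small Python model. The class and field names below are hypothetical stand-ins for the driver structures. In the model, publishing is a single reference assignment; the driver uses a temp struct and memcpy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SwStats:
    rx_packets: int = 0
    rx_bytes: int = 0


class Priv:
    """Hypothetical stand-in for the driver's private state."""

    def __init__(self, rings: List[Tuple[int, int]]):
        self.rings = rings   # per-ring (packets, bytes) counters
        self.sw = SwStats()  # what readers such as get_stats() observe

    def update_sw_counters(self) -> None:
        # Accumulate into a private temporary instead of zeroing the
        # published copy first, so a concurrent reader never sees a
        # half-built (zeroed) snapshot.
        temp = SwStats()
        for packets, nbytes in self.rings:
            temp.rx_packets += packets
            temp.rx_bytes += nbytes
        self.sw = temp  # publish the finished snapshot in one step
```

A reader that grabbed `priv.sw` before an update keeps seeing its complete older snapshot, rather than counters that drop to 0 and jump back a second later.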

We dereference "skb" to get "skb->len" so we should probably do that step before freeing the skb. Fixes: eea221c ("tc35815 driver update (take 2)") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>

David reported that doing the following:

    ip li add red type vrf table 10
    ip link set dev eth1 vrf red
    ip addr add 127.0.0.1/8 dev red
    ip link set dev eth1 up
    ip li set red up
    ping -c1 -w1 -I red 127.0.0.1
    ip li del red

when either policy routing IP rules are present or the local table lookup ip rule is before the l3mdev lookup results in a hang with these messages:

    unregister_netdevice: waiting for red to become free. Usage count = 1

The problem is caused by caching the dst used for sending the packet out of the specified interface on a local route with a different nexthop interface. Thus the dst could stay around until the route in the table the lookup was done is deleted which may be never.

Address the problem by not forcing output device to be the l3mdev in the flow's output interface if the lookup didn't use the l3mdev. This then results in the dst using the right device according to the route.

Changes in v2:
 - make the dev_out passed in by __ip_route_output_key_hash correct instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as suggested by David.

Fixes: 5f02ce2 ("net: l3mdev: Allow the l3mdev to be a loopback")
Reported-by: David Ahern <[email protected]>
Suggested-by: David Ahern <[email protected]>
Signed-off-by: Robert Shearman <[email protected]>
Acked-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

While this may appear as a humdrum one line change, it's actually quite important. An sk_buff stores data in three places:

1. A linear chunk of allocated memory in skb->data. This is the easiest one to work with, but it precludes using scattergather since the memory must be linear.
2. The array skb_shinfo(skb)->frags, which is of maximum length MAX_SKB_FRAGS. This is nice for scattergather, since these fragments can point to different pages.
3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff, which in turn can have data in either (1) or (2).

The first two are rather easy to deal with, since they're of a fixed maximum length, while the third one is not, since there can be potentially limitless chains of fragments. Fortunately dealing with frag_list is opt-in for drivers, so drivers don't actually have to deal with this mess. For whatever reason, macsec decided it wanted pain, and so it explicitly specified NETIF_F_FRAGLIST.

Because dealing with (1), (2), and (3) is insane, most users of sk_buff doing any sort of crypto or paging operation call a convenient function called skb_to_sgvec (which happens to be recursive if (3) is in use!). This takes a sk_buff as input, and writes into its output pointer an array of scattergather list items. Sometimes people like to declare a fixed size scattergather list on the stack; other times people like to allocate a fixed size scattergather list on the heap. However, if you're doing it in a fixed-size fashion, you really shouldn't be using NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its frag_list children aren't shared and then you check the number of fragments in total required.)

Macsec specifically does this:

    size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
    tmp = kmalloc(size, GFP_ATOMIC);
    *sg = (struct scatterlist *)(tmp + sg_offset);
    ...
    sg_init_table(sg, MAX_SKB_FRAGS + 1);
    skb_to_sgvec(skb, sg, 0, skb->len);

Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will overflow the heap, and disaster ensues.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
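To make the overflow concrete, the Python model below counts how many scattergather entries an sk_buff would need once frag_list chains are followed recursively. The `Skb` class and `sg_entries_needed` are hypothetical illustrations, and `MAX_SKB_FRAGS = 17` is an assumed value (what 4 KiB pages would give), not something stated in the commit.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SKB_FRAGS = 17  # assumed value for illustration (4 KiB pages)


@dataclass
class Skb:
    """Toy sk_buff: linear data, page frags, and chained child skbs."""
    linear_len: int = 0
    nr_frags: int = 0
    frag_list: List["Skb"] = field(default_factory=list)


def sg_entries_needed(skb: Skb) -> int:
    """Count scatterlist entries a recursive skb-to-sg walk would fill."""
    entries = (1 if skb.linear_len else 0) + skb.nr_frags
    for child in skb.frag_list:
        # Each child can again carry linear data, frags and children.
        entries += sg_entries_needed(child)
    return entries
```

A flat skb never needs more than `MAX_SKB_FRAGS + 1` entries, but a single fully populated child in frag_list already doubles that, which is the case a fixed-size table cannot hold.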

Otherwise, UDP checksum offloads could corrupt ESP packets by attempting to calculate UDP checksum when this inner UDP packet is already protected by IPsec.

One way to reproduce this bug is to have a VM with virtio_net driver (UFO set to ON in the guest VM); and then encapsulate all guest's Ethernet frames in Geneve; and then further encrypt Geneve with IPsec. In this case following symptoms are observed:

1. If using ixgbe NIC, then it will complain with following error message:
   ixgbe 0000:01:00.1: partial checksum but l4 proto=32!
2. Receiving IPsec stack will drop all the corrupted ESP packets and increase XfrmInStateProtoError counter in /proc/net/xfrm_stat.
3. iperf UDP test from the VM with packet sizes above MTU will not work at all.
4. iperf TCP test from the VM will get ridiculously low performance because.

Signed-off-by: Ansis Atteka <[email protected]>
Co-authored-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

If skb_put_padto() fails then it frees the skb. I shifted that code up a bit to make my error handling a little simpler. Fixes: a0d2f20 ("Renesas Ethernet AVB PTP clock driver") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Sergei Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

When arp_notify is set to 1 for either a specific interface or for 'all' interfaces, gratuitous arp requests are sent. Since ndisc_notify is the ipv6 equivalent to arp_notify, it should follow the same semantics. Commit 4a6e3c5 ("net: ipv6: send unsolicited NA on admin up") sends the NA on admin up. The final piece is checking devconf_all->ndisc_notify in addition to the per device setting. Add it. Fixes: 5cb0443 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
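The semantics being added amount to OR-ing the per-device knob with the 'all' knob. A minimal Python sketch follows; the function and parameter names are hypothetical.

```python
def should_send_unsolicited_na(dev_ndisc_notify: int, all_ndisc_notify: int) -> bool:
    """Notify if either the device setting or the 'all' setting is enabled.

    Mirrors the arp_notify semantics described above: a non-zero value in
    either place is enough to trigger the unsolicited advertisement.
    """
    return bool(dev_ndisc_notify) or bool(all_ndisc_notify)
```

Before the change only the first argument would have been consulted, so enabling the knob for 'all' interfaces had no effect on its own.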

…ux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-04-22

This series contains some mlx5 fixes for net.

For your convenience, the series doesn't introduce any conflict with the ongoing net-next pull request.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10
("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8
("net/mlx5e: Fix small packet threshold") kernels >= 4.7
("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4
====================

Signed-off-by: David S. Miller <[email protected]>

…ream-linus

Pull MIPS fixes from Ralf Baechle:

"Another round of 4.11 fixes for the MIPS architecture. This time around it's mostly arch but little platforms-specific code.

 - PCI: Register controllers in the right order to avoid a PCI error
 - KGDB: Use kernel context for sleeping threads
 - smp-cps: Fix potentially uninitialised value of core
 - KASLR: Fix build
 - ELF: Fix BUG() warning in arch_check_elf
 - Fix modversioning of _mcount symbol
 - fix out-of-tree defconfig target builds
 - cevt-r4k: Fix out-of-bounds array access
 - perf: fix deadlock
 - Malta: Fix i8259 irqchip setup"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: PCI: add controllers before the specified head
  MIPS: KGDB: Use kernel context for sleeping threads
  MIPS: smp-cps: Fix potentially uninitialised value of core
  MIPS: KASLR: Add missing header files
  MIPS: Avoid BUG warning in arch_check_elf
  MIPS: Fix modversioning of _mcount symbol
  MIPS: generic: fix out-of-tree defconfig target builds
  MIPS: cevt-r4k: Fix out-of-bounds array access
  MIPS: perf: fix deadlock
  MIPS: Malta: Fix i8259 irqchip setup

…it/jejb/scsi Pull SCSI fix from James Bottomley: "Our final fix before the 4.11 release (hopefully). It's an error leg again: the fix to not bug on empty DMA transfers is returning the wrong code and confusing the block layer" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: return correct blkprep status code in case scsi_init_io() fails.

Since Broadcom tags are not enabled in b53 (DSA_PROTO_TAG_NONE), we need to make sure that the IMP/CPU port is included in the forwarding decision. Without this change, switching between non-management ports would work, but not between management ports and non-management ports, thus breaking the default state in which DSA switches are brought up. Fixes: 967dd82 ("net: dsa: b53: Add support for Broadcom RoboSwitch") Reported-by: Eric Anholt <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Implement the correct software reset sequence for 58xx devices by setting all 3 reset bits and polling for the SW_RST bit to clear itself within a given timeout. We cannot use is58xx() here because that would also include the 7445/7278 Starfighter 2 which have their own driver doing the reset earlier on due to the HW specific integration. Fixes: 991a36b ("net: dsa: b53: Add support for BCM585xx/586xx/88312 integrated switch") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
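The set-then-poll sequence can be sketched in Python as follows. Only the SW_RST name comes from the commit message. The other two bit names, all bit positions, the function name, and the callback-based register access are assumptions for illustration.

```python
import time
from typing import Callable

# Bit positions are illustrative assumptions, not taken from the driver.
SW_RST = 1 << 7
EN_CH_RST = 1 << 6
EN_SW_RST = 1 << 4


def software_reset(
    read_reg: Callable[[], int],
    write_reg: Callable[[int], None],
    timeout_polls: int = 1000,
    delay_s: float = 0.001,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Set all three reset bits, then poll until SW_RST self-clears.

    Returns True once the bit clears, False if the poll budget runs out.
    """
    write_reg(read_reg() | SW_RST | EN_SW_RST | EN_CH_RST)
    for _ in range(timeout_polls):
        if not (read_reg() & SW_RST):
            return True  # hardware finished the reset
        sleep(delay_s)
    return False  # caller should treat this as a timeout error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would normally leave it at the default.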

The 58xx devices (Northstar Plus) do actually have their CPU port wired at port 8, it was unfortunately set to port 5 (B53_CPU_PORT_25) which is incorrect, since that is the second possible management port. Fixes: 991a36b ("net: dsa: b53: Add support for BCM585xx/586xx/88312 integrated switch") Reported-by: Eric Anholt <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Florian Fainelli says: ==================== net: dsa: b53: BCM58xx devices fixes This patch series contains fixes for the 58xx devices (Broadcom Northstar Plus), which were identified thanks to the help of Eric Anholt. ==================== Tested-by: Eric Anholt <[email protected]> Signed-off-by: David S. Miller <[email protected]>

tndave and others added 30 commits April 21, 2017 15:45

Linux 4.11-rc8

5a7ad11

sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API

b7c02b7

This lets us enable KPROBE_EVENTS. Signed-off-by: David S. Miller <[email protected]>

sparc: Update syscall tables.

f6ebf0b

Hook up statx. Ignore pkeys system calls, we don't have protection keys on SPARC. Signed-off-by: David S. Miller <[email protected]>
