summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-18tipc: fix issues with early FAILOVER_MSG from peerTuong Lien
It appears that a FAILOVER_MSG can come from peer even when the failure link is resetting (i.e. just after the 'node_write_unlock()'...). This means the failover procedure on the node has not been started yet. The situation is as follows: node1 node2 linkb linka linka linkb | | | | | | x failure | | | RESETTING | | | | | | x failure RESET | | RESETTING FAILINGOVER | | | (FAILOVER_MSG) | | |<-------------------------------------------------| | *FAILINGOVER | | | | | (dummy FAILOVER_MSG) | | |------------------------------------------------->| | RESET | | FAILOVER_END | FAILINGOVER RESET | . . . . . . . . . . . . Once this happens, the link failover procedure will be triggered wrongly on the receiving node since the node isn't in FAILINGOVER state but then another link failover will be carried out. The consequences are: 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' was set, reset and set incorrectly, the criteria to end the failover would not be met, it could keep waiting for a message that has already received. 2) The early FAILOVER_MSG(s) could be queued in the link failover deferdq but would be purged or not pulled out because the 'drop_point' was not set correctly. 3) The early FAILOVER_MSG(s) could be dropped too. 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state shortly, but later on it would be restarted. The same situation can also happen when the link is in PEER_RESET state and a FAILOVER_MSG arrives. The commit resolves the issues by forcing the link down immediately, so the failover procedure will be started normally (which is the same as when receiving a FAILOVER_MSG and the link is in up state). Also, the function "tipc_node_link_failover()" is toughen to avoid such a situation from happening. Acked-by: Jon Maloy <jon.maloy@ericsson.se> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18bnx2x: Check if transceiver implements DDM before accessMauro S. M. Rodrigues
Some transceivers may comply with SFF-8472 even though they do not implement the Digital Diagnostic Monitoring (DDM) interface described in the spec. The existence of such area is specified by the 6th bit of byte 92, set to 1 if implemented. Currently, without checking this bit, bnx2x fails trying to read sfp module's EEPROM with the follow message: ethtool -m enP5p1s0f1 Cannot get Module EEPROM data: Input/output error Because it fails to read the additional 256 bytes in which it is assumed to exist the DDM data. This issue was noticed using a Mellanox Passive DAC PN 01FT738. The EEPROM data was confirmed by Mellanox as correct and similar to other Passive DACs from other manufacturers. Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "MS_MOVE regression fix + breakage in fsmount(2) (also introduced in this cycle, along with fsmount(2) itself). I'm still digging through the piles of mail, so there might be more fixes to follow, but these two are obvious and self-contained, so there's no point delaying those..." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/namespace: fix unprivileged mount propagation vfs: fsmount: add missing mntget()
2019-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of bug fixes here: 1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer. 2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John Crispin. 3) Use after free in psock backlog workqueue, from John Fastabend. 4) Fix source port matching in fdb peer flow rule of mlx5, from Raed Salem. 5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet. 6) Network header needs to be set for packet redirect in nfp, from John Hurley. 7) Fix udp zerocopy refcnt, from Willem de Bruijn. 8) Don't assume linear buffers in vxlan and geneve error handlers, from Stefano Brivio. 9) Fix TOS matching in mlxsw, from Jiri Pirko. 10) More SCTP cookie memory leak fixes, from Neil Horman. 11) Fix VLAN filtering in rtl8366, from Linus Walluij. 12) Various TCP SACK payload size and fragmentation memory limit fixes from Eric Dumazet. 13) Use after free in pneigh_get_next(), also from Eric Dumazet. 14) LAPB control block leak fix from Jeremy Sowden" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits) lapb: fixed leak of control-blocks. tipc: purge deferredq list for each grp member in tipc_group_delete ax25: fix inconsistent lock state in ax25_destroy_timer neigh: fix use-after-free read in pneigh_get_next tcp: fix compile error if !CONFIG_SYSCTL hv_sock: Suppress bogus "may be used uninitialized" warnings be2net: Fix number of Rx queues used for flow hashing net: handle 802.1P vlan 0 packets properly tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() tcp: add tcp_min_snd_mss sysctl tcp: tcp_fragment() should apply sane memory limits tcp: limit payload size of sacked skbs Revert "net: phylink: set the autoneg state in phylink_phy_change" bpf: fix nested bpf tracepoints with per-cpu data bpf: Fix out of bounds memory access in bpf_sk_storage vsock/virtio: set SOCK_DONE on peer shutdown net: dsa: rtl8366: Fix up VLAN filtering net: phylink: set the autoneg state in phylink_phy_change net: add high_order_alloc_disable sysctl/static key tcp: add tcp_tx_skb_cache sysctl ...
2019-06-17fs/namespace: fix unprivileged mount propagationChristian Brauner
When propagating mounts across mount namespaces owned by different user namespaces it is not possible anymore to move or umount the mount in the less privileged mount namespace. Here is a reproducer: sudo mount -t tmpfs tmpfs /mnt sudo --make-rshared /mnt # create unprivileged user + mount namespace and preserve propagation unshare -U -m --map-root --propagation=unchanged # now change back to the original mount namespace in another terminal: sudo mkdir /mnt/aaa sudo mount -t tmpfs tmpfs /mnt/aaa # now in the unprivileged user + mount namespace mount --move /mnt/aaa /opt Unfortunately, this is a pretty big deal for userspace since this is e.g. used to inject mounts into running unprivileged containers. So this regression really needs to go away rather quickly. The problem is that a recent change falsely locked the root of the newly added mounts by setting MNT_LOCKED. Fix this by only locking the mounts on copy_mnt_ns() and not when adding a new mount. Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies") Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Tested-by: Christian Brauner <christian@brauner.io> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-06-17vfs: fsmount: add missing mntget()Eric Biggers
sys_fsmount() needs to take a reference to the new mount when adding it to the anonymous mount namespace. Otherwise the filesystem can be unmounted while it's still in use, as found by syzkaller. Reported-by: Mark Rutland <mark.rutland@arm.com> Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-06-17Merge branch 'tcp-fixes'David S. Miller
Eric Dumazet says: ==================== tcp: make sack processing more robust Jonathan Looney brought to our attention multiple problems in TCP stack at the sender side. SACK processing can be abused by malicious peers to either cause overflows, or increase of memory usage. First two patches fix the immediate problems. Since the malicious peers abuse senders by advertizing a very small MSS in their SYN or SYNACK packet, the last two patches add a new sysctl so that admins can chose a higher limit for MSS clamping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17Merge tag 'riscv-for-v5.2/fixes-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "This contains fixes, defconfig, and DT data changes for the v5.2-rc series. The fixes are relatively straightforward: - Addition of a TLB fence in the vmalloc_fault path, so the CPU doesn't enter an infinite page fault loop - Readdition of the pm_power_off export, so device drivers that reassign it can now be built as modules - A udelay() fix for RV32, fixing a miscomputation of the delay time - Removal of deprecated smp_mb__*() barriers This also adds initial DT data infrastructure for arch/riscv, along with initial data for the SiFive FU540-C000 SoC and the corresponding HiFive Unleashed board. We also update the RV64 defconfig to include some core drivers for the FU540 in the build" * tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: remove unused barrier defines riscv: mm: synchronize MMU after pte change riscv: dts: add initial board data for the SiFive HiFive Unleashed riscv: dts: add initial support for the SiFive FU540-C000 SoC dt-bindings: riscv: convert cpu binding to json-schema dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540 arch: riscv: add support for building DTB files from DT source data riscv: Fix udelay in RV32. riscv: export pm_power_off again RISC-V: defconfig: enable clocks, serial console
2019-06-17riscv: remove unused barrier definesRolf Eike Beer
They were introduced in commit fab957c11efe ("RISC-V: Atomic and Locking Code") long after commit 2e39465abc4b ("locking: Remove deprecated smp_mb__() barriers") removed the remnants of all previous instances from the tree. Signed-off-by: Rolf Eike Beer <eb@emlix.com> [paul.walmsley@sifive.com: stripped spurious mbox header from patch description; fixed commit references in patch header] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-06-17riscv: mm: synchronize MMU after pte changeShihPo Hung
Because RISC-V compliant implementations can cache invalid entries in TLB, an SFENCE.VMA is necessary after changes to the page table. This patch adds an SFENCE.vma for the vmalloc_fault path. Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com> [paul.walmsley@sifive.com: reversed tab->whitespace conversion, wrapped comment lines] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org
2019-06-17riscv: dts: add initial board data for the SiFive HiFive UnleashedPaul Walmsley
Add initial board data for the SiFive HiFive Unleashed A00. Currently the data populated in this DT file describes the board DRAM configuration and the external clock sources that supply the PRCI. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Tested-by: Loys Ollivier <lollivier@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Antony Pavlov <antonynpavlov@gmail.com> Cc: devicetree@vger.kernel.org Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org
2019-06-17riscv: dts: add initial support for the SiFive FU540-C000 SoCPaul Walmsley
Add initial support for the SiFive FU540-C000 SoC. This is a 28nm SoC based around the SiFive U54-MC core complex and a TileLink interconnect. This file is expected to grow as more device drivers are added to the kernel. This patch includes a fix to the QSPI memory map due to a documentation bug, found by ShihPo Hung <shihpo.hung@sifive.com>, adds entries for the I2C controller, and merges all DT changes that formerly were made dynamically by the riscv-pk BBL proxy kernel. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Tested-by: Loys Ollivier <lollivier@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: ShihPo Hung <shihpo.hung@sifive.com> Cc: devicetree@vger.kernel.org Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org
2019-06-17dt-bindings: riscv: convert cpu binding to json-schemaPaul Walmsley
At Rob's request, we're starting to migrate our DT binding documentation to json-schema YAML format. Start by converting our cpu binding documentation. While doing so, document more properties and nodes. This includes adding binding documentation support for the E51 and U54 CPU cores ("harts") that are present on this SoC. These cores are described in: https://static.dev.sifive.com/FU540-C000-v1.0.pdf This cpus.yaml file is intended to be a starting point and to evolve over time. It passes dt-doc-validate as of the yaml-bindings commit 4c79d42e9216. This patch was originally based on the ARM json-schema binding documentation as added by commit 672951cbd1b7 ("dt-bindings: arm: Convert cpu binding to json-schema"). Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: devicetree@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-riscv@lists.infradead.org
2019-06-17dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540Paul Walmsley
Add YAML DT binding documentation for the SiFive FU540 SoC. This SoC is documented at: https://static.dev.sifive.com/FU540-C000-v1.0.pdf Passes dt-doc-validate, as of yaml-bindings commit 4c79d42e9216. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: devicetree@vger.kernel.org Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org
2019-06-17arch: riscv: add support for building DTB files from DT source dataPaul Walmsley
Similar to ARM64, add support for building DTB files from DT source data for RISC-V boards. This patch starts with the infrastructure needed for SiFive boards. Boards from other vendors would add support here in a similar form. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Tested-by: Loys Ollivier <lollivier@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu>
2019-06-16lapb: fixed leak of control-blocks.Jeremy Sowden
lapb_register calls lapb_create_cb, which initializes the control- block's ref-count to one, and __lapb_insert_cb, which increments it when adding the new block to the list of blocks. lapb_unregister calls __lapb_remove_cb, which decrements the ref-count when removing control-block from the list of blocks, and calls lapb_put itself to decrement the ref-count before returning. However, lapb_unregister also calls __lapb_devtostruct to look up the right control-block for the given net_device, and __lapb_devtostruct also bumps the ref-count, which means that when lapb_unregister returns the ref-count is still 1 and the control-block is leaked. Call lapb_put after __lapb_devtostruct to fix leak. Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16tipc: purge deferredq list for each grp member in tipc_group_deleteXin Long
Syzbot reported a memleak caused by grp members' deferredq list not purged when the grp is be deleted. The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in tipc_group_filter_msg() and the skb will stay in deferredq. So fix it by calling __skb_queue_purge for each member's deferredq in tipc_group_delete() when a tipc sk leaves the grp. Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast") Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16ax25: fix inconsistent lock state in ax25_destroy_timerEric Dumazet
Before thread in process context uses bh_lock_sock() we must disable bh. sysbot reported : WARNING: inconsistent lock state 5.2.0-rc3+ #32 Not tainted inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. blkid/26581 [HC0[0]:SC1[1]:HE1:SE0] takes: 00000000e0da85ee (slock-AF_AX25){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline] 00000000e0da85ee (slock-AF_AX25){+.?.}, at: ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275 {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:338 [inline] ax25_rt_autobind+0x3ca/0x720 net/ax25/ax25_route.c:429 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1221 __sys_connect+0x264/0x330 net/socket.c:1834 __do_sys_connect net/socket.c:1845 [inline] __se_sys_connect net/socket.c:1842 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1842 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe irq event stamp: 2272 hardirqs last enabled at (2272): [<ffffffff810065f3>] trace_hardirqs_on_thunk+0x1a/0x1c hardirqs last disabled at (2271): [<ffffffff8100660f>] trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (1522): [<ffffffff87400654>] __do_softirq+0x654/0x94c kernel/softirq.c:320 softirqs last disabled at (2267): [<ffffffff81449010>] invoke_softirq kernel/softirq.c:374 [inline] softirqs last disabled at (2267): [<ffffffff81449010>] irq_exit+0x180/0x1d0 kernel/softirq.c:414 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_AX25); <Interrupt> lock(slock-AF_AX25); *** DEADLOCK *** 1 lock held by blkid/26581: #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:175 [inline] #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: call_timer_fn+0xe0/0x720 kernel/time/timer.c:1312 stack backtrace: CPU: 1 PID: 26581 Comm: blkid Not tainted 5.2.0-rc3+ #32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_usage_bug.cold+0x393/0x4a2 kernel/locking/lockdep.c:2935 valid_state kernel/locking/lockdep.c:2948 [inline] mark_lock_irq kernel/locking/lockdep.c:3138 [inline] mark_lock+0xd46/0x1370 kernel/locking/lockdep.c:3513 mark_irqflags kernel/locking/lockdep.c:3391 [inline] __lock_acquire+0x159f/0x5490 kernel/locking/lockdep.c:3745 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:338 [inline] ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275 call_timer_fn+0x193/0x720 kernel/time/timer.c:1322 expire_timers kernel/time/timer.c:1366 [inline] __run_timers kernel/time/timer.c:1685 [inline] __run_timers kernel/time/timer.c:1653 [inline] run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698 __do_softirq+0x25c/0x94c kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806 </IRQ> RIP: 0033:0x7f858d5c3232 Code: 8b 61 08 48 8b 84 24 d8 00 00 00 4c 89 44 24 28 48 8b ac 24 d0 00 00 00 4c 8b b4 24 e8 00 00 00 48 89 7c 24 68 48 89 4c 24 78 <48> 89 44 24 58 8b 84 24 e0 00 00 00 89 84 24 84 00 00 00 8b 84 24 RSP: 002b:00007ffcaf0cf5c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 RAX: 00007f858d7d27a8 RBX: 00007f858d7d8820 RCX: 00007f858d3940d8 RDX: 00007ffcaf0cf798 RSI: 00000000f5e616f3 RDI: 00007f858d394fee RBP: 0000000000000000 R08: 00007ffcaf0cf780 R09: 00007f858d7db480 R10: 0000000000000000 R11: 0000000009691a75 R12: 0000000000000005 R13: 00000000f5e616f3 R14: 0000000000000000 R15: 00007ffcaf0cf798 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16neigh: fix use-after-free read in pneigh_get_nextEric Dumazet
Nine years ago, I added RCU handling to neighbours, not pneighbours. (pneigh are not commonly used) Unfortunately I missed that /proc dump operations would use a common entry and exit point : neigh_seq_start() and neigh_seq_stop() We need to read_lock(tbl->lock) or risk use-after-free while iterating the pneigh structures. We might later convert pneigh to RCU and revert this patch. sysbot reported : BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825 CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ #32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240 seq_read+0x9cf/0x1110 fs/seq_file.c:258 proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221 do_loop_readv_writev fs/read_write.c:714 [inline] do_loop_readv_writev fs/read_write.c:701 [inline] do_iter_read+0x4a4/0x660 fs/read_write.c:935 vfs_readv+0xf0/0x160 fs/read_write.c:997 kernel_readv fs/splice.c:359 [inline] default_file_splice_read+0x475/0x890 fs/splice.c:414 do_splice_to+0x127/0x180 fs/splice.c:877 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954 do_splice_direct+0x1da/0x2a0 fs/splice.c:1063 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4592c9 Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4 R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff Allocated by task 9827: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc mm/slab.c:3660 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3669 kmalloc include/linux/slab.h:552 [inline] pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731 arp_req_set_public net/ipv4/arp.c:1010 [inline] arp_req_set+0x613/0x720 net/ipv4/arp.c:1026 arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226 inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926 sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043 sock_ioctl+0x3ed/0x780 net/socket.c:1194 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9824: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline] __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356 neigh_ifdown+0x20/0x30 net/core/neighbour.c:372 arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274 inetdev_destroy net/ipv4/devinet.c:319 [inline] inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544 notifier_call_chain+0xc2/0x230 kernel/notifier.c:95 __raw_notifier_call_chain kernel/notifier.c:396 [inline] raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline] call_netdevice_notifiers net/core/dev.c:1775 [inline] rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178 rollback_registered+0x109/0x1d0 net/core/dev.c:8220 unregister_netdevice_queue net/core/dev.c:9267 [inline] unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260 unregister_netdevice include/linux/netdevice.h:2631 [inline] __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724 tun_detach drivers/net/tun.c:741 [inline] tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451 __fput+0x2ff/0x890 fs/file_table.c:280 ____fput+0x16/0x20 fs/file_table.c:313 task_work_run+0x145/0x1c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:185 [inline] exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline] syscall_return_slowpath arch/x86/entry/common.c:279 [inline] do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888097f2a700 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888097f2a700, ffff888097f2a740) The buggy address belongs to the page: page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340 raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16tcp: fix compile error if !CONFIG_SYSCTLEric Dumazet
tcp_tx_skb_cache_key and tcp_rx_skb_cache_key must be available even if CONFIG_SYSCTL is not set. Fixes: 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl") Fixes: ede61ca474a0 ("tcp: add tcp_rx_skb_cache sysctl") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16hv_sock: Suppress bogus "may be used uninitialized" warningsDexuan Cui
gcc 8.2.0 may report these bogus warnings under some condition: warning: ‘vnew’ may be used uninitialized in this function warning: ‘hvs_new’ may be used uninitialized in this function Actually, the 2 pointers are only initialized and used if the variable "conn_from_host" is true. The code is not buggy here. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16be2net: Fix number of Rx queues used for flow hashingIvan Vecera
Number of Rx queues used for flow hashing returned by the driver is incorrect and this bug prevents user to use the last Rx queue in indirection table. Let's say we have a NIC with 6 combined queues: [root@sm-03 ~]# ethtool -l enp4s0f0 Channel parameters for enp4s0f0: Pre-set maximums: RX: 5 TX: 5 Other: 0 Combined: 6 Current hardware settings: RX: 0 TX: 0 Other: 0 Combined: 6 Default indirection table maps all (6) queues equally but the driver reports only 5 rings available. [root@sm-03 ~]# ethtool -x enp4s0f0 RX flow hash indirection table for enp4s0f0 with 5 RX ring(s): 0: 0 1 2 3 4 5 0 1 8: 2 3 4 5 0 1 2 3 16: 4 5 0 1 2 3 4 5 24: 0 1 2 3 4 5 0 1 ... Now change indirection table somehow: [root@sm-03 ~]# ethtool -X enp4s0f0 weight 1 1 [root@sm-03 ~]# ethtool -x enp4s0f0 RX flow hash indirection table for enp4s0f0 with 6 RX ring(s): 0: 0 0 0 0 0 0 0 0 ... 64: 1 1 1 1 1 1 1 1 ... Now it is not possible to change mapping back to equal (default) state: [root@sm-03 ~]# ethtool -X enp4s0f0 equal 6 Cannot set RX flow hash configuration: Invalid argument Fixes: 594ad54a2c3b ("be2net: Add support for setting and getting rx flow hash options") Reported-by: Tianhao <tizhao@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16net: handle 802.1P vlan 0 packets properlyGovindarajulu Varadarajan
When stack receives pkt: [802.1P vlan 0][802.1AD vlan 100][IPv4], vlan_do_receive() returns false if it does not find vlan_dev. Later __netif_receive_skb_core() fails to find packet type handler for skb->protocol 801.1AD and drops the packet. 801.1P header with vlan id 0 should be handled as untagged packets. This patch fixes it by checking if vlan_id is 0 and processes next vlan header. Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-16Linux 5.2-rc5Linus Torvalds
2019-06-16Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "The accumulated fixes from this and last week: - Fix vmalloc TLB flush and map range calculations which lead to stale TLBs, spurious faults and other hard to diagnose issues. - Use fault_in_pages_writable() for prefaulting the user stack in the FPU code as it's less fragile than the current solution - Use the PF_KTHREAD flag when checking for a kernel thread instead of current->mm as the latter can give the wrong answer due to use_mm() - Compute the vmemmap size correctly for KASLR and 5-Level paging. Otherwise this can end up with a way too small vmemmap area. - Make KASAN and 5-level paging work again by making sure that all invalid bits are masked out when computing the P4D offset. This worked before but got broken recently when the LDT remap area was moved. - Prevent a NULL pointer dereference in the resource control code which can be triggered with certain mount options when the requested resource is not available. - Enforce ordering of microcode loading vs. perf initialization on secondary CPUs. Otherwise perf tries to access a non-existing MSR as the boot CPU marked it as available. - Don't stop the resource control group walk early otherwise the control bitmaps are not updated correctly and become inconsistent. - Unbreak kgdb by returning 0 on success from kgdb_arch_set_breakpoint() instead of an error code. - Add more Icelake CPU model defines so depending changes can be queued in other trees" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback x86/kasan: Fix boot with 5-level paging and KASAN x86/fpu: Don't use current->mm to check for a kthread x86/kgdb: Return 0 from kgdb_arch_set_breakpoint() x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled x86/resctrl: Don't stop walking closids when a locksetup group is found x86/fpu: Update kernel's FPU state before using for the fsave header x86/mm/KASLR: Compute the size of the vmemmap section properly x86/fpu: Use fault_in_pages_writeable() for pre-faulting x86/CPU: Add more Icelake model numbers mm/vmalloc: Avoid rare case of flushing TLB with weird arguments mm/vmalloc: Fix calculation of direct map addr range
2019-06-16Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A set of small fixes: - Repair the ktime_get_coarse() functions so they actually deliver what they are supposed to: tick granular time stamps. The current code missed to add the accumulated nanoseconds part of the timekeeper so the resulting granularity was 1 second. - Prevent the tracer from infinitely recursing into time getter functions in the arm architectured timer by marking these functions notrace - Fix a trivial compiler warning caused by wrong qualifier ordering" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Repair ktime_get_coarse*() granularity clocksource/drivers/arm_arch_timer: Don't trace count reader functions clocksource/drivers/timer-ti-dm: Change to new style declaration
2019-06-16Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two small fixes for RAS: - Use a proper search algorithm to find the correct element in the CEC array. The replacement was a better choice than fixing the crash causes by the original search function with horrible duct tape. - Move the timer based decay function into thread context so it can actually acquire the mutex which protects the CEC array to prevent corruption" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS/CEC: Convert the timer callback to a workqueue RAS/CEC: Fix binary search function
2019-06-15tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()Eric Dumazet
If mtu probing is enabled tcp_mtu_probing() could very well end up with a too small MSS. Use the new sysctl tcp_min_snd_mss to make sure MSS search is performed in an acceptable range. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15tcp: add tcp_min_snd_mss sysctlEric Dumazet
Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15tcp: tcp_fragment() should apply sane memory limitsEric Dumazet
Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflow 32bit counters. TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non malicious splitting of retransmit queue. A new SNMP counter is added to monitor how many times TCP did not allow to split an skb if the allowance was exceeded. Note that this counter might increase in the case applications use SO_SNDBUF socket option to lower sk_sndbuf. CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15tcp: limit payload size of sacked skbsEric Dumazet
Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() : BUG_ON(tcp_skb_pcount(skb) < pcount); This can happen if the remote peer has advertized the smallest MSS that linux TCP accepts : 48 An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC. This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs can overflow. Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity. CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2019-06-15 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix stack layout of JITed x64 bpf code, from Alexei. 2) fix out of bounds memory access in bpf_sk_storage, from Arthur. 3) fix lpm trie walk, from Jonathan. 4) fix nested bpf_perf_event_output, from Matt. 5) and several other fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15Revert "net: phylink: set the autoneg state in phylink_phy_change"David S. Miller
This reverts commit ef7bfa84725d891bbdb88707ed55b2cbf94942bb. Russell King espressed some strong opposition to this change, explaining that this is trying to make phylink behave outside of how it has been designed. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15bpf: fix nested bpf tracepoints with per-cpu dataMatt Mullins
BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as they do not increment bpf_prog_active while executing. This enables three levels of nesting, to support - a kprobe or raw tp or perf event, - another one of the above that irq context happens to call, and - another one in nmi context (at most one of which may be a kprobe or perf event). Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-15bpf: Fix out of bounds memory access in bpf_sk_storageArthur Fabre
bpf_sk_storage maps use multiple spin locks to reduce contention. The number of locks to use is determined by the number of possible CPUs. With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used. When updating elements, the correct lock is determined with hash_ptr(). Calling hash_ptr() with 0 bits is undefined behavior, as it does: x >> (64 - bits) Using the value results in an out of bounds memory access. In my case, this manifested itself as a page fault when raw_spin_lock_bh() is called later, when running the self tests: ./tools/testing/selftests/bpf/test_verifier 773 775 [ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8 Force the minimum number of locks to two. Signed-off-by: Arthur Fabre <afabre@cloudflare.com> Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-15vsock/virtio: set SOCK_DONE on peer shutdownStephen Barber
Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has shut down and there is nothing left to read. This fixes the following bug: 1) Peer sends SHUTDOWN(RDWR). 2) Socket enters TCP_CLOSING but SOCK_DONE is not set. 3) read() returns -ENOTCONN until close() is called, then returns 0. Signed-off-by: Stephen Barber <smbarber@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net: dsa: rtl8366: Fix up VLAN filteringLinus Walleij
We get this regression when using RTL8366RB as part of a bridge with OpenWrt: WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291 switchdev_port_attr_set_now+0x80/0xa4 lan0: Commit of attribute (id=7) failed. (...) realtek-smi switch lan0: failed to initialize vlan filtering on this port This is because it is trying to disable VLAN filtering on VLAN0, as we have forgot to add 1 to the port number to get the right VLAN in rtl8366_vlan_filtering(): when we initialize the VLAN we associate VLAN1 with port 0, VLAN2 with port 1 etc, so we need to add 1 to the port offset. Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net: phylink: set the autoneg state in phylink_phy_changeIoana Ciornei
The phy_state field of phylink should carry only valid information especially when this can be passed to the .mac_config callback. Update the an_enabled field with the autoneg state in the phylink_phy_change function. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15Merge tag 'platform-drivers-x86-v5.2-3' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Andy Shevchenko: - fix a couple of Mellanox driver enumeration issues - fix ASUS laptop regression with backlight - fix Dell computers that got a wrong mode (tablet versus laptop) after resume * tag 'platform-drivers-x86-v5.2-3' of git://git.infradead.org/linux-platform-drivers-x86: platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration platform/x86: intel-vbtn: Report switch events when event wakes device platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
2019-06-15Merge tag 'usb-5.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB driver fixes for 5.2-rc5 Nothing major, just some small gadget fixes, usb-serial new device ids, a few new quirks, and some small fixes for some regressions that have been found after the big 5.2-rc1 merge. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: Make sure an alt mode exist before getting its partner usb: gadget: udc: lpc32xx: fix return value check in lpc32xx_udc_probe() usb: gadget: dwc2: fix zlp handling usb: dwc2: Set actual frame number for completed ISOC transfer for none DDMA usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i] usb: phy: mxs: Disable external charger detect in mxs_phy_hw_init() usb: dwc2: Fix DMA cache alignment issues usb: dwc2: host: Fix wMaxPacketSize handling (fix webcam regression) USB: Fix chipmunk-like voice when using Logitech C270 for recording audio. USB: usb-storage: Add new ID to ums-realtek usb: typec: ucsi: ccg: fix memory leak in do_flash USB: serial: option: add Telit 0x1260 and 0x1261 compositions USB: serial: pl2303: add Allied Telesis VT-Kit3 USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
2019-06-15Merge tag 'powerpc-5.2-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for a regression introduced by our 32-bit KASAN support, which broke booting on machines with "bootx" early debugging enabled. A fix for a bug which broke kexec on 32-bit, introduced by changes to the 32-bit STRICT_KERNEL_RWX support in v5.1. Finally two fixes going to stable for our THP split/collapse handling, discovered by Nick. The first fixes random crashes and/or corruption in guests under sufficient load. Thanks to: Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu Malaterre" * tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate() powerpc/64s: Fix THP PMD collapse serialisation powerpc: Fix kexec failure on book3s/32
2019-06-15Merge tag 'trace-v5.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Out of range read of stack trace output - Fix for NULL pointer dereference in trace_uprobe_create() - Fix to a livepatching / ftrace permission race in the module code - Fix for NULL pointer dereference in free_ftrace_func_mapper() - A couple of build warning clean ups * tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() module: Fix livepatch/ftrace module text permissions race tracing/uprobe: Fix obsolete comment on trace_uprobe_create() tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create() tracing: Make two symbols static tracing: avoid build warning with HAVE_NOP_MCOUNT tracing: Fix out-of-range read in trace_stack_print()
2019-06-15x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callbackBorislav Petkov
Adric Blake reported the following warning during suspend-resume: Enabling non-boot CPUs ... x86: Booting SMP configuration: smpboot: Booting Node 0 Processor 1 APIC 0x2 unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \ at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20) Call Trace: intel_set_tfa intel_pmu_cpu_starting ? x86_pmu_dead_cpu x86_pmu_starting_cpu cpuhp_invoke_callback ? _raw_spin_lock_irqsave notify_cpu_starting start_secondary secondary_startup_64 microcode: sig=0x806ea, pf=0x80, revision=0x96 microcode: updated to revision 0xb4, date = 2019-04-01 CPU1 is up The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated by microcode. The log above shows that the microcode loader callback happens after the PMU restoration, leading to the conjecture that because the microcode hasn't been updated yet, that MSR is not present yet, leading to the #GP. Add a microcode loader-specific hotplug vector which comes before the PERF vectors and thus executes earlier and makes sure the MSR is present. Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") Reported-by: Adric Blake <promarbler14@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: x86@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
2019-06-14Merge branch 'for-5.2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This has an unusually high density of tricky fixes: - task_get_css() could deadlock when it races against a dying cgroup. - cgroup.procs didn't list thread group leaders with live threads. This could mislead readers to think that a cgroup is empty when it's not. Fixed by making PROCS iterator include dead tasks. I made a couple mistakes making this change and this pull request contains a couple follow-up patches. - When cpusets run out of online cpus, it updates cpusmasks of member tasks in bizarre ways. Joel improved the behavior significantly" * 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: restore sanity to cpuset_cpus_allowed_fallback() cgroup: Fix css_task_iter_advance_css_set() cset skip condition cgroup: css_task_iter_skip()'d iterators must be advanced before accessed cgroup: Include dying leaders with live threads in PROCS iterations cgroup: Implement css_task_iter_skip() cgroup: Call cgroup_release() before __exit_signal() docs cgroups: add another example size for hugetlb cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
2019-06-14Merge tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Daniel Vetter: "Nothing unsettling here, also not aware of anything serious still pending. The edid override regression fix took a bit longer since this seems to be an area with an overabundance of bad options. But the fix we have now seems like a good path forward. Next week it should be back to Dave. Summary: - fix regression on amdgpu on SI - fix edid override regression - driver fixes: amdgpu, i915, mediatek, meson, panfrost - fix writecombine for vmap in gem-shmem helper (used by panfrost) - add more panel quirks" * tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm: (25 commits) drm/amdgpu: return 0 by default in amdgpu_pm_load_smu_firmware drm/amdgpu: Fix bounds checking in amdgpu_ras_is_supported() drm: add fallback override/firmware EDID modes workaround drm/edid: abstract override/firmware EDID retrieval drm/i915/perf: fix whitelist on Gen10+ drm/i915/sdvo: Implement proper HDMI audio support for SDVO drm/i915: Fix per-pixel alpha with CCS drm/i915/dmc: protect against reading random memory drm/i915/dsi: Use a fuzzy check for burst mode clock check drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc drm/panfrost: Require the simple_ondemand governor drm/panfrost: make devfreq optional again drm/gem_shmem: Use a writecombine mapping for ->vaddr drm: panel-orientation-quirks: Add quirk for GPD MicroPC drm: panel-orientation-quirks: Add quirk for GPD pocket2 drm/meson: fix G12A primary plane disabling drm/meson: fix primary plane disabling drm/meson: fix G12A HDMI PLL settings for 4K60 1000/1001 variations drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable() drm/mediatek: clear num_pipes when unbind driver ...
2019-06-14Merge tag 'gfs2-v5.2.fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix rounding error in gfs2_iomap_page_prepare" * tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix rounding error in gfs2_iomap_page_prepare
2019-06-14Merge branch 'tcp-add-three-static-keys'David S. Miller
Eric Dumazet says: ==================== tcp: add three static keys Recent addition of per TCP socket rx/tx cache brought regressions for some workloads, as reported by Feng Tang. It seems better to make them opt-in, before we adopt better heuristics. The last patch adds high_order_alloc_disable sysctl to ask TCP sendmsg() to exclusively use order-0 allocations, as mm layer has specific optimizations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14net: add high_order_alloc_disable sysctl/static keyEric Dumazet
>From linux-3.7, (commit 5640f7685831 "net: use a per task frag allocator") TCP sendmsg() has preferred using order-3 allocations. While it gives good results for most cases, we had reports that heavy uses of TCP over loopback were hitting a spinlock contention in page allocations/freeing. This commits adds a sysctl so that admins can opt-in for order-0 allocations. Hopefully mm layer might optimize order-3 allocations in the future since it could give us a nice boost (see 8 lines of following benchmark) The following benchmark shows a win when more than 8 TCP_STREAM threads are running (56 x86 cores server in my tests) for thr in {1..30} do sysctl -wq net.core.high_order_alloc_disable=0 T0=`./super_netperf $thr -H 127.0.0.1 -l 15` sysctl -wq net.core.high_order_alloc_disable=1 T1=`./super_netperf $thr -H 127.0.0.1 -l 15` echo $thr:$T0:$T1 done 1: 49979: 37267 2: 98745: 76286 3: 141088: 110051 4: 177414: 144772 5: 197587: 173563 6: 215377: 208448 7: 241061: 234087 8: 267155: 263373 9: 295069: 297402 10: 312393: 335213 11: 340462: 368778 12: 371366: 403954 13: 412344: 443713 14: 426617: 473580 15: 474418: 507861 16: 503261: 538539 17: 522331: 563096 18: 532409: 567084 19: 550824: 605240 20: 525493: 641988 21: 564574: 665843 22: 567349: 690868 23: 583846: 710917 24: 588715: 736306 25: 603212: 763494 26: 604083: 792654 27: 602241: 796450 28: 604291: 797993 29: 611610: 833249 30: 577356: 841062 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14tcp: add tcp_tx_skb_cache sysctlEric Dumazet
Feng Tang reported a performance regression after introduction of per TCP socket tx/rx caches, for TCP over loopback (netperf) There is high chance the regression is caused by a change on how well the 32 KB per-thread page (current->task_frag) can be recycled, and lack of pcp caches for order-3 pages. I could not reproduce the regression myself, cpus all being spinning on the mm spinlocks for page allocs/freeing, regardless of enabling or disabling the per tcp socket caches. It seems best to disable the feature by default, and let admins enabling it. MM layer either needs to provide scalable order-3 pages allocations, or could attempt a trylock on zone->lock if the caller only attempts to get a high-order page and is able to fallback to order-0 ones in case of pressure. Tests run on a 56 cores host (112 hyper threads) - 35.49% netperf [kernel.vmlinux] [k] queued_spin_lock_slowpath - 35.49% queued_spin_lock_slowpath - 18.18% get_page_from_freelist - __alloc_pages_nodemask - 18.18% alloc_pages_current skb_page_frag_refill sk_page_frag_refill tcp_sendmsg_locked tcp_sendmsg inet_sendmsg sock_sendmsg __sys_sendto __x64_sys_sendto do_syscall_64 entry_SYSCALL_64_after_hwframe __libc_send + 17.31% __free_pages_ok + 31.43% swapper [kernel.vmlinux] [k] intel_idle + 9.12% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 6.53% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 0.69% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 0.68% netperf [kernel.vmlinux] [k] skb_release_data + 0.52% netperf [kernel.vmlinux] [k] tcp_sendmsg_locked 0.46% netperf [kernel.vmlinux] [k] _raw_spin_lock_irqsave Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Feng Tang <feng.tang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14tcp: add tcp_rx_skb_cache sysctlEric Dumazet
Instead of relying on rps_needed, it is safer to use a separate static key, since we do not want to enable TCP rx_skb_cache by default. This feature can cause huge increase of memory usage on hosts with millions of sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>