2019-04-15  MIPS: scall64-o32: Fix indirect syscall number load  (Aurelien Jarno)
Commit 4c21b8fd8f14 (MIPS: seccomp: Handle indirect system calls (o32)) added indirect syscall detection for O32 processes running on MIPS64, but it did not work correctly for big endian kernel/processes. The reason is that the syscall number is loaded from ARG1 using the lw instruction while this is a 64-bit value, so zero is loaded instead of the syscall number. Fix the code by using the ld instruction instead. When running a 32-bit process on a 64-bit CPU, the values are properly sign-extended, so this ensures the value passed to syscall_trace_enter is correct. Recent systemd versions with seccomp enabled whitelist the getpid syscall for their internal processes (e.g. systemd-journald), but call it through syscall(SYS_getpid). This fix therefore allows O32 big endian systems with a 64-bit kernel to run recent systemd versions. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Cc: <stable@vger.kernel.org> # v3.15+ Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org
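To make the failure mode concrete, here is a minimal user-space sketch (hypothetical, not the kernel code) of why a 32-bit load from the start of a 64-bit argument slot returns zero on a big endian system:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        uint64_t arg1 = 4020;       /* a small o32 syscall number in a 64-bit slot */
        uint32_t first_word;

        /* "lw"-style load: read only the 32 bits at the slot's base address. */
        memcpy(&first_word, &arg1, sizeof(first_word));

        /* On little endian this prints 4020; on big endian it prints 0 (the
         * value's high half), which is why the full 64-bit "ld" load is needed. */
        printf("32-bit load sees %u, 64-bit load sees %llu\n",
               first_word, (unsigned long long)arg1);
        return 0;
    }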
2019-04-15  io_uring: fix possible deadlock between io_uring_{enter,register}  (Jens Axboe)
If we have multiple threads, one doing io_uring_enter() while the other is doing io_uring_register(), we can run into a deadlock between the two. io_uring_register() must wait for existing users of the io_uring instance to exit. But it does so while holding the io_uring mutex. Callers of io_uring_enter() may need this mutex to make progress (and eventually exit). If we wait for users to exit in io_uring_register(), we can't do so with the io_uring mutex held without potentially risking a deadlock. Drop the io_uring mutex while waiting for existing callers to exit. This is safe and guaranteed to make forward progress, since we already killed the percpu ref before doing so. Hence later callers of io_uring_enter() will be rejected. Reported-by: syzbot+16dc03452dee970a0c3e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
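A user-space sketch of the locking pattern described above, with illustrative names rather than io_uring's real ones: drop the mutex before waiting for existing users to drain, so those users can take it and make progress.

    #include <pthread.h>
    #include <semaphore.h>

    struct ring_ctx {
        pthread_mutex_t uring_lock;   /* taken by both enter() and register() paths */
        sem_t           refs_done;    /* posted once the last existing user is gone */
    };

    void ctx_register(struct ring_ctx *ctx)
    {
        pthread_mutex_lock(&ctx->uring_lock);

        /* ... mark the ctx as dying here so that new enter() calls are rejected ... */

        /*
         * Waiting for existing users while still holding uring_lock would
         * deadlock, because those users need the lock to finish.  Drop it,
         * wait, then re-take it for the registration work itself.
         */
        pthread_mutex_unlock(&ctx->uring_lock);
        sem_wait(&ctx->refs_done);
        pthread_mutex_lock(&ctx->uring_lock);

        /* ... perform the actual registration update ... */

        pthread_mutex_unlock(&ctx->uring_lock);
    }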
2019-04-15  bridge: only include nf_queue.h if needed  (Stephen Rothwell)
After merging the netfilter-next tree, today's linux-next build (powerpc ppc44x_defconfig) failed like this: In file included from net/bridge/br_input.c:19: include/net/netfilter/nf_queue.h:16:23: error: field 'state' has incomplete type struct nf_hook_state state; ^~~~~ Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-15  drm/i915: Restore correct bxt_ddi_phy_calc_lane_lat_optim_mask() calculation  (Ville Syrjälä)
We are no longer calling bxt_ddi_phy_calc_lane_lat_optim_mask() when intel{hdmi,dp}_compute_config() succeeds, and instead only call it when those fail. This is fallout from the bool->int .compute_config() conversion which failed to invert the return value check before calling bxt_ddi_phy_calc_lane_lat_optim_mask(). Let's just replace it with an early bailout so that it's harder to miss. This restores the correct latency optim setting calculation (which could fix some real failures), and avoids the MISSING_CASE() from bxt_ddi_phy_calc_lane_lat_optim_mask() after intel{hdmi,dp}_compute_config() has failed. Cc: Lyude Paul <lyude@redhat.com> Fixes: 204474a6b859 ("drm/i915: Pass down rc in intel_encoder->compute_config()") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109373 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190411164925.28491-1-ville.syrjala@linux.intel.com Reviewed-by: Lyude Paul <lyude@redhat.com> (cherry picked from commit 7a412b8f60cd57ab7dcb72ab701fde2bf81752eb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-04-15  drm/i915: Do not enable FEC without DSC  (Ville Syrjälä)
Currently we enable FEC even when DSC is not used. While that is theoretically valid, supposedly there isn't much of a benefit from it. But more importantly, we do not account for the FEC link bandwidth overhead (2.4%) in the non-DSC link bandwidth computations. So the code may think we have enough bandwidth when we in fact do not. Cc: stable@vger.kernel.org Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Fixes: 240999cf339f ("i915/dp/fec: Add fec_enable to the crtc state.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190326144903.6617-1-ville.syrjala@linux.intel.com Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> (cherry picked from commit 6fd3134ae3551d4802a04669c0f39f2f5c56f77d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-04-15  arch: add pidfd and io_uring syscalls everywhere  (Arnd Bergmann)
Add the io_uring and pidfd_send_signal system calls to all architectures. These system calls are designed to handle both native and compat tasks, so all entries are the same across architectures; only arm-compat and the generic table still use an old format. Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> (s390) Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-04-15  KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes  (Sean Christopherson)
A recently introduced helper for handling zap vs. remote flush incorrectly bails early, effectively leaking defunct shadow pages. Manifests as a slab BUG when exiting KVM due to the shadow pages being alive when their associated cache is destroyed. ========================================================================== BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ... -------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ... CPU: 6 PID: 4315 Comm: rmmod Tainted: G B 5.1.0-rc2+ #19 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x46/0x5b slab_err+0xad/0xd0 ? on_each_cpu_mask+0x3c/0x50 ? ksm_migrate_page+0x60/0x60 ? on_each_cpu_cond_mask+0x7c/0xa0 ? __kmalloc+0x1ca/0x1e0 __kmem_cache_shutdown+0x13a/0x310 shutdown_cache+0xf/0x130 kmem_cache_destroy+0x1d5/0x200 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x45/0x60 [kvm] kvm_exit+0x6f/0x80 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x153/0x1f0 ? exit_to_usermode_loop+0x88/0xc0 do_syscall_64+0x4f/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a21136345cb6f ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
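As an illustration of the bug class only (simplified types, not the actual KVM diff): an inverted list_empty() test makes a helper bail out precisely when there is work queued, so the queued shadow pages are never freed.

    #include <stdbool.h>

    struct list_head { struct list_head *next, *prev; };

    static bool list_empty(const struct list_head *head)
    {
        return head->next == head;
    }

    static bool flush_or_zap(struct list_head *invalid_list, bool remote_flush)
    {
        /*
         * Inverted (buggy) form -- returns early while invalid_list still
         * has entries, leaking them:
         *
         *     if (!remote_flush && !list_empty(invalid_list))
         *         return false;
         */

        /* Intended form: only skip the work when there is nothing to do. */
        if (!remote_flush && list_empty(invalid_list))
            return false;

        /* ... zap the pages on invalid_list and/or flush remote TLBs ... */
        return true;
    }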
2019-04-15  drivers: power: supply: goldfish_battery: Fix bogus SPDX identifier  (Thomas Gleixner)
spdxcheck.py complains: drivers/power/supply/goldfish_battery.c: 1:28 Invalid License ID: GPL which is correct because GPL is not a valid identifier. Of course this could have been caught by checkpatch.pl _before_ submitting or merging the patch. WARNING: 'SPDX-License-Identifier: GPL' is not supported in LICENSES/... #19: FILE: drivers/power/supply/goldfish_battery.c:1: +// SPDX-License-Identifier: GPL Which is absolutely hilarious, as the commit introducing this wreckage says in the changelog: There was a checkpatch complain: "Missing or malformed SPDX-License-Identifier tag". Oh well. Replacing a checkpatch warning by a different checkpatch warning is a really useful exercise. Use the proper GPL-2.0 identifier, which is what the boilerplate in the file had originally. Fixes: e75e3a125b40 ("drivers: power: supply: goldfish_battery: Put an SPDX tag") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
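The corrected first line of the file, using the GPL-2.0 identifier named above, is simply:

    // SPDX-License-Identifier: GPL-2.0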
2019-04-15  xfrm: kconfig: make xfrm depend on inet  (Florian Westphal)
when CONFIG_INET is not enabled: net/xfrm/xfrm_output.c: In function ‘xfrm4_tunnel_encap_add’: net/xfrm/xfrm_output.c:234:2: error: implicit declaration of function ‘ip_select_ident’ [-Werror=implicit-function-declaration] ip_select_ident(dev_net(dst->dev), skb, NULL); XFRM only supports ipv4 and ipv6, so change the dependency to INET and place the user-visible options (pfkey sockets, migrate support and the like) under an 'if INET' guard as well. Fixes: 1de70830066b7 ("xfrm: remove output2 indirection from xfrm_mode") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-15  s390/pkey: add one more argument space for debug feature entry  (Harald Freudenberger)
The debug feature entries have been used with up to 5 arguments (including the pointer to the format string) but there was only space reserved for 4 arguments. So now the registration reserves space for 5 times a long value. This fixes a sometimes-appearing weird value as the last value of a debug feature entry like this: ... pkey_sec2protkey zcrypt_send_cprb (cardnr=10 domain=12) failed with errno -2143346254 Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reported-by: Christian Rund <Christian.Rund@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-04-15  netfilter: nat: fix icmp id randomization  (Florian Westphal)
Sven Auhagen reported that a 2nd ping request will fail if 'fully-random' mode is used. The reason is that if no proto information is given, min/max are both 0, so we set the icmp id to 0 instead of choosing a random value between 0 and 65535. Update the test case as well to catch this; without the fix this yields: [..] ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2) ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2) ... because the 2nd ping clashes with the existing 'id 0' icmp conntrack and gets dropped. Fixes: 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple") Reported-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
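A user-space sketch of the range handling described above, illustrative only (rand() is a stand-in for a proper PRNG, and the names are hypothetical, not the netfilter code): when no proto range is supplied, fall back to the full 16-bit id space instead of the degenerate [0, 0] range.

    #include <stdint.h>
    #include <stdlib.h>

    static uint16_t pick_icmp_id(uint16_t range_min, uint16_t range_max,
                                 int range_given)
    {
        if (!range_given) {             /* no proto info: use the full 0..65535 space */
            range_min = 0;
            range_max = 65535;
        }
        /* rand() stands in for the kernel's random number helpers. */
        return range_min + (uint16_t)(rand() % (range_max - range_min + 1));
    }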
2019-04-15  netfilter: nf_tables: prevent shift wrap in nft_chain_parse_hook()  (Dan Carpenter)
I believe that "hook->num" can be up to UINT_MAX. Shifting by more than 31 bits is undefined in C, but in practice it would lead to shift wrapping. That would lead to an array overflow in nf_tables_addchain(): ops->hook = hook.type->hooks[ops->hooknum]; Fixes: fe19c04ca137 ("netfilter: nf_tables: remove nhooks field from struct nft_af_info") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
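An illustrative user-space form of the check this implies (names and the hook-count value are assumptions, not the upstream diff): validate the untrusted hook number against the array size before using it as a shift count or an index.

    #include <errno.h>
    #include <stdint.h>

    #define MAX_HOOKS 8     /* assumed size of the per-type hooks[] array */

    static int parse_hook_num(uint32_t hooknum, uint32_t *allowed_mask)
    {
        if (hooknum >= MAX_HOOKS)       /* also keeps (1u << hooknum) well defined */
            return -EINVAL;

        *allowed_mask |= 1u << hooknum; /* safe: hooknum is now < 32 */
        return 0;
    }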
2019-04-15  netfilter: ctnetlink: don't use conntrack/expect object addresses as id  (Florian Westphal)
Otherwise we leak the addresses to userspace via ctnetlink events and dumps. Compute an ID on demand based on the immutable parts of the nf_conn struct. Another advantage compared to using an address is that there is no immediate re-use of the same ID in case the conntrack entry is freed and reallocated again immediately. Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID") Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
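A small user-space sketch of the idea (illustrative fields and hash, not the ctnetlink implementation): derive the id from fields that never change over the object's lifetime plus a boot-time random seed, instead of exposing the kernel pointer.

    #include <stdint.h>

    struct conn {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint64_t timestamp_start;       /* fixed for the object's lifetime */
    };

    /* FNV-1a over one 32-bit word; any keyed hash would do here. */
    static uint32_t fnv1a_step(uint32_t h, uint32_t v)
    {
        for (int i = 0; i < 4; i++, v >>= 8)
            h = (h ^ (v & 0xff)) * 16777619u;
        return h;
    }

    static uint32_t conn_id(const struct conn *ct, uint32_t boot_seed)
    {
        uint32_t h = 2166136261u ^ boot_seed;

        h = fnv1a_step(h, ct->src_ip);
        h = fnv1a_step(h, ct->dst_ip);
        h = fnv1a_step(h, ((uint32_t)ct->src_port << 16) | ct->dst_port);
        h = fnv1a_step(h, (uint32_t)(ct->timestamp_start >> 32));
        h = fnv1a_step(h, (uint32_t)ct->timestamp_start);
        return h;
    }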
2019-04-14  drm/amd/display: If one stream full updates, full update all planes  (David Francis)
[Why] On some compositors, with two monitors attached, a VT terminal switch can cause a graphical issue by the following means: There are two streams, one for each monitor. Each stream has one plane. current state: M1:S1->P1 M2:S2->P2 The user calls for a terminal switch and a commit is made to change both planes to linear swizzle mode. In atomic check, a new dc_state is constructed with new planes on each stream. new state: M1:S1->P3 M2:S2->P4 In commit tail, each stream is committed, one at a time. The first stream (S1) updates properly, triggering a full update and replacing the state. current state: M1:S1->P3 M2:S2->P4 The update for S2 comes in, but dc detects that there is no difference between the stream and plane in the new and current states, and so triggers a fast update. The fast update does not program swizzle, so the second monitor is corrupted. [How] Add a flag to dc_plane_state that forces full updates. When a stream undergoes a full update, set this flag on all changed planes, then clear it on the current stream. Subsequent streams will get full updates as a result. Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Roman Li <Roman.Li@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet Lakha@amd.com> Acked-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-14  Linux 5.1-rc5 (tag: v5.1-rc5)  (Linus Torvalds)
2019-04-14  Merge branch 'page-refs' (page ref overflow)  (Linus Torvalds)
Merge page ref overflow branch. Jann Horn reported that he can overflow the page ref count with sufficient memory (and a filesystem that is intentionally extremely slow). Admittedly it's not exactly easy. To have more than four billion references to a page requires a minimum of 32GB of kernel memory just for the pointers to the pages, much less any metadata to keep track of those pointers. Jann needed a total of 140GB of memory and a specially crafted filesystem that leaves all reads pending (in order to not ever free the page references and just keep adding more). Still, we have a fairly straightforward way to limit the two obvious user-controllable sources of page references: direct-IO like page references gotten through get_user_pages(), and the splice pipe page duplication. So let's just do that. * branch page-refs: fs: prevent page refcount overflow in pipe_buf_get mm: prevent get_user_pages() from overflowing page refcount mm: add 'try_get_page()' helper function mm: make page ref count overflow check tighter and more explicit
2019-04-14  Merge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (David S. Miller)
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-04-09 This series provides some fixes to the mlx5 driver. I've cc'ed some of the checksum fixes to Eric Dumazet and I would like to get his feedback before you pull. For -stable v4.19 ('net/mlx5: FPGA, tls, idr remove on flow delete') ('net/mlx5: FPGA, tls, hold rcu read lock a bit longer') For -stable v4.20 ('net/mlx5e: Rx, Check ip headers sanity') ('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"') ('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding') For -stable v5.0 ('net/mlx5e: Switch to Toeplitz RSS hash by default') ('net/mlx5e: Protect against non-uplink representor for encap') ('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check  (Eric Dumazet)
Jakub forgot to either use nlmsg_len() or nlmsg_msg_size(), allowing KMSAN to detect a possible uninit-value in rtnl_stats_get BUG: KMSAN: uninit-value in rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997 CPU: 0 PID: 10428 Comm: syz-executor034 Not tainted 5.1.0-rc2+ #24 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:619 __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310 rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997 rtnetlink_rcv_msg+0x115b/0x1550 net/core/rtnetlink.c:5192 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2485 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5210 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1925 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg net/socket.c:632 [inline] ___sys_sendmsg+0xdb3/0x1220 net/socket.c:2137 __sys_sendmsg net/socket.c:2175 [inline] __do_sys_sendmsg net/socket.c:2184 [inline] __se_sys_sendmsg+0x305/0x460 net/socket.c:2182 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2182 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 Fixes: 51bc860d4a99 ("rtnetlink: stats: validate attributes in get as well as dumps") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
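The distinction the fix turns on: nlh->nlmsg_len covers the netlink header plus payload, while the kernel's nlmsg_len(nlh) helper is the payload only (nlh->nlmsg_len minus NLMSG_HDRLEN), so a minimum-size check has to compare like with like. A hedged user-space sketch of such a check, with an illustrative stand-in struct and not necessarily the exact upstream diff:

    #include <errno.h>
    #include <linux/netlink.h>

    struct stats_req {              /* stand-in for the real request payload */
        unsigned char family;
        unsigned int  filter_mask;
    };

    static int valid_stats_req(const struct nlmsghdr *nlh)
    {
        /* NLMSG_LENGTH(x) == x + NLMSG_HDRLEN, i.e. compare the total message
         * length against header-plus-payload rather than payload alone. */
        if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct stats_req)))
            return -EINVAL;
        return 0;
    }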
2019-04-14  x86/speculation: Prevent deadlock on ssb_state::lock  (Thomas Gleixner)
Mikhail reported a lockdep splat related to the AMD specific ssb_state lock: CPU0 CPU1 lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); <Interrupt> lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The connection between sighand->siglock and st->lock comes through seccomp, which takes st->lock while holding sighand->siglock. Make sure interrupts are disabled when __speculation_ctrl_update() is invoked via prctl() -> speculation_ctrl_update(). Add a lockdep assert to catch future offenders. Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904141948200.4917@nanos.tec.linutronix.de
2019-04-14  Merge branch 'qed-doorbell-overflow-recovery'  (David S. Miller)
Denis Bolotin says: ==================== qed: Fix the Doorbell Overflow Recovery mechanism This patch series fixes and improves the doorbell recovery mechanism. The main goals of this series are to fix attentions from the doorbell block (DORQ) that are missed or not handled properly, and to execute the recovery from the periodic handler instead of the attention handler. Please consider applying the series to net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  qed: Fix the DORQ's attentions handling  (Denis Bolotin)
Separate the overflow handling from the hardware interrupt status analysis. The interrupt status is a single register and is common for all PFs. The first PF reading the register is not necessarily the one that overflowed. All PFs must check their overflow status on every attention. In this change we clear the sticky indication in the attention handler to allow doorbells to be processed again as soon as possible, but running the doorbell recovery is scheduled for the periodic handler to reduce the time spent in the attention handler. Checking the need for a DORQ flush was changed to "db_bar_no_edpm" because qed_edpm_enabled()'s result could change dynamically and might have prevented a needed flush. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  qed: Fix missing DORQ attentions  (Denis Bolotin)
When the DORQ (doorbell block) is overflowed, all PFs get attentions at the same time. If one PF finished handling the attention before another PF even started, the second PF might miss the DORQ's attention bit and not handle the attention at all. If the DORQ attention is missed and the issue is not resolved, another attention will not be sent, therefore each attention is treated as a potential DORQ attention. As a result, the attention callback is called more frequently, so the debug print was moved to reduce its quantity. The number of periodic doorbell recovery handler schedules was reduced because it was the previous way of mitigating the missed attention issue. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  qed: Fix the doorbell address sanity check  (Denis Bolotin)
Fix the condition which verifies that the doorbell address is inside the doorbell BAR by checking that the end of the address is within range as well. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
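The generic shape of such a check, as a user-space sketch with illustrative names: an access of a given size lies inside a BAR only if both its start and its end fall within the BAR's range.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    static bool db_addr_in_bar(uintptr_t addr, size_t db_size,
                               uintptr_t bar_start, size_t bar_size)
    {
        return addr >= bar_start &&
               addr + db_size >= addr &&               /* guard against wrap-around */
               addr + db_size <= bar_start + bar_size;
    }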
2019-04-14  qed: Delete redundant doorbell recovery types  (Denis Bolotin)
DB_REC_DRY_RUN (running doorbell recovery without sending doorbells) is never used. DB_REC_ONCE (send a single doorbell from the doorbell recovery) is not needed anymore because by running the periodic handler we make sure we check the overflow status later instead. This patch is needed because in the next patches, the only doorbell recovery type being used is DB_REC_REAL_DEAL, and the fixes are much cleaner without this enum. Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  r8169: change irq handler to always trigger NAPI polling  (Heiner Kallweit)
This check isn't really needed, and we can simplify the code and save some CPU cycles by removing it. Only in case of an error are none of these bits set, and calling the NAPI callback doesn't hurt in that case. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  Merge branch 'r8169-phy-func-ptr-arrays'  (David S. Miller)
Heiner Kallweit says: ==================== r8169: create function pointer arrays for PHY and chip hw init functions Using function pointer arrays makes the code easier to read and better maintainable. AFAIK function pointer arrays cause some performance drawback due to Spectre mitigation, but we're not in a hot path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
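A small, runnable user-space illustration of the refactoring described in this series (hypothetical names): replace a long switch over chip versions with an array of init function pointers indexed by version.

    #include <stdio.h>

    enum mac_version { VER_01, VER_02, VER_03, VER_COUNT };

    static void hw_init_ver01(void) { puts("init ver01"); }
    static void hw_init_ver02(void) { puts("init ver02"); }
    static void hw_init_ver03(void) { puts("init ver03"); }

    /* One init routine per chip version, indexed by the version enum. */
    static void (*const hw_configs[VER_COUNT])(void) = {
        [VER_01] = hw_init_ver01,
        [VER_02] = hw_init_ver02,
        [VER_03] = hw_init_ver03,
    };

    static void hw_config(enum mac_version ver)
    {
        if (ver < VER_COUNT && hw_configs[ver])
            hw_configs[ver]();
    }

    int main(void)
    {
        hw_config(VER_02);
        return 0;
    }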
2019-04-14  r8169: create function pointer array for chip hw init functions  (Heiner Kallweit)
Using a function pointer array makes this easier to read and better maintainable. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  r8169: create function pointer array for PHY init functions  (Heiner Kallweit)
Using a function pointer array makes this easier to read and better maintainable. AFAIK function pointer arrays cause some performance drawback due to Spectre mitigation, but we're not in a hot path here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  Merge branch 'hns3-next'  (David S. Miller)
Huazhong Tan says: ==================== code optimizations & bugfixes for HNS3 driver This patch-set includes code optimizations and bugfixes for the HNS3 ethernet controller driver. [patch 1/12 - 4/12] optimizes the VLAN feature and adds support for port based VLAN, fixes some related bugs in the current implementation. [patch 5/12 - 12/12] includes some other code optimizations for the HNS3 ethernet controller driver. Change log: V1->V2: modifies some patches' commit log and code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: code optimization for command queue's spin lock  (Peng Li)
This patch removes some redundant BH disabling when initializing and uninitializing the command queue. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: free the pending skb when clean RX ring  (Peng Li)
If there is a pending skb in the RX flow when the port is closed, and the pending buffer is not cleaned, the new packet will be added to the pending skb when the port opens again, and the first new packet will carry bad data. This patch cleans the pending skb when cleaning the RX ring. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: do not initialize MDIO bus when PHY is inexistent  (Jian Shen)
In some cases, the PHY may not be connected to the MDIO bus, and then driver initialization will fail because the MDIO bus initialization fails. This patch fixes it by skipping the MDIO bus initialization when the PHY is not present. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: set individual reset level for all RAS and MSI-X errors  (Weihang Li)
According to the hardware description, the reset levels that should be triggered are not consistent within a module. For example, among the SSU common errors, the first two bits need no reset at all, but the other bits need a global reset. This patch sets a separate reset level for all RAS and MSI-X interrupts by adding a reset_level field in struct hclge_hw_error, and fixes some incorrect reset levels. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: divide shared buffer between TC  (Yunsheng Lin)
Currently the hardware may not have enough buffer to receive packets when it has used more than two MPS (maximum packet size) worth of buffer, while a lot of the shared buffer is still left unused when the TC num is small. This patch divides the shared buffer between TCs when the port supports DCB, and adjusts the waterline and threshold according to the user manual for ports that do not support DCB. This patch also changes hclge_get_tc_num's return type to u32 to avoid a signed-unsigned mix in the division. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: always assume no drop TC for performance reason  (Yunsheng Lin)
Currently the RX shared buffer's threshold size for a specific TC is set to a smaller value when the TC's PFC is not enabled, which may cause a performance problem because the hardware may not have enough buffer when PFC is not enabled. This patch sets the same threshold size for all TCs regardless of whether the specific TC's PFC is enabled. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: add hns3_gro_complete for HW GRO process  (Yunsheng Lin)
When a GRO packet is received by the driver, the cwr field in the struct tcphdr needs to be checked to decide whether to set SKB_GSO_TCP_ECN in skb_shinfo(skb)->gso_type. So this patch adds hns3_gro_complete to do that, and adds hns3_handle_bdinfo to handle hns3_gro_complete and hns3_rx_checksum. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: minor refactor for hns3_rx_checksum  (Yunsheng Lin)
Change the parameters of hns3_rx_checksum to be more specific to what is used internally, rather than passing in a pointer to the whole hns3_desc. This reduces duplicate code and brings the function in line with the approach used in hns3_set_gro_param. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: fix set port based VLAN issue for VF  (Jian Shen)
In the original code, ndo_set_vf_vlan() in the hns3 driver was implemented incorrectly. It adds or removes a VLAN in the VLAN filter for the VF, but the VF is unaware of it. This patch fixes it. When the VF loads, it first queries the port based VLAN state from the PF. When the user changes the port based VLAN state on the PF, the PF first checks whether the VF is alive. If the VF is alive, the PF notifies the VF of the modification; otherwise the PF configures the port based VLAN state directly. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: fix set port based VLAN for PF  (Jian Shen)
In the original code, ndo_set_vf_vlan() in the hns3 driver was implemented incorrectly. It adds or removes a VLAN in the VLAN filter for the VF, but the VF is unaware of it. Indeed, ndo_set_vf_vlan() is expected to enable or disable port based VLAN (hardware inserts a specified VLAN tag into all TX packets for a specified VF). When enabling port based VLAN, we use the port based VLAN id as the VLAN filter entry. When disabling port based VLAN, we use the VLAN id of the VLAN device. This patch fixes it for the PF, enabling/disabling port based VLAN when ndo_set_vf_vlan() is called. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: fix VLAN offload handle for VLAN inserted by port  (Jian Shen)
Currently, in the TX direction, the driver implements TX VLAN offload by checking the VLAN header in the skb and filling it into the TX descriptor. Usually this works well, but if inserting a VLAN header based on port is enabled, it may conflict when the out_tag field of the TX descriptor is already used, and cause a RAS error. In the RX direction, the hardware supports stripping at most two VLAN headers. Since vlan_tci in the skb can only store one VLAN tag, when RX VLAN offload is enabled the driver tells the hardware to strip one VLAN header from the RX packet; when RX VLAN offload is disabled the driver tells the hardware not to strip the VLAN header from the RX packet. Now if port based VLAN insertion is enabled, all RX packets will carry the port based VLAN header. This header is useless for the stack, so the driver needs to ask the hardware to strip it. Unfortunately, the hardware can't drop this VLAN header and always fills it into the RX descriptor, so the driver has to identify and drop it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: hns3: modify VLAN initialization to be compatible with port based VLAN  (Jian Shen)
Our hardware supports inserting a specified VLAN header for each function when sending packets. The user can enable it with the command "ip link set <devname> vf <vfid> vlan <vlan id>". Since this VLAN header is inserted by the hardware, not by the stack, the hardware also needs to strip it from received packets before sending them to the stack. In this case, the driver needs to tell the hardware which VLAN to insert or strip. The current VLAN initialization doesn't allow inserting a VLAN header by hardware; this patch modifies it in order to be compatible with VLAN insertion based on port. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  ipv4: ensure rcu_read_lock() in ipv4_link_failure()  (Eric Dumazet)
fib_compute_spec_dst() needs to be called under rcu protection. syzbot reported : WARNING: suspicious RCU usage 5.1.0-rc4+ #165 Not tainted include/linux/inetdevice.h:220 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by swapper/0/0: #0: 0000000051b67925 ((&n->timer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:170 [inline] #0: 0000000051b67925 ((&n->timer)){+.-.}, at: call_timer_fn+0xda/0x720 kernel/time/timer.c:1315 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.1.0-rc4+ #165 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5162 __in_dev_get_rcu include/linux/inetdevice.h:220 [inline] fib_compute_spec_dst+0xbbd/0x1030 net/ipv4/fib_frontend.c:294 spec_dst_fill net/ipv4/ip_options.c:245 [inline] __ip_options_compile+0x15a7/0x1a10 net/ipv4/ip_options.c:343 ipv4_link_failure+0x172/0x400 net/ipv4/route.c:1195 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xd1/0x1c0 net/ipv4/arp.c:297 neigh_invalidate+0x24b/0x570 net/core/neighbour.c:995 neigh_timer_handler+0xc35/0xf30 net/core/neighbour.c:1081 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  Merge branch 'net-phy-shrink-PHY-settings-array-and-add-200Gbps-support'  (David S. Miller)
Heiner Kallweit says: ==================== net: phy: shrink PHY settings array and add 200Gbps support The definition of array settings[] is quite lengthy meanwhile. Add a macro to shrink the definition. When doing this I saw that the new 200Gbps modes and a few 100Gbps/50Gbps modes aren't supported in phylib yet. So add this. To avoid ethtool and phylib mode definitions getting out of sync, add a build bug to check for this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  phy: warn if phylib and ethtool PHY mode definitions are out of sync  (Heiner Kallweit)
If new PHY modes are added, people may forget to update all relevant places in the kernel. Therefore add a build bug check for new modes in enum ethtool_link_mode_bit_indices that haven't been added to phylib yet. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
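A user-space illustration of the build-time check idea (simplified enum and names, not the phylib code): tie the driver-side mode table size to the mode count so that adding a new mode without updating the table breaks the build.

    #include <assert.h>

    enum link_mode_bits {
        LINK_MODE_10baseT_Half,
        LINK_MODE_10baseT_Full,
        LINK_MODE_100baseT_Full,
        __LINK_MODE_COUNT,              /* must stay last */
    };

    static const char *const link_mode_names[] = {
        [LINK_MODE_10baseT_Half] = "10baseT/Half",
        [LINK_MODE_10baseT_Full] = "10baseT/Full",
        [LINK_MODE_100baseT_Full] = "100baseT/Full",
    };

    /* Fails to compile as soon as a new enum entry lacks a table entry. */
    static_assert(sizeof(link_mode_names) / sizeof(link_mode_names[0]) ==
                  __LINK_MODE_COUNT, "link mode table out of sync");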
2019-04-14  net: phy: add support for new modes in phylib  (Heiner Kallweit)
Recently new modes have been added to ethtool.h, but the related extension to phylib hasn't been done yet. So add support for these modes. v2: - add missing 100Gbps and 50Gbps modes Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14  net: phy: shrink PHY settings array  (Heiner Kallweit)
The definition of array settings[] is quite lengthy meanwhile. Add a macro to shrink the definition. v2: - Fix an indentation issue Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
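A user-space sketch of the macro-based shrink; the real macro, field names, and bit values in phylib may differ, this only shows the shape of the change.

    #include <stdint.h>

    struct phy_setting {
        uint32_t speed;     /* Mbit/s */
        uint8_t  duplex;    /* 0 = half, 1 = full */
        uint8_t  bit;       /* link mode bit number (illustrative values) */
    };

    #define PHY_SETTING(speed_, duplex_, bit_) \
        { .speed = (speed_), .duplex = (duplex_), .bit = (bit_) }

    static const struct phy_setting settings[] = {
        PHY_SETTING(   10, 0,  0),
        PHY_SETTING(   10, 1,  1),
        PHY_SETTING(  100, 1,  3),
        PHY_SETTING( 1000, 1, 12),
    };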
2019-04-14  fs: prevent page refcount overflow in pipe_buf_get  (Matthew Wilcox)
Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14  mm: prevent get_user_pages() from overflowing page refcount  (Linus Torvalds)
If the page refcount wraps around past zero, it will be freed while there are still four billion references to it. One of the possible avenues for an attacker to try to make this happen is by doing direct IO on a page multiple times. This patch makes get_user_pages() refuse to take a new page reference if there are already more than two billion references to the page. Reported-by: Jann Horn <jannh@google.com> Acked-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14  mm: add 'try_get_page()' helper function  (Linus Torvalds)
This is the same as the traditional 'get_page()' function, but instead of unconditionally incrementing the reference count of the page, it only does so if the count was "safe". It returns whether the reference count was incremented (and is marked __must_check, since the caller obviously has to be aware of it). Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. NOTE! The count access for the "safety" check is inherently racy, but that doesn't matter since the buffer we use is basically half the range of the reference count (ie we look at the sign of the count). Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
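A user-space sketch of the helper's contract as described above, using a stdatomic counter as a stand-in rather than the kernel's struct page: only take the reference if the current count looks safe, and report back whether it was taken.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct page_ref { atomic_int count; };

    static bool __attribute__((warn_unused_result))
    try_get_ref(struct page_ref *p)
    {
        int old = atomic_load(&p->count);

        /* The caller must already hold a reference, so a zero or negative
         * count means something is badly wrong: refuse to take another. */
        if (old <= 0)
            return false;

        atomic_fetch_add(&p->count, 1);
        return true;
    }

    int main(void)
    {
        struct page_ref p = { .count = 1 };

        if (try_get_ref(&p))
            printf("ref taken, count now %d\n", atomic_load(&p.count));
        return 0;
    }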
2019-04-14  mm: make page ref count overflow check tighter and more explicit  (Linus Torvalds)
We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>