summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-04-30Merge tag 'for-5.7/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Document DM integrity allow_discard feature that was added during 5.7 merge window. - Fix potential for DM writecache data corruption during DM table reloads. - Fix DM verity's FEC support's hash block number calculation in verity_fec_decode(). - Fix bio-based DM multipath crash due to use of stale copy of MPATHF_QUEUE_IO flag state in __map_bio(). * tag 'for-5.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm multipath: use updated MPATHF_QUEUE_IO on mapping for bio-based mpath dm verity fec: fix hash block number in verity_fec_decode dm writecache: fix data corruption when reloading the target dm integrity: document allow_discard option
2020-04-30Merge tag 'selinux-pr-20200430' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux fixes from Paul Moore: "Two more SELinux patches to fix problems in the v5.7-rcX releases. Wei Yongjun's patch fixes a return code in an error path, and my patch fixes a problem where we were not correctly applying access controls to all of the netlink messages in the netlink_send LSM hook" * tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: properly handle multiple messages in selinux_netlink_send() selinux: fix error return code in cond_read_list()
2020-04-30Merge tag 'linux-kselftest-kunit-5.7-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit fix from Shuah Khan: "A single fix to flush the test summary to the console log without delay" * tag 'linux-kselftest-kunit-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Add missing newline in summary message
2020-04-30Merge tag 'linux-kselftest-5.7-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest updates from Shuah Khan: - ftrace test fixes to check for required filter files and kprobe args. - Kselftest build/cross-build dependency check script to make it easier for test ring admins/users to configure build systems correctly for build/cross-build kselftests. Currently checks library dependencies. - Checks if Kselftests can be built/cross-built on a system running compile test on a trivial C file with LDLIBS specified for each individual test in their Makefiles. - Prints suggested target list for a system filtering out tests failed the build dependency check from the TARGETS in Selftests the main Makefile when optional -p is specified. - Prints pass/fail dependency check for each tests/sub-test. - Prints pass/fail targets and libraries. - Default: runs dependency checks on all tests. - Optional test name can be specified to check dependencies for it. * tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: Check the first record for kprobe_args_type.tc selftests: add build/cross-build dependency check script selftests/ftrace: Check required filter files before running test
2020-04-30ibmvnic: Skip fatal error reset after passive initJuliet Kim
During MTU change, the following events may happen. Client-driven CRQ initialization fails due to partner’s CRQ closed, causing client to enqueue a reset task for FATAL_ERROR. Then passive (server-driven) CRQ initialization succeeds, causing client to release CRQ and enqueue a reset task for failover. If the passive CRQ initialization occurs before the FATAL reset task is processed, the FATAL error reset task would try to access a CRQ message queue that was freed, causing an oops. The problem may be most likely to occur during DLPAR add vNIC with a non-default MTU, because the DLPAR process will automatically issue a change MTU request. Fix this by not processing fatal error reset if CRQ is passively initialized after client-driven CRQ initialization fails. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30selinux: properly handle multiple messages in selinux_netlink_send()Paul Moore
Fix the SELinux netlink_send hook to properly handle multiple netlink messages in a single sk_buff; each message is parsed and subject to SELinux access control. Prior to this patch, SELinux only inspected the first message in the sk_buff. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-30Merge tag 'mlx5-fixes-2020-04-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2020-04-29 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. v2: - Dropped the ktls patch, Tariq has to check if it is fixable in the stack For -stable v4.12 ('net/mlx5: Fix forced completion access non initialized command entry') ('net/mlx5: Fix command entry leak in Internal Error State') For -stable v5.4 ('net/mlx5: DR, On creation set CQ's arm_db member to right value') For -stable v5.6 ('net/mlx5e: Fix q counters on uplink representors') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: fix uninitialized value accessPaolo Abeni
tcp_v{4,6}_syn_recv_sock() set 'own_req' only when returning a not NULL 'child', let's check 'own_req' only if child is available to avoid an - unharmful - UBSAN splat. v1 -> v2: - reference the correct hash Fixes: 4c8941de781c ("mptcp: avoid flipping mp_capable field in syn_recv_sock()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30Merge branch 'mptcp-fix-incoming-options-parsing'David S. Miller
Paolo Abeni says: ==================== mptcp: fix incoming options parsing This series addresses a serious issue in MPTCP option parsing. This is bigger than the usual -net change, but I was unable to find a working, sane, smaller fix. The core change is inside patch 2/5 which moved MPTCP options parsing from the TCP code inside existing MPTCP hooks and clean MPTCP options status on each processed packet. The patch 1/5 is a needed pre-requisite, and patches 3,4,5 are smaller, related fixes. v1 -> v2: - cleaned-up patch 1/5 - rebased on top of current -net ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: initialize the data_fin field for mpc packetsPaolo Abeni
When parsing MPC+data packets we set the dss field, so we must also initialize the data_fin, or we can find stray value there. Fixes: 9a19371bf029 ("mptcp: fix data_fin handing in RX path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: fix 'use_ack' option access.Paolo Abeni
The mentioned RX option field is initialized only for DSS packet, we must access it only if 'dss' is set too, or the subflow will end-up in a bad status, leading to RFC violations. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: avoid a WARN on bad input.Paolo Abeni
Syzcaller has found a way to trigger the WARN_ON_ONCE condition in check_fully_established(). The root cause is a legit fallback to TCP scenario, so replace the WARN with a plain message on a more strict condition. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: move option parsing into mptcp_incoming_options()Paolo Abeni
The mptcp_options_received structure carries several per packet flags (mp_capable, mp_join, etc.). Such fields must be cleared on each packet, even on dropped ones or packet not carrying any MPTCP options, but the current mptcp code clears them only on TCP option reset. On several races/corner cases we end-up with stray bits in incoming options, leading to WARN_ON splats. e.g.: [ 171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713 [ 171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata [ 171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95 [ 171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c [ 171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282 [ 171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e [ 171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955 [ 171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca [ 171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020 [ 171.228460] FS: 00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000 [ 171.230065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0 [ 171.232586] Call Trace: [ 171.233109] <IRQ> [ 171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691) [ 171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832) [ 171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1)) [ 171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217) [ 171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822) [ 171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730) [ 171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009) [ 171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1)) [ 171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232) [ 171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252) [ 171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539) [ 171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135) [ 171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640) [ 171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293) [ 171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083) [ 171.282358] </IRQ> We could address the issue clearing explicitly the relevant fields in several places - tcp_parse_option, tcp_fast_parse_options, possibly others. Instead we move the MPTCP option parsing into the already existing mptcp ingress hook, so that we need to clear the fields in a single place. This allows us dropping an MPTCP hook from the TCP code and removing the quite large mptcp_options_received from the tcp_sock struct. On the flip side, the MPTCP sockets will traverse the option space twice (in tcp_parse_option() and in mptcp_incoming_options(). That looks acceptable: we already do that for syn and 3rd ack packets, plain TCP socket will benefit from it, and even MPTCP sockets will experience better code locality, reducing the jumps between TCP and MPTCP code. v1 -> v2: - rebased on current '-net' tree Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30mptcp: consolidate synack processing.Paolo Abeni
Currently the MPTCP code uses 2 hooks to process syn-ack packets, mptcp_rcv_synsent() and the sk_rx_dst_set() callback. We can drop the first, moving the relevant code into the latter, reducing the hooking into the TCP code. This is also needed by the next patch. v1 -> v2: - use local tcp sock ptr instead of casting the sk variable several times - DaveM Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30NFS: Fix a race in __nfs_list_for_each_server()Trond Myklebust
The struct nfs_server gets put on the cl_superblocks list before the server->super field has been initialised, in which case the call to nfs_sb_active() will Oops. Add a check to ensure that we skip such a list entry. Fixes: 3c9e502b59fb ("NFS: Add a helper nfs_client_for_each_server()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-30sched/core: Simplify sched_init()Wei Yang
Currently root_task_group.shares and cfs_bandwidth are initialized for each online cpu, which not necessary. Let's take it out to do it only once. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200423214443.29994-1-richard.weiyang@gmail.com
2020-04-30sched/swait: Reword some of the main descriptionDavidlohr Bueso
With both the increased use of swait and kvm no longer using it, we can reword some of the comments. While removing Linus' valid rant, I've also cared to explicitly mention that swait is very different than regular wait. In addition it is mentioned against using swait in favor of the regular flavor. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200422040739.18601-6-dave@stgolabs.net
2020-04-30sched/fair: Use __this_cpu_read() in wake_wide()Muchun Song
The code is executed with preemption(and interrupts) disabled, so it's safe to use __this_cpu_write(). Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421144123.33580-1-songmuchun@bytedance.com
2020-04-30sched/core: Fix illegal RCU from offline CPUsPeter Zijlstra
In the CPU-offline process, it calls mmdrop() after idle entry and the subsequent call to cpuhp_report_idle_dead(). Once execution passes the call to rcu_report_dead(), RCU is ignoring the CPU, which results in lockdep complaining when mmdrop() uses RCU from either memcg or debugobjects below. Fix it by cleaning up the active_mm state from BP instead. Every arch which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit() from AP. The only exception is parisc because it switches them to &init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()), but the patch will still work there because it calls mmgrab(&init_mm) in smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu(). WARNING: suspicious RCU usage ----------------------------- kernel/workqueue.c:710 RCU or wq_pool_mutex should be held! other info that might help us debug this: RCU used illegally from offline CPU! Call Trace: dump_stack+0xf4/0x164 (unreliable) lockdep_rcu_suspicious+0x140/0x164 get_work_pool+0x110/0x150 __queue_work+0x1bc/0xca0 queue_work_on+0x114/0x120 css_release+0x9c/0xc0 percpu_ref_put_many+0x204/0x230 free_pcp_prepare+0x264/0x570 free_unref_page+0x38/0xf0 __mmdrop+0x21c/0x2c0 idle_task_exit+0x170/0x1b0 pnv_smp_cpu_kill_self+0x38/0x2e0 cpu_die+0x48/0x64 arch_cpu_idle_dead+0x30/0x50 do_idle+0x2f4/0x470 cpu_startup_entry+0x38/0x40 start_secondary+0x7a8/0xa80 start_secondary_resume+0x10/0x14 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
2020-04-30sched/fair: Mark sched_init_granularity __initMuchun Song
Function sched_init_granularity() is only called from __init functions, so mark it __init as well. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20200406074750.56533-1-songmuchun@bytedance.com
2020-04-30sched/fair: Refill bandwidth before scalingHuaixin Chang
In order to prevent possible hardlockup of sched_cfs_period_timer() loop, loop count is introduced to denote whether to scale quota and period or not. However, scale is done between forwarding period timer and refilling cfs bandwidth runtime, which means that period timer is forwarded with old "period" while runtime is refilled with scaled "quota". Move do_sched_cfs_period_timer() before scaling to solve this. Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup") Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/20200420024421.22442-3-changhuaixin@linux.alibaba.com
2020-04-30sched: Extract the task putting code from pick_next_task()Chen Yu
Introduce a new function put_prev_task_balance() to do the balance when necessary, and then put previous task back to the run queue. This function is extracted from pick_next_task() to prepare for future usage by other type of task picking logic. No functional change. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/5a99860cf66293db58a397d6248bcb2eee326776.1587464698.git.yu.c.chen@intel.com
2020-04-30sched: Make newidle_balance() static againChen Yu
After Commit 6e2df0581f56 ("sched: Fix pick_next_task() vs 'change' pattern race"), there is no need to expose newidle_balance() as it is only used within fair.c file. Change this function back to static again. No functional change. Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/83cd3030b031ca5d646cd5e225be10e7a0fdd8f5.1587464698.git.yu.c.chen@intel.com
2020-04-30sched/topology: Kill SD_LOAD_BALANCEValentin Schneider
That flag is set unconditionally in sd_init(), and no one checks for it anymore. Remove it. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200415210512.805-5-valentin.schneider@arm.com
2020-04-30sched: Remove checks against SD_LOAD_BALANCEValentin Schneider
The SD_LOAD_BALANCE flag is set unconditionally for all domains in sd_init(). By making the sched_domain->flags syctl interface read-only, we have removed the last piece of code that could clear that flag - as such, it will now be always present. Rather than to keep carrying it along, we can work towards getting rid of it entirely. cpusets don't need it because they can make CPUs be attached to the NULL domain (e.g. cpuset with sched_load_balance=0), or to a partitioned root_domain, i.e. a sched_domain hierarchy that doesn't span the entire system (e.g. root cpuset with sched_load_balance=0 and sibling cpusets with sched_load_balance=1). isolcpus apply the same "trick": isolated CPUs are explicitly taken out of the sched_domain rebuild (using housekeeping_cpumask()), so they get the NULL domain treatment as well. Remove the checks against SD_LOAD_BALANCE. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200415210512.805-4-valentin.schneider@arm.com
2020-04-30sched/debug: Make sd->flags sysctl read-onlyValentin Schneider
Writing to the sysctl of a sched_domain->flags directly updates the value of the field, and goes nowhere near update_top_cache_domain(). This means that the cached domain pointers can end up containing stale data (e.g. the domain pointed to doesn't have the relevant flag set anymore). Explicit domain walks that check for flags will be affected by the write, but this won't be in sync with the cached pointers which will still point to the domains that were cached at the last sched_domain build. In other words, writing to this interface is playing a dangerous game. It could be made to trigger an update of the cached sched_domain pointers when written to, but this does not seem to be worth the trouble. Make it read-only. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200415210512.805-3-valentin.schneider@arm.com
2020-04-30sched/fair: find_idlest_group(): Remove unused sd_flag parameterValentin Schneider
The last use of that parameter was removed by commit 57abff067a08 ("sched/fair: Rework find_idlest_group()") Get rid of the parameter. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200415210512.805-2-valentin.schneider@arm.com
2020-04-30exit: Move preemption fixup up, move blocking operations downJann Horn
With CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_CGROUPS=y, kernel oopses in non-preemptible context look untidy; after the main oops, the kernel prints a "sleeping function called from invalid context" report because exit_signals() -> cgroup_threadgroup_change_begin() -> percpu_down_read() can sleep, and that happens before the preempt_count_set(PREEMPT_ENABLED) fixup. It looks like the same thing applies to profile_task_exit() and kcov_task_exit(). Fix it by moving the preemption fixup up and the calls to profile_task_exit() and kcov_task_exit() down. Fixes: 1dc0fffc48af ("sched/core: Robustify preemption leak checks") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200305220657.46800-1-jannh@google.com
2020-04-30sched/fair: Simplify the code of should_we_balance()Peng Wang
We only consider group_balance_cpu() after there is no idle cpu. So, just do comparison before return at these two cases. Signed-off-by: Peng Wang <rocking@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/245c792f0e580b3ca342ad61257f4c066ee0f84f.1586594833.git.rocking@linux.alibaba.com
2020-04-30sched/fair: Remove distribute_running from CFS bandwidthJosh Don
This is mostly a revert of commit: baa9be4ffb55 ("sched/fair: Fix throttle_list starvation with low CFS quota") The primary use of distribute_running was to determine whether to add throttled entities to the head or the tail of the throttled list. Now that we always add to the tail, we can remove this field. The other use of distribute_running is in the slack_timer, so that we don't start a distribution while one is already running. However, even in the event that this race occurs, it is fine to have two distributions running (especially now that distribute grabs the cfs_b->lock to determine remaining quota before assigning). Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/20200410225208.109717-3-joshdon@google.com
2020-04-30sched/fair: Eliminate bandwidth race between throttling and distributionPaul Turner
There is a race window in which an entity begins throttling before quota is added to the pool, but does not finish throttling until after we have finished with distribute_cfs_runtime(). This entity is not observed by distribute_cfs_runtime() because it was not on the throttled list at the time that distribution was running. This race manifests as rare period-length statlls for such entities. Rather than heavy-weight the synchronization with the progress of distribution, we can fix this by aborting throttling if bandwidth has become available. Otherwise, we immediately add the entity to the throttled list so that it can be observed by a subsequent distribution. Additionally, we can remove the case of adding the throttled entity to the head of the throttled list, and simply always add to the tail. Thanks to 26a8b12747c97, distribute_cfs_runtime() no longer holds onto its own pool of runtime. This means that if we do hit the !assign and distribute_running case, we know that distribution is about to end. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/20200410225208.109717-2-joshdon@google.com
2020-04-30sched/debug: Fix trival print_task() formatXie XiuQi
Ensure leave one space between state and task name. w/o patch: runnable tasks: S task PID tree-key switches prio wait Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200414125721.195801-1-xiexiuqi@huawei.com
2020-04-30x86/mm/cpa: Flush direct map alias during cpaRick Edgecombe
As an optimization, cpa_flush() was changed to optionally only flush the range in @cpa if it was small enough. However, this range does not include any direct map aliases changed in cpa_process_alias(). So small set_memory_() calls that touch that alias don't get the direct map changes flushed. This situation can happen when the virtual address taking variants are passed an address in vmalloc or modules space. In these cases, force a full TLB flush. Note this issue does not extend to cases where the set_memory_() calls are passed a direct map address, or page array, etc, as the primary target. In those cases the direct map would be flushed. Fixes: 935f5839827e ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200424105343.GA20730@hirez.programming.kicks-ass.net
2020-04-30Merge tag 'mmc-v5.7-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - meson-mx-sdio: Fix support for HW busy detection - sdhci-msm: Fix support for HW busy detection - cqhci: Fix polling loop by converting to readx_poll_timeout() - sdhci-xenon: Fix annoying 1.8V regulator warning - sdhci-pci: Fix eMMC driver strength for BYT-based controllers * tag 'mmc-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-pci: Fix eMMC driver strength for BYT-based controllers mmc: sdhci-xenon: fix annoying 1.8V regulator warning mmc: sdhci-msm: Enable host capabilities pertains to R1b response mmc: cqhci: Avoid false "cqhci: CQE stuck on" by not open-coding timeout loop mmc: meson-mx-sdio: remove the broken ->card_busy() op mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY mmc: core: make mmc_interrupt_hpi() static
2020-04-30arm64: vdso: Add -fasynchronous-unwind-tables to cflagsVincenzo Frascino
On arm64 linux gcc uses -fasynchronous-unwind-tables -funwind-tables by default since gcc-8, so now the de facto platform ABI is to allow unwinding from async signal handlers. However on bare metal targets (aarch64-none-elf), and on old gcc, async and sync unwind tables are not enabled by default to avoid runtime memory costs. This means if linux is built with a baremetal toolchain the vdso.so may not have unwind tables which breaks the gcc platform ABI guarantee in userspace. Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o cflags to address the ABI change. Fixes: 28b1a824a4f4 ("arm64: vdso: Substitute gettimeofday() with C implementation") Cc: Will Deacon <will@kernel.org> Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-04-30block: remove the bd_openers checks in blk_drop_partitionsChristoph Hellwig
When replacing the bd_super check with a bd_openers I followed a logical conclusion, which turns out to be utterly wrong. When a block device has bd_super sets it has a mount file system on it (although not every mounted file system sets bd_super), but that also implies it doesn't even have partitions to start with. So instead of trying to come up with a logical check for all openers, just remove the check entirely. Fixes: d3ef5536274f ("block: fix busy device checking in blk_drop_partitions") Fixes: cb6b771b05c3 ("block: fix busy device checking in blk_drop_partitions again") Reported-by: Michal Koutný <mkoutny@suse.com> Reported-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30net/mlx5e: Fix q counters on uplink representorsRoi Dayan
Need to allocate the q counters before init_rx which needs them when creating the rq. Fixes: 8520fa57a4e9 ("net/mlx5e: Create q counters on uplink representors") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5: Fix command entry leak in Internal Error StateMoshe Shemesh
Processing commands by cmd_work_handler() while already in Internal Error State will result in entry leak, since the handler process force completion without doorbell. Forced completion doesn't release the entry and event completion will never arrive, so entry should be released. Fixes: 73dd3a4839c1 ("net/mlx5: Avoid using pending command interface slots") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5: Fix forced completion access non initialized command entryMoshe Shemesh
mlx5_cmd_flush() will trigger forced completions to all valid command entries. Triggered by an asynch event such as fast teardown it can happen at any stage of the command, including command initialization. It will trigger forced completion and that can lead to completion on an uninitialized command entry. Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after command entry is initialized will ensure force completion is treated only if command entry is initialized. Fixes: 73dd3a4839c1 ("net/mlx5: Avoid using pending command interface slots") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5: DR, On creation set CQ's arm_db member to right valueErez Shitrit
In polling mode, set arm_db member to a value that will avoid CQ event recovery by the HW. Otherwise we might get event without completion function. In addition,empty completion function to was added to protect from unexpected events. Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5: E-switch, Fix mutex init orderParav Pandit
In cited patch mutex is initialized after its used. Below call trace is observed. Fix the order to initialize the mutex early enough. Similarly follow mirror sequence during cleanup. kernel: DEBUG_LOCKS_WARN_ON(lock->magic != lock) kernel: WARNING: CPU: 5 PID: 45916 at kernel/locking/mutex.c:938 __mutex_lock+0x7d6/0x8a0 kernel: Call Trace: kernel: ? esw_vport_tbl_get+0x3b/0x250 [mlx5_core] kernel: ? mark_held_locks+0x55/0x70 kernel: ? __slab_free+0x274/0x400 kernel: ? lockdep_hardirqs_on+0x140/0x1d0 kernel: esw_vport_tbl_get+0x3b/0x250 [mlx5_core] kernel: ? mlx5_esw_chains_create_fdb_prio+0xa57/0xc20 [mlx5_core] kernel: mlx5_esw_vport_tbl_get+0x88/0xf0 [mlx5_core] kernel: mlx5_esw_chains_create+0x2f3/0x3e0 [mlx5_core] kernel: esw_create_offloads_fdb_tables+0x11d/0x580 [mlx5_core] kernel: esw_offloads_enable+0x26d/0x540 [mlx5_core] kernel: mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core] kernel: mlx5_devlink_eswitch_mode_set+0x1af/0x320 [mlx5_core] kernel: devlink_nl_cmd_eswitch_set_doit+0x41/0xb0 Fixes: 96e326878fa5 ("net/mlx5e: Eswitch, Use per vport tables for mirroring") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5: E-switch, Fix printing wrong error valueParav Pandit
When mlx5_modify_header_alloc() fails, instead of printing the error value returned, current error log prints 0. Fix by printing correct error value returned by mlx5_modify_header_alloc(). Fixes: 6724e66b90ee ("net/mlx5: E-Switch, Get reg_c1 value on miss") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30net/mlx5: E-switch, Fix error unwinding flow for steering init failureParav Pandit
Error unwinding is done incorrectly in the cited commit. When steering init fails, there is no need to perform steering cleanup. When vport error exists, error cleanup should be mirror of the setup routine, i.e. to perform steering cleanup before metadata cleanup. This avoids the call trace in accessing uninitialized objects which are skipped during steering_init() due to failure in steering_init(). Call trace: mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header actions 1, max supported 0 E-Switch: Failed to create restore mod header BUG: kernel NULL pointer dereference, address: 00000000000000d0 [ 677.263079] mlx5_destroy_flow_group+0x13/0x80 [mlx5_core] [ 677.268921] esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core] [ 677.275281] esw_offloads_enable+0x1a5/0x800 [mlx5_core] [ 677.280949] mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core] [ 677.287227] mlx5_devlink_eswitch_mode_set+0x1af/0x320 [ 677.293741] devlink_nl_cmd_eswitch_set_doit+0x41/0xb0 [ 677.299217] genl_rcv_msg+0x1eb/0x430 Fixes: 7983a675ba65 ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30Merge branch 'nvme-5.7' of git://git.infradead.org/nvme into block-5.7Jens Axboe
Pull NVMe fix from Christoph. * 'nvme-5.7' of git://git.infradead.org/nvme: nvme: prevent double free in nvme_alloc_ns() error handling
2020-04-30fibmap: Warn and return an error in case of block > INT_MAXRitesh Harjani
We better warn the fibmap user and not return a truncated and therefore an incorrect block map address if the bmap() returned block address is greater than INT_MAX (since user supplied integer pointer). It's better to pr_warn() all user of ioctl_fibmap() and return a proper error code rather than silently letting a FS corruption happen if the user tries to fiddle around with the returned block map address. We fix this by returning an error code of -ERANGE and returning 0 as the block mapping address in case if it is > INT_MAX. Now iomap_bmap() could be called from either of these two paths. Either when a user is calling an ioctl_fibmap() interface to get the block mapping address or by some filesystem via use of bmap() internal kernel API. bmap() kernel API is well equipped with handling of u64 addresses. WARN condition in iomap_bmap_actor() was mainly added to warn all the fibmap users. But now that we have directly added this warning for all fibmap users and also made sure to return 0 as block map address in case if addr > INT_MAX. So we can now remove this logic from iomap_bmap_actor(). Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-30bpf: Fix error return code in map_lookup_and_delete_elem()Wei Yongjun
Fix to return negative error code -EFAULT from the copy_to_user() error handling case instead of 0, as done elsewhere in this function. Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200430081851.166996-1-weiyongjun1@huawei.com
2020-04-30dma-buf: fix documentation build warningsRandy Dunlap
Fix documentation warnings in dma-buf.[hc]: ../drivers/dma-buf/dma-buf.c:678: warning: Function parameter or member 'importer_ops' not described in 'dma_buf_dynamic_attach' ../drivers/dma-buf/dma-buf.c:678: warning: Function parameter or member 'importer_priv' not described in 'dma_buf_dynamic_attach' ../include/linux/dma-buf.h:339: warning: Incorrect use of kernel-doc format: * @move_notify Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/7bcbe6fe-0b4b-87da-d003-b68a26eb4cf0@infradead.org
2020-04-30i2c: aspeed: Avoid i2c interrupt status clear race condition.ryan_chen
In AST2600 there have a slow peripheral bus between CPU and i2c controller. Therefore GIC i2c interrupt status clear have delay timing, when CPU issue write clear i2c controller interrupt status. To avoid this issue, the driver need have read after write clear at i2c ISR. Fixes: f327c686d3ba ("i2c: aspeed: added driver for Aspeed I2C") Signed-off-by: ryan_chen <ryan_chen@aspeedtech.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [wsa: added Fixes tag] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-04-30i2c: amd-mp2-pci: Fix Oops in amd_mp2_pci_init() error handlingDan Carpenter
The problem is that we dereference "privdata->pci_dev" when we print the error messages in amd_mp2_pci_init(): dev_err(ndev_dev(privdata), "Failed to enable MP2 PCI device\n"); ^^^^^^^^^^^^^^^^^ Fixes: 529766e0a011 ("i2c: Add drivers for the AMD PCIe MP2 I2C controller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2020-04-30Merge tag 'phy-for-5.7-rc' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into char-misc-linus phy: for 5.7 -rc *) Update MAINTAINER to include Vinod Koul as co-maintainer of PHY *) Fix Kconfig dependencies in seen with PHY_TEGRA_XUSB *) Re-add "qcom,sdm845-qusb2-phy" compatible in phy-qcom-qusb2.c to make it work with existing dtbs *) Move clock enable from ->poweron() to ->init() in Qualcomm usb-hs-28nm driver to initialize HW in ->init() Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> * tag 'phy-for-5.7-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: qualcomm: usb-hs-28nm: Prepare clocks in init MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string phy: tegra: Select USB_COMMON for usb_get_maximum_speed()