summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-11-12sched/Kconfig: Fix spelling mistake in user-visible help textSrivatsa S. Bhat (VMware)
Fix a spelling mistake in the help text for PREEMPT_RT. Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/157204450499.10518.4542293884417101528.stgit@srivatsa-ubuntu
2019-11-12time: Fix spelling mistake in commentMukesh Ojha
witin => within Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1571124819-9639-1-git-send-email-mojha@codeaurora.org
2019-11-12time: Optimize ns_to_timespec64()Arnd Bergmann
ns_to_timespec64() calls div_s64_rem(), which is a rather slow function on 32-bit architectures, as it cannot take advantage of the do_div() optimizations for constant arguments. Open-code the div_s64_rem() function in ns_to_timespec64(), so a constant divider can be passed into the optimized div_u64_rem() function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191108203435.112759-3-arnd@arndb.de
2019-11-12ntp/y2038: Remove incorrect time_t truncationArnd Bergmann
A cast to 'time_t' was accidentally left in place during the conversion of __do_adjtimex() to 64-bit timestamps, so the resulting value is incorrectly truncated. Remove the cast so the 64-bit time gets propagated correctly. Fixes: ead25417f82e ("timex: use __kernel_timex internally") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191108203435.112759-2-arnd@arndb.de
2019-11-11cpuidle: Use nanoseconds as the unit of timeRafael J. Wysocki
Currently, the cpuidle subsystem uses microseconds as the unit of time which (among other things) causes the idle loop to incur some integer division overhead for no clear benefit. In order to allow cpuidle to measure time in nanoseconds, add two new fields, exit_latency_ns and target_residency_ns, to represent the exit latency and target residency of an idle state in nanoseconds, respectively, to struct cpuidle_state and initialize them with the help of the corresponding values in microseconds provided by drivers. Additionally, change cpuidle_governor_latency_req() to return the idle state exit latency constraint in nanoseconds. Also meeasure idle state residency (last_residency_ns in struct cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds and update the cpuidle core and governors accordingly. However, the menu governor still computes typical intervals in microseconds to avoid integer overflows. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net>
2019-11-11Merge branch 'for-5.4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "There's an inadvertent preemption point in ptrace_stop() which was reliably triggering for a test scenario significantly slowing it down. This contains Oleg's fix to remove the unwanted preemption point" * 'for-5.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()
2019-11-11kheaders: explain why include/config/autoconf.h is excluded from md5sumMasahiro Yamada
This comment block explains why include/generated/compile.h is omitted, but nothing about include/generated/autoconf.h, which might be more difficult to understand. Add more comments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11kheaders: remove the last bashism to allow sh to run itMasahiro Yamada
'pushd' ... 'popd' is the last bash-specific code in this script. One way to avoid it is to run the code in a sub-shell. With that addressed, you can run this script with sh. I replaced $(BASH) with $(CONFIG_SHELL), and I changed the hashbang to #!/bin/sh. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11kheaders: optimize header copy for in-tree buildsMasahiro Yamada
This script copies headers by the cpio command twice; first from srctree, and then from objtree. However, when we building in-tree, we know the srctree and the objtree are the same. That is, all the headers copied by the first cpio are overwritten by the second one. Skip the first cpio when we are building in-tree. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11kheaders: optimize md5sum calculation for in-tree buildsMasahiro Yamada
This script computes md5sum of headers in srctree and in objtree. However, when we are building in-tree, we know the srctree and the objtree are the same. That is, we end up with the same computation twice. In fact, the first two lines of kernel/kheaders.md5 are always the same for in-tree builds. Unify the two md5sum calculations. For in-tree builds ($building_out_of_srctree is empty), we check only two directories, "include", and "arch/$SRCARCH/include". For out-of-tree builds ($building_out_of_srctree is 1), we check 4 directories, "$srctree/include", "$srctree/arch/$SRCARCH/include", "include", and "arch/$SRCARCH/include" since we know they are all different. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11kheaders: remove unneeded 'cat' command piped to 'head' / 'tail'Masahiro Yamada
The 'head' and 'tail' commands can take a file path directly. So, you do not need to run 'cat'. cat kernel/kheaders.md5 | head -1 ... is equivalent to: head -1 kernel/kheaders.md5 and the latter saves forking one process. While I was here, I replaced 'head -1' with 'head -n 1'. I also replaced '==' with '=' since we do not have a good reason to use the bashism. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11dma-debug: increase HASH_SIZEEric Dumazet
With modern NIC, it is not unusual having about ~256,000 active dma mappings and a hash size of 1024 buckets is too small. Forcing full cache line per bucket does not seem useful, especially now that we have contention on free_entries_lock for allocations and freeing of entries. Better use the space to fit more buckets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-11-11dma-debug: reorder struct dma_debug_entry fieldsEric Dumazet
Move all fields used during exact match lookups to the first cache line. This makes debug_dma_mapping_error() and friends about 50% faster. Since it removes two 32bit holes, force a cacheline alignment on struct dma_debug_entry. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-11-11dma-mapping: merge the generic remapping helpers into dma-directChristoph Hellwig
Integrate the generic dma remapping implementation into the main flow. This prepares for architectures like xtensa that use an uncached segment for pages in the kernel mapping, but can also remap highmem from CMA. To simplify that implementation we now always deduct the page from the physical address via the DMA address instead of the virtual address. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-11dma-direct: provide mmap and get_sgtable method overridesChristoph Hellwig
For dma-direct we know that the DMA address is an encoding of the physical address that we can trivially decode. Use that fact to provide implementations that do not need the arch_dma_coherent_to_pfn architecture hook. Note that we still can only support mmap of non-coherent memory only if the architecture provides a way to set an uncached bit in the page tables. This must be true for architectures that use the generic remap helpers, but other architectures can also manually select it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-11stacktrace: Get rid of unneeded '!!' patternJiri Slaby
My commit b0c51f158455 ("stacktrace: Don't skip first entry on noncurrent tasks") adds one or zero to skipnr by "!!(current == tsk)". But the C99 standard says: The == (equal to) and != (not equal to) operators are ... Each of the operators yields 1 if the specified relation is true and 0 if it is false. So there is no need to prepend the above expression by "!!" -- remove it. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191111092647.27419-1-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11irq_work: Slightly simplify IRQ_WORK_PENDING clearingFrederic Weisbecker
Instead of fetching the value of flags and perform an xchg() to clear a bit, just use atomic_fetch_andnot() that is more suitable to do that job in one operation while keeping the full ordering. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191108160858.31665-4-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11irq_work: Fix irq_work_claim() memory orderingFrederic Weisbecker
When irq_work_claim() finds IRQ_WORK_PENDING flag already set, we just return and don't raise a new IPI. We expect the destination to see and handle our latest updades thanks to the pairing atomic_xchg() in irq_work_run_list(). But cmpxchg() doesn't guarantee a full memory barrier upon failure. So it's possible that the destination misses our latest updates. So use atomic_fetch_or() instead that is unconditionally fully ordered and also performs exactly what we want here and simplify the code. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191108160858.31665-3-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11irq_work: Convert flags to atomic_tFrederic Weisbecker
We need to convert flags to atomic_t in order to later fix an ordering issue on atomic_cmpxchg() failure. This will allow us to use atomic_fetch_or(). Also clarify the nature of those flags. [ mingo: Converted two more usage site the original patch missed. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191108160858.31665-2-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11sched/core: Further clarify sched_class::set_next_task()Peter Zijlstra
It turns out there really is something special to the first set_next_task() invocation. In specific the 'change' pattern really should not cause balance callbacks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Fixes: f95d4eaee6d0 ("sched/{rt,deadline}: Fix set_next_task vs pick_next_task") Link: https://lkml.kernel.org/r/20191108131909.775434698@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11sched/fair: Use mul_u32_u32()Peter Zijlstra
While reading the code I encountered another site where we should be using mul_u32_u32() because GCC just won't take a hint. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.717931380@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11sched/core: Simplify sched_class::pick_next_task()Peter Zijlstra
Now that the indirect class call never uses the last two arguments of pick_next_task(), remove them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.660595546@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11sched/core: Optimize pick_next_task()Peter Zijlstra
Ever since we moved the sched_class definitions into their own files, the constant expression {fair,idle}_sched_class.pick_next_task() is not in fact a compile time constant anymore and results in an indirect call (barring LTO). Fix that by exposing pick_next_task_{fair,idle}() directly, this gets rid of the indirect call (and RETPOLINE) on the fast path. Also remove the unlikely() from the idle case, it is in fact /the/ way we select idle -- and that is a very common thing to do. Performance for will-it-scale/sched_yield improves by 2% (as reported by 0-day). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.603037345@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11sched/core: Make pick_next_task_idle() more consistentPeter Zijlstra
Only pick_next_task_fair() needs the @prev and @rf argument; these are required to implement the cpu-cgroup optimization. None of the other pick_next_task() methods need this. Make pick_next_task_idle() more consistent. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.545730862@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11sched/fair: Better document newidle_balance()Peter Zijlstra
Whilst chasing the pick_next_task() race, there was some confusion about the newidle_balance() return values. Document them. [ mingo: Minor edits. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: ktkhai@virtuozzo.com Cc: mgorman@suse.de Cc: qais.yousef@arm.com Cc: qperret@google.com Cc: rostedt@goodmis.org Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20191108131909.488364308@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11Merge tag 'v5.4-rc7' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11Merge tag 'v5.4-rc7' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-10Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of fixes for timekeepoing and clocksource drivers: - VDSO data was updated conditional on the availability of a VDSO capable clocksource. This causes the VDSO functions which do not depend on a VDSO capable clocksource to operate on stale data. Always update unconditionally. - Prevent a double free in the mediatek driver - Use the proper helper in the sh_mtu2 driver so it won't attempt to initialize non-existing interrupts" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping/vsyscall: Update VDSO data unconditionally clocksource/drivers/sh_mtu2: Do not loop using platform_get_irq_by_name() clocksource/drivers/mediatek: Fix error handling
2019-11-10Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Two fixes for scheduler regressions: - Plug a subtle race condition which was introduced with the rework of the next task selection functionality. The change of task properties became unprotected which can be observed inconsistently causing state corruption. - A trivial compile fix for CONFIG_CGROUPS=n" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix pick_next_task() vs 'change' pattern race sched/core: Fix compilation error when cgroup not selected
2019-11-10Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixlet from Thomas Gleixner: "A trivial fix for a kernel doc regression where an argument change was not reflected in the documentation" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentation
2019-11-10Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull stacktrace fix from Thomas Gleixner: "A small fix for a stacktrace regression. Saving a stacktrace for a foreign task skipped an extra entry which makes e.g. the output of /proc/$PID/stack incomplete" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stacktrace: Don't skip first entry on noncurrent tasks
2019-11-10audit_get_nd(): don't unlock parent too earlyAl Viro
if the child has been negative and just went positive under us, we want coherent d_is_positive() and ->d_inode. Don't unlock the parent until we'd done that work... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-10cgroup: don't put ERR_PTR() into fc->rootAl Viro
the caller of ->get_tree() expects NULL left there on error... Reported-by: Thibaut Sautereau <thibaut@sautereau.fr> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) BPF sample build fixes from Björn Töpel 2) Fix powerpc bpf tail call implementation, from Eric Dumazet. 3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet. 4) Fix crash in ebtables when using dnat target, from Florian Westphal. 5) Fix port disable handling whne removing bcm_sf2 driver, from Florian Fainelli. 6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski. 7) Various KCSAN fixes all over the networking, from Eric Dumazet. 8) Memory leaks in mlx5 driver, from Alex Vesker. 9) SMC interface refcounting fix, from Ursula Braun. 10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu. 11) Add a TX lock to synchonize the kTLS TX path properly with crypto operations. From Jakub Kicinski. 12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano Garzarella. 13) Infinite loop in Intel ice driver, from Colin Ian King. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits) ixgbe: need_wakeup flag might not be set for Tx i40e: need_wakeup flag might not be set for Tx igb/igc: use ktime accessors for skb->tstamp i40e: Fix for ethtool -m issue on X722 NIC iavf: initialize ITRN registers with correct values ice: fix potential infinite loop because loop counter being too small qede: fix NULL pointer deref in __qede_remove() net: fix data-race in neigh_event_send() vsock/virtio: fix sock refcnt holding during the shutdown net: ethernet: octeon_mgmt: Account for second possible VLAN header mac80211: fix station inactive_time shortly after boot net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix ieee80211_txq_setup_flows() failure path ipv4: Fix table id reference in fib_sync_down_addr ipv6: fixes rt6_probe() and fib6_nh->last_probe init net: hns: Fix the stray netpoll locks causing deadlock in NAPI path net: usb: qmi_wwan: add support for DW5821e with eSIM support CDC-NCM: handle incomplete transfer of MTU nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free ...
2019-11-08sched: Fix pick_next_task() vs 'change' pattern racePeter Zijlstra
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path") inadvertly introduced a race because it changed a previously unexplored dependency between dropping the rq->lock and sched_class::put_prev_task(). The comments about dropping rq->lock, in for example newidle_balance(), only mentions the task being current and ->on_cpu being set. But when we look at the 'change' pattern (in for example sched_setnuma()): queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */ running = task_current(rq, p); /* rq->curr == p */ if (queued) dequeue_task(...); if (running) put_prev_task(...); /* change task properties */ if (queued) enqueue_task(...); if (running) set_next_task(...); It becomes obvious that if we do this after put_prev_task() has already been called on @p, things go sideways. This is exactly what the commit in question allows to happen when it does: prev->sched_class->put_prev_task(rq, prev, rf); if (!rq->nr_running) newidle_balance(rq, rf); The newidle_balance() call will drop rq->lock after we've called put_prev_task() and that allows the above 'change' pattern to interleave and mess up the state. Furthermore, it turns out we lost the RT-pull when we put the last DL task. Fix both problems by extracting the balancing from put_prev_task() and doing a multi-class balance() pass before put_prev_task(). Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path") Reported-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Quentin Perret <qperret@google.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com>
2019-11-08sched/core: Fix compilation error when cgroup not selectedQais Yousef
When cgroup is disabled the following compilation error was hit kernel/sched/core.c: In function ‘uclamp_update_active_tasks’: kernel/sched/core.c:1081:23: error: storage size of ‘it’ isn’t known struct css_task_iter it; ^~ kernel/sched/core.c:1084:2: error: implicit declaration of function ‘css_task_iter_start’; did you mean ‘__sg_page_iter_start’? [-Werror=implicit-function-declaration] css_task_iter_start(css, 0, &it); ^~~~~~~~~~~~~~~~~~~ __sg_page_iter_start kernel/sched/core.c:1085:14: error: implicit declaration of function ‘css_task_iter_next’; did you mean ‘__sg_page_iter_next’? [-Werror=implicit-function-declaration] while ((p = css_task_iter_next(&it))) { ^~~~~~~~~~~~~~~~~~ __sg_page_iter_next kernel/sched/core.c:1091:2: error: implicit declaration of function ‘css_task_iter_end’; did you mean ‘get_task_cred’? [-Werror=implicit-function-declaration] css_task_iter_end(&it); ^~~~~~~~~~~~~~~~~ get_task_cred kernel/sched/core.c:1081:23: warning: unused variable ‘it’ [-Wunused-variable] struct css_task_iter it; ^~ cc1: some warnings being treated as errors make[2]: *** [kernel/sched/core.o] Error 1 Fix by protetion uclamp_update_active_tasks() with CONFIG_UCLAMP_TASK_GROUP Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Patrick Bellasi <patrick.bellasi@matbug.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Ben Segall <bsegall@google.com> Link: https://lkml.kernel.org/r/20191105112212.596-1-qais.yousef@arm.com
2019-11-08Merge branches 'for-next/elf-hwcap-docs', 'for-next/smccc-conduit-cleanup', ↵Catalin Marinas
'for-next/zone-dma', 'for-next/relax-icc_pmr_el1-sync', 'for-next/double-page-fault', 'for-next/misc', 'for-next/kselftest-arm64-signal' and 'for-next/kaslr-diagnostics' into for-next/core * for-next/elf-hwcap-docs: : Update the arm64 ELF HWCAP documentation docs/arm64: cpu-feature-registers: Rewrite bitfields that don't follow [e, s] docs/arm64: cpu-feature-registers: Documents missing visible fields docs/arm64: elf_hwcaps: Document HWCAP_SB docs/arm64: elf_hwcaps: sort the HWCAP{, 2} documentation by ascending value * for-next/smccc-conduit-cleanup: : SMC calling convention conduit clean-up firmware: arm_sdei: use common SMCCC_CONDUIT_* firmware/psci: use common SMCCC_CONDUIT_* arm: spectre-v2: use arm_smccc_1_1_get_conduit() arm64: errata: use arm_smccc_1_1_get_conduit() arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit() * for-next/zone-dma: : Reintroduction of ZONE_DMA for Raspberry Pi 4 support arm64: mm: reserve CMA and crashkernel in ZONE_DMA32 dma/direct: turn ARCH_ZONE_DMA_BITS into a variable arm64: Make arm64_dma32_phys_limit static arm64: mm: Fix unused variable warning in zone_sizes_init mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type' arm64: use both ZONE_DMA and ZONE_DMA32 arm64: rename variables used to calculate ZONE_DMA32's size arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys() * for-next/relax-icc_pmr_el1-sync: : Relax ICC_PMR_EL1 (GICv3) accesses when ICC_CTLR_EL1.PMHE is clear arm64: Document ICC_CTLR_EL3.PMHE setting requirements arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear * for-next/double-page-fault: : Avoid a double page fault in __copy_from_user_inatomic() if hw does not support auto Access Flag mm: fix double page fault on arm64 if PTE_AF is cleared x86/mm: implement arch_faults_on_old_pte() stub on x86 arm64: mm: implement arch_faults_on_old_pte() on arm64 arm64: cpufeature: introduce helper cpu_has_hw_af() * for-next/misc: : Various fixes and clean-ups arm64: kpti: Add NVIDIA's Carmel core to the KPTI whitelist arm64: mm: Remove MAX_USER_VA_BITS definition arm64: mm: simplify the page end calculation in __create_pgd_mapping() arm64: print additional fault message when executing non-exec memory arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill() arm64: pgtable: Correct typo in comment arm64: docs: cpu-feature-registers: Document ID_AA64PFR1_EL1 arm64: cpufeature: Fix typos in comment arm64/mm: Poison initmem while freeing with free_reserved_area() arm64: use generic free_initrd_mem() arm64: simplify syscall wrapper ifdeffery * for-next/kselftest-arm64-signal: : arm64-specific kselftest support with signal-related test-cases kselftest: arm64: fake_sigreturn_misaligned_sp kselftest: arm64: fake_sigreturn_bad_size kselftest: arm64: fake_sigreturn_duplicated_fpsimd kselftest: arm64: fake_sigreturn_missing_fpsimd kselftest: arm64: fake_sigreturn_bad_size_for_magic0 kselftest: arm64: fake_sigreturn_bad_magic kselftest: arm64: add helper get_current_context kselftest: arm64: extend test_init functionalities kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht] kselftest: arm64: mangle_pstate_invalid_daif_bits kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils kselftest: arm64: extend toplevel skeleton Makefile * for-next/kaslr-diagnostics: : Provide diagnostics on boot for KASLR arm64: kaslr: Check command line before looking for a seed arm64: kaslr: Announce KASLR status on boot
2019-11-08ftrace: Separate out functionality from ftrace_location_range()Steven Rostedt (VMware)
Create a new function called lookup_rec() from the functionality of ftrace_location_range(). The difference between lookup_rec() is that it returns the record that it finds, where as ftrace_location_range() returns only if it found a match or not. The lookup_rec() is static, and can be used for new functionality where ftrace needs to find a record of a specific address. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-08ftrace: Separate out the copying of a ftrace_hash from __ftrace_hash_move()Steven Rostedt (VMware)
Most of the functionality of __ftrace_hash_move() can be reused, but not all of it. That is, __ftrace_hash_move() is used to simply make a new hash from an existing one, using the same size as the original. Creating a dup_hash(), where we can specify a new size will be useful when we want to create a hash with a default size, or simply copy the old one. Signed-off-by: Steven Rostedt (VMWare) <rostedt@goodmis.org>
2019-11-07bpf: Add array support to btf_struct_accessMartin KaFai Lau
This patch adds array support to btf_struct_access(). It supports array of int, array of struct and multidimensional array. It also allows using u8[] as a scratch space. For example, it allows access the "char cb[48]" with size larger than the array's element "char". Another potential use case is "u64 icsk_ca_priv[]" in the tcp congestion control. btf_resolve_size() is added to resolve the size of any type. It will follow the modifier if there is any. Please see the function comment for details. This patch also adds the "off < moff" check at the beginning of the for loop. It is to reject cases when "off" is pointing to a "hole" in a struct. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191107180903.4097702-1-kafai@fb.com
2019-11-07dma-direct: remove the dma_handle argument to __dma_direct_alloc_pagesChristoph Hellwig
The argument isn't used anywhere, so stop passing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-07dma-direct: remove __dma_direct_free_pagesChristoph Hellwig
We can just call dma_free_contiguous directly instead of wrapping it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-07cgroup: freezer: don't change task and cgroups status unnecessarilyHonglei Wang
It's not necessary to adjust the task state and revisit the state of source and destination cgroups if the cgroups are not in freeze state and the task itself is not frozen. And in this scenario, it wakes up the task who's not supposed to be ready to run. Don't do the unnecessary task state adjustment can help stop waking up the task without a reason. Signed-off-by: Honglei Wang <honglei.wang@oracle.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-11-07cpufreq: Initialize the governors in core_initcallAmit Kucheria
Initialize the cpufreq governors earlier to allow for earlier performance control during the boot process. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/b98eae9b44eb2f034d7f5d12a161f5f831be1eb7.1571656015.git.amit.kucheria@linaro.org
2019-11-06bpf: Account for insn->off when doing bpf_probe_read_kernelMartin KaFai Lau
In the bpf interpreter mode, bpf_probe_read_kernel is used to read from PTR_TO_BTF_ID's kernel object. It currently missed considering the insn->off. This patch fixes it. Fixes: 2a02759ef5f8 ("bpf: Add support for BTF pointers to interpreter") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191107014640.384083-1-kafai@fb.com
2019-11-07bpf, offload: Unlock on error in bpf_offload_dev_create()Dan Carpenter
We need to drop the bpf_devs_lock on error before returning. Fixes: 9fd7c5559165 ("bpf: offload: aggregate offloads per-device") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Link: https://lore.kernel.org/bpf/20191104091536.GB31509@mwanda
2019-11-06hrtimer: Annotate lockless access to timer->stateEric Dumazet
syzbot reported various data-race caused by hrtimer_is_queued() reading timer->state. A READ_ONCE() is required there to silence the warning. Also add the corresponding WRITE_ONCE() when timer->state is set. In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid loading timer->state twice. KCSAN reported these cases: BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0: __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991 __run_hrtimer kernel/time/hrtimer.c:1496 [inline] __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1: tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline] tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2435 release_sock+0x61/0x160 net/core/sock.c:2951 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0: __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991 __run_hrtimer kernel/time/hrtimer.c:1496 [inline] __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1: __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline] tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2435 release_sock+0x61/0x160 net/core/sock.c:2951 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 __sys_sendto+0x21f/0x320 net/socket.c:1952 __do_sys_sendto net/socket.c:1964 [inline] __se_sys_sendto net/socket.c:1960 [inline] __x64_sys_sendto+0x89/0xb0 net/socket.c:1960 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ tglx: Added comments ] Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
2019-11-06cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistencyTejun Heo
cgroup->bstat_pending is used to determine the base stat delta to propagate to the parent. While correct, this is different from how percpu delta is determined for no good reason and the inconsistency makes the code more difficult to understand. This patch makes parent propagation delta calculation use the same method as percpu to global propagation. * cgroup_base_stat_accumulate() is renamed to cgroup_base_stat_add() and cgroup_base_stat_sub() is added. * percpu propagation calculation is updated to use the above helpers. * cgroup->bstat_pending is replaced with cgroup->last_bstat and updated to use the same calculation as percpu propagation. Signed-off-by: Tejun Heo <tj@kernel.org>
2019-11-06module/ftrace: handle patchable-function-entryMark Rutland
When using patchable-function-entry, the compiler will record the callsites into a section named "__patchable_function_entries" rather than "__mcount_loc". Let's abstract this difference behind a new FTRACE_CALLSITE_SECTION, so that architectures don't have to handle this explicitly (e.g. with custom module linker scripts). As parisc currently handles this explicitly, it is fixed up accordingly, with its custom linker script removed. Since FTRACE_CALLSITE_SECTION is only defined when DYNAMIC_FTRACE is selected, the parisc module loading code is updated to only use the definition in that case. When DYNAMIC_FTRACE is not selected, modules shouldn't have this section, so this removes some redundant work in that case. To make sure that this is keep up-to-date for modules and the main kernel, a comment is added to vmlinux.lds.h, with the existing ifdeffery simplified for legibility. I built parisc generic-{32,64}bit_defconfig with DYNAMIC_FTRACE enabled, and verified that the section made it into the .ko files for modules. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Helge Deller <deller@gmx.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Torsten Duwe <duwe@suse.de> Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Tested-by: Sven Schnelle <svens@stackframe.org> Tested-by: Torsten Duwe <duwe@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: linux-parisc@vger.kernel.org