path: root/kernel
Age | Commit message | Author
2018-05-15 | rcu: Drop early GP request check from rcu_gp_kthread() | Paul E. McKenney
Now that grace-period requests use funnel locking and now that they set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period kthread has not yet started, rcu_gp_kthread() no longer needs to check need_any_future_gp() at startup time. This commit therefore removes this check. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Simplify and inline cpu_needs_another_gp() | Paul E. McKenney
Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp() can be greatly simplified. This simplification eliminates the last call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(), both of which can then be eliminated. And then, because cpu_needs_another_gp() is called only from __rcu_pending(), it can be inlined and eliminated. This commit carries out the simplification, inlining, and elimination called out above. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() | Paul E. McKenney
All of the cpu_needs_another_gp() function's checks (except for newly arrived callbacks) have been subsumed into the rcu_gp_cleanup() function's scan of the rcu_node tree. This commit therefore drops the call to cpu_needs_another_gp(). The check for newly arrived callbacks is supplied by rcu_accelerate_cbs(). Any needed advancing (as in the earlier rcu_advance_cbs() call) will be supplied when the corresponding CPU becomes aware of the end of the now-completed grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Make rcu_start_this_gp() check for out-of-range requests | Paul E. McKenney
If rcu_start_this_gp() is invoked with a requested grace period more than three in the future, then either the ->need_future_gp[] array needs to be bigger or the caller needs to be repaired. This commit therefore adds a WARN_ON_ONCE() checking for this condition. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
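A minimal userspace sketch of such a range check, assuming a four-element array and unsigned grace-period counters. The names `gp_request_in_range` and `NEED_FUTURE_GP_SLOTS` are hypothetical and do not come from the kernel source; in the kernel, a failing check of this kind would be wrapped in WARN_ON_ONCE().

```c
#include <stdbool.h>

/* Assumed array size for this sketch: one slot per trackable future GP. */
#define NEED_FUTURE_GP_SLOTS 4UL

/*
 * Return true if grace period "c" is at most three grace periods beyond
 * "completed". Unsigned subtraction keeps the comparison correct even
 * when the counters wrap around.
 */
static bool gp_request_in_range(unsigned long completed, unsigned long c)
{
	return c - completed < NEED_FUTURE_GP_SLOTS;
}
```

With `completed == 10`, requests for grace periods 10 through 13 are accepted and a request for 14 is flagged.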
2018-05-15 | rcu: Add funnel locking to rcu_start_this_gp() | Paul E. McKenney
The rcu_start_this_gp() function had a simple form of funnel locking that used only the leaves and root of the rcu_node tree, which is fine for systems with only a few hundred CPUs, but sub-optimal for systems having thousands of CPUs. This commit therefore adds full-tree funnel locking.
This variant of funnel locking is unusual in the following ways:
1. The leaf-level rcu_node structure's ->lock is held throughout. Other funnel-locking implementations drop the leaf-level lock before progressing to the next level of the tree.
2. Funnel locking can be started at the root, which is convenient for code that already holds the root rcu_node structure's ->lock. Other funnel-locking implementations start at the leaves.
3. If an rcu_node structure other than the initial one believes that a grace period is in progress, it is not necessary to go further up the tree. This is because grace-period cleanup scans the full tree, so that marking the need for a subsequent grace period anywhere in the tree suffices -- but only if a grace period is currently in progress.
4. It is possible that the RCU grace-period kthread has not yet started, and this case must be handled appropriately.
However, the general approach of using a tree to control lock contention is still in place. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
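The upward walk described in points 1 to 3 can be sketched in userspace C as follows. This is a simplified model, not the kernel's implementation: `struct funnel_node`, its fields, the toy spinlock, and `funnel_request_gp()` are all hypothetical names, a single boolean stands in for the per-grace-period request array, and point 4 (the kthread not yet running) is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for the kernel's rcu_node. */
struct funnel_node {
	atomic_flag lock;		/* toy spinlock */
	struct funnel_node *parent;	/* NULL at the root */
	bool gp_in_progress;		/* this node's view of GP state */
	bool need_gp;			/* a future GP was requested here */
};

static void funnel_node_init(struct funnel_node *n, struct funnel_node *parent)
{
	atomic_flag_clear(&n->lock);
	n->parent = parent;
	n->gp_in_progress = false;
	n->need_gp = false;
}

static void node_lock(struct funnel_node *n)
{
	while (atomic_flag_test_and_set(&n->lock))
		;	/* spin until the flag was previously clear */
}

static void node_unlock(struct funnel_node *n)
{
	atomic_flag_clear(&n->lock);
}

/*
 * Record a grace-period request starting at "start" (a leaf or the root).
 * Returns true if the walk reached an idle root, meaning the caller
 * should wake the grace-period thread; false if the request was absorbed.
 */
static bool funnel_request_gp(struct funnel_node *start)
{
	struct funnel_node *n;
	bool wake = false;

	node_lock(start);		/* point 1: held for the whole walk */
	for (n = start; n; n = n->parent) {
		bool stop;

		if (n != start)
			node_lock(n);
		/* Already requested here, or (point 3) a GP is in progress. */
		stop = n->need_gp || (n != start && n->gp_in_progress);
		if (!stop) {
			n->need_gp = true;
			if (!n->parent && !n->gp_in_progress)
				wake = true;	/* idle root reached */
		}
		if (n != start)
			node_unlock(n);
		if (stop)
			break;
	}
	node_unlock(start);
	return wake;
}
```

In this model, the first request from a leaf walks all the way to the root, while a second request from a sibling leaf stops at the shared parent because the request is already recorded there. That early exit is what keeps most requesters away from the root lock.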
2018-05-15 | rcu: Make rcu_start_future_gp() caller select grace period | Paul E. McKenney
The rcu_accelerate_cbs() function selects a grace-period target, which it uses to have rcu_segcblist_accelerate() assign numbers to recently queued callbacks. Then it invokes rcu_start_future_gp(), which selects a grace-period target again, which is a bit pointless. This commit therefore changes rcu_start_future_gp() to take the grace-period target as a parameter, thus avoiding double selection. This commit also changes the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect this change in functionality, and also makes a similar change to the name of trace_rcu_future_gp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() | Paul E. McKenney
The rcu_start_gp_advanced() is invoked only from rcu_start_future_gp() and much of its code is redundant when invoked from that context. This commit therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(), then removes rcu_start_gp_advanced(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Clear request other than RCU_GP_FLAG_INIT at GP end | Paul E. McKenney
Once the grace period has ended, any RCU_GP_FLAG_FQS requests are irrelevant: The grace period has ended, so there is no longer any point in forcing quiescent states in order to try to make it end sooner. This commit therefore causes rcu_gp_cleanup() to clear any bits other than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
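The masking step can be illustrated with a short sketch. The flag values below are assumptions made for the example, and `GP_FLAG_INIT`, `GP_FLAG_FQS`, and `gp_flags_at_cleanup()` are hypothetical stand-ins for the kernel's RCU_GP_FLAG_INIT, RCU_GP_FLAG_FQS, and the relevant statement in rcu_gp_cleanup().

```c
/* Assumed bit assignments for this sketch only. */
#define GP_FLAG_INIT 0x1u	/* a new grace period is needed */
#define GP_FLAG_FQS  0x2u	/* force quiescent states */

/* Keep only the "start a new GP" request; drop everything else. */
static unsigned int gp_flags_at_cleanup(unsigned int gp_flags)
{
	return gp_flags & GP_FLAG_INIT;
}
```

A pending force-quiescent-state request is discarded, while a request to initialize the next grace period survives the cleanup.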
2018-05-15 | rcu: Cleanup, don't put ->completed into an int | Paul E. McKenney
It is true that currently only the low-order two bits are used, so there should be no problem given modern machines and compilers, but good hygiene and maintainability dictate use of an unsigned long instead of an int. This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() | Paul E. McKenney
The __rcu_process_callbacks() function currently checks to see if the current CPU needs a grace period and also if there is any other reason to kick off a new grace period. This is one of the fail-safe checks that has been rendered unnecessary by the changes that increase the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace period is required. Because this particular fail-safe involved acquiring the root rcu_node structure's ->lock, which has seen excessive contention in real life, this fail-safe needs to go. However, one check must remain, namely the check for newly arrived RCU callbacks that have not yet been associated with a grace period. One might hope that the checks in __note_gp_changes(), which is invoked indirectly from rcu_check_quiescent_state(), would suffice, but this function won't be invoked at all if RCU is idle. It is therefore necessary to replace the fail-safe checks with a simpler check for newly arrived callbacks during an RCU idle period, which is exactly what this commit does. This change removes the final call to rcu_start_gp(), so this function is removed as well. Note that lockless use of cpu_needs_another_gp() is racy, but that these races are harmless in this case. If RCU really is idle, the values will not change, so the return value from cpu_needs_another_gp() will be correct. If RCU is not idle, the resulting redundant call to rcu_accelerate_cbs() will be harmless, and might even have the benefit of reducing grace-period latency a bit. This commit also moves interrupt disabling into the "if" statement to improve real-time response a bit. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition | Paul E. McKenney
When __call_rcu_core() notices excessive numbers of callbacks pending on the current CPU, we know that at least one of them is not yet classified, namely the one that was just now queued. Therefore, it is not necessary to invoke rcu_start_gp() and thus not necessary to acquire the root rcu_node structure's ->lock. This commit therefore replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing an acquisition of the root rcu_node structure's ->lock with that of this CPU's leaf rcu_node structure. This decreases contention on the root rcu_node structure's ->lock. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Make rcu_migrate_callbacks wake GP kthread when needed | Paul E. McKenney
The rcu_migrate_callbacks() function invokes rcu_advance_cbs() twice, ignoring the return value. This is OK at present because of failsafe code that does the wakeup when needed. However, this failsafe code acquires the root rcu_node structure's lock frequently, while rcu_migrate_callbacks() does so only once per CPU-offline operation. This commit therefore makes rcu_migrate_callbacks() wake up the RCU GP kthread when either call to rcu_advance_cbs() returns true, thus removing the need for the failsafe code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
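The pattern of collecting both return values instead of discarding them might look like the sketch below. Everything here is hypothetical: `struct cb_list_sketch`, `advance_cbs_sketch()`, and `migrate_callbacks_sketch()` only model the idea and are not the kernel's functions. The sketch deliberately evaluates the second call before the `||` so that both lists are always advanced; that is a choice made for this example, not a statement about how the kernel code is written.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU callback list. */
struct cb_list_sketch {
	bool needs_gp;	/* advancing reveals that a new GP is needed */
	int advanced;	/* how many times the list was advanced */
};

/* Models a call that returns true when the GP thread must be woken. */
static bool advance_cbs_sketch(struct cb_list_sketch *l)
{
	l->advanced++;
	return l->needs_gp;
}

/* Advance both lists and report whether either one needs a wakeup. */
static bool migrate_callbacks_sketch(struct cb_list_sketch *dst,
				     struct cb_list_sketch *src)
{
	bool needwake;

	needwake = advance_cbs_sketch(dst);
	needwake = advance_cbs_sketch(src) || needwake;	/* always runs */
	return needwake;
}
```

The caller would then perform the wakeup once, after dropping any locks, if the returned value is true.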
2018-05-15 | rcu: Convert ->need_future_gp[] array to boolean | Paul E. McKenney
There is no longer any need for ->need_future_gp[] to count the number of requests for future grace periods, so this commit converts the additions to assignments to "true" and reduces the size of each element to one byte. While we are in the area, fix an obsolete comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Make rcu_future_needs_gp() check all ->need_future_gp[] elements | Paul E. McKenney
Currently, the rcu_future_needs_gp() function checks only the current element of the ->need_future_gp[] array, which might miss elements that were offset from the expected element, for example, due to races with the start or the end of a grace period. This commit therefore makes rcu_future_needs_gp() use the need_any_future_gp() macro to check all of the elements of this array. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Avoid losing ->need_future_gp[] values due to GP start/end races | Paul E. McKenney
The rcu_cbs_completed() function provides the value of ->completed at which new callbacks can safely be invoked. This is recorded in two-element ->need_future_gp[] arrays in the rcu_node structure, and the elements of these arrays corresponding to the just-completed grace period are zeroed at the end of that grace period. However, the rcu_cbs_completed() function can return the current ->completed value plus either one or two, so it is possible for the corresponding ->need_future_gp[] entry to be cleared just after it was set, thus losing a request for a future grace period. This commit avoids this race by expanding ->need_future_gp[] to four elements. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
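One way to see the collision is to work through the index arithmetic, assuming the array is indexed by the grace-period number masked with the array size minus one. The helper `gp_slot()` is a hypothetical name used only for this sketch.

```c
/*
 * Map a grace-period number to an array slot.
 * "nslots" is assumed to be a power of two.
 */
static unsigned long gp_slot(unsigned long gp, unsigned long nslots)
{
	return gp & (nslots - 1);
}
```

With two slots, a request for grace period `c + 2` lands in the same slot as grace period `c`, so clearing the slot for `c` at the end of that grace period can wipe out the newer request. With four slots, `c`, `c + 1`, and `c + 2` all map to distinct slots.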
2018-05-15 | rcu: Make rcu_gp_cleanup() more accurately predict need for new GP | Paul E. McKenney
Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset state to reflect the end of the grace period. It also checks to see whether a new grace period is needed, but in a number of cases, rather than directly cause the new grace period to be immediately started, it instead leaves the grace-period-needed state where various fail-safes can find it. This works fine, but results in higher contention on the root rcu_node structure's ->lock, which is undesirable, and contention on that lock has recently become noticeable. This commit therefore makes rcu_gp_cleanup() immediately start a new grace period if there is any need for one. It is quite possible that it will later be necessary to throttle the grace-period rate, but that can be dealt with when and if. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Make rcu_gp_kthread() check for early-boot activity | Paul E. McKenney
The rcu_gp_kthread() function immediately sleeps waiting to be notified of the need for a new grace period, which currently works because there are a number of code sequences that will provide the needed wakeup later. However, some of these code sequences need to acquire the root rcu_node structure's ->lock, and contention on that lock has started manifesting. This commit therefore makes rcu_gp_kthread() check for early-boot activity when it starts up, omitting the initial sleep in that case. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Add accessor macros for the ->need_future_gp[] array | Paul E. McKenney
Accessors for the ->need_future_gp[] array are currently open-coded, which makes them difficult to change. To improve maintainability, this commit adds need_future_gp_mask() to compute the indexing mask from the array size, need_future_gp_element() to access the element corresponding to the specified grace-period number, and need_any_future_gp() to determine if any future grace period is needed. This commit also applies need_future_gp_element() to existing open-coded single-element accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
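A userspace sketch of what such accessors could look like is shown below. The three accessor names come from the commit message, but their bodies here are assumptions: `struct rnp_sketch` is a hypothetical stand-in for rcu_node, the array size of four is chosen arbitrarily, and need_any_future_gp() is written as a function rather than a macro to stay within standard C.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of rcu_node that matters here. */
struct rnp_sketch {
	bool need_future_gp[4];
};

/* Indexing mask derived from the array size (must be a power of two). */
#define need_future_gp_mask() \
	(sizeof(((struct rnp_sketch *)NULL)->need_future_gp) / \
	 sizeof(((struct rnp_sketch *)NULL)->need_future_gp[0]) - 1)

/* Element corresponding to grace-period number "c". */
#define need_future_gp_element(rnp, c) \
	((rnp)->need_future_gp[(c) & need_future_gp_mask()])

/* True if any future grace period has been requested. */
static bool need_any_future_gp(const struct rnp_sketch *rnp)
{
	size_t i;

	for (i = 0; i <= need_future_gp_mask(); i++)
		if (rnp->need_future_gp[i])
			return true;
	return false;
}
```

Because the mask is computed from the array declaration, changing the array size requires no change to any caller, which is the maintainability benefit the commit describes.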
2018-05-15 | rcu: Make rcu_start_future_gp()'s grace-period check more precise | Paul E. McKenney
The rcu_start_future_gp() function uses a sloppy check for a grace period being in progress, which works today because there are a number of code sequences that resolve the resulting races. However, some of these race-resolution code sequences must acquire the root rcu_node structure's ->lock, and contention on that lock has started manifesting. This commit therefore makes rcu_start_future_gp() check more precise, eliminating the sloppy lockless check of the rcu_state structure's ->gpnum and ->completed fields. The effect is that rcu_start_future_gp() will sometimes unnecessarily attempt to start a new grace period, but this overhead will be reduced later using funnel locking. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 | rcu: Improve non-root rcu_cbs_completed() accuracy | Paul E. McKenney
When rcu_cbs_completed() is invoked on a non-root rcu_node structure, it unconditionally assumes that two grace periods must complete before the callbacks at hand can be invoked. This is overly conservative because if that non-root rcu_node structure believes that no grace period is in progress, and if the corresponding rcu_state structure's ->gpnum field has not yet been incremented, then these callbacks may safely be invoked after only one grace period has completed. This change is required to permit grace-period start requests to use funnel locking, which is in turn needed to reduce root rcu_node ->lock contention, which has been observed by Nick Piggin. Furthermore, such contention will likely be increased by the merging of RCU-bh, RCU-preempt, and RCU-sched, so it makes sense to take steps to decrease it. This commit therefore improves the accuracy of rcu_cbs_completed() when invoked on a non-root rcu_node structure as described above. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
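The decision described above can be modeled roughly as follows. This is a sketch under stated assumptions, not the kernel function: `struct state_sketch`, `struct node_sketch`, the `is_root` flag, and `cbs_completed_sketch()` are hypothetical, and the handling of the root node (one grace period when the root is idle) is an assumption that the commit message implies but does not spell out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for rcu_state and rcu_node. */
struct state_sketch {
	unsigned long gpnum;		/* global: most recently started GP */
};

struct node_sketch {
	unsigned long gpnum;		/* this node's view of the started GP */
	unsigned long completed;	/* this node's view of the completed GP */
	bool is_root;
};

/* Return the ->completed value at which newly queued callbacks may run. */
static unsigned long cbs_completed_sketch(const struct state_sketch *sp,
					  const struct node_sketch *np)
{
	bool node_idle = np->gpnum == np->completed;

	/* Assumed root case: an idle root knows no GP has started. */
	if (np->is_root && node_idle)
		return np->completed + 1;
	/* Non-root, idle, and the global counter has not moved on. */
	if (node_idle && np->gpnum == sp->gpnum)
		return np->completed + 1;
	/* Otherwise be conservative and wait for two grace periods. */
	return np->completed + 2;
}
```

For example, an idle non-root node at grace period 5 yields 6 while the global ->gpnum is still 5, but yields 7 once the global counter has already advanced to 6.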
2018-04-15 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 fixes from Thomas Gleixner:
"A set of fixes and updates for x86:
 - Address a swiotlb regression which was caused by the recent DMA rework and made drivers fail because dma_direct_supported() returned false
 - Fix a signedness bug in the APIC ID validation which caused invalid APIC IDs to be detected as valid, thereby bloating the CPU possible space.
 - Fix inconsistent config dependency/select magic for the MFD_CS5535 driver.
 - Fix a corruption of the physical address space bits when encryption has reduced the address space and late cpuinfo updates overwrite the reduced bit information with the original value.
 - Dominik's syscall rework which consolidates the architecture specific syscall functions so all syscalls can be wrapped with the same macros. This allows switching x86/64 to struct pt_regs based syscalls. Extend the clearing of user space controlled registers in the entry path to the lower registers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix signedness bug in APIC ID validity checks
  x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
  x86/olpc: Fix inconsistent MFD_CS5535 configuration
  swiotlb: Use dma_direct_supported() for swiotlb_ops
  syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
  syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
  syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
  syscalls/core, syscalls/x86: Clean up syscall stub naming convention
  syscalls/x86: Extend register clearing on syscall entry to lower registers
  syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
  syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
  syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
  syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
  syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
  x86/syscalls: Don't pointlessly reload the system call number
  x86/mm: Fix documentation of module mapping range with 4-level paging
  x86/cpuid: Switch to 'static const' specifier
2018-04-15 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull scheduler fixes from Thomas Gleixner:
"A few scheduler fixes:
 - Prevent a bogus warning vs. runqueue clock update flags in do_sched_rt_period_timer()
 - Simplify the helper functions which handle requests for skipping the runqueue clock update.
 - Do not unlock the tunables mutex in the error path of the cpu frequency scheduler utils. It's not held.
 - Enforce proper alignment for 'struct util_est' in sched_avg to prevent a misalignment fault on IA64"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Force proper alignment of 'struct util_est'
  sched/core: Simplify helpers for rq clock update skip requests
  sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
  sched/cpufreq/schedutil: Fix error path mutex unlock
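As an illustration of the alignment item, the sketch below forces 8-byte alignment on a structure made of two 32-bit fields, so that the whole structure can be read or written as one 64-bit quantity without a misaligned access. The structure name `util_est_sketch` and its field names are illustrative assumptions, and the attribute syntax shown is the GCC/Clang extension rather than a quote from the kernel patch.

```c
#include <stdint.h>

/*
 * Two 32-bit fields would naturally get 4-byte alignment; the attribute
 * raises the alignment of the whole structure to that of a 64-bit word.
 */
struct util_est_sketch {
	unsigned int enqueued;
	unsigned int ewma;
} __attribute__((aligned(sizeof(uint64_t))));
```

On architectures that fault on misaligned 64-bit accesses, this guarantees that any instance, including one embedded in a larger structure, starts on an 8-byte boundary.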
2018-04-15 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull more perf updates from Thomas Gleixner:
"A rather large set of perf updates:
Kernel:
 - Fix various initialization issues
 - Prevent creating [ku]probes for non-CAP_SYS_ADMIN users
Tooling:
 - Show only failing syscalls with 'perf trace --failure' (Arnaldo Carvalho de Melo)
   e.g: See what 'openat' syscalls are failing:
     # perf trace --failure -e openat
        762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory
        <SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? >
        790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory
     ^C#
 - Show information about the event (freq, nr_samples, total period/nr_events) in the annotate --tui and --stdio2 'perf annotate' output, similar to the first line in the 'perf report --tui', but just for the samples for the annotated symbol (Arnaldo Carvalho de Melo)
 - Introduce 'perf version --build-options' to show what features were linked, aliased as well as a shorter 'perf -vv' (Jin Yao)
 - Add a "dso_size" sort order (Kim Phillips)
 - Remove redundant ')' in the tracepoint output in 'perf trace' (Changbin Du)
 - Synchronize x86's cpufeatures.h, no effect on tools (Arnaldo Carvalho de Melo)
 - Show group details on the title line in the annotate browser and 'perf annotate --stdio2' output, so that the per-event columns can have headers (Arnaldo Carvalho de Melo)
 - Fixup vertical line separating metrics from instructions and cleaning unused lines at the bottom, both in the annotate TUI browser (Arnaldo Carvalho de Melo)
 - Remove duplicated 'samples' in lost samples warning in 'perf report' (Arnaldo Carvalho de Melo)
 - Synchronize i915_drm.h, silencing the perf build process, automagically adding support for the new DRM_I915_QUERY ioctl (Arnaldo Carvalho de Melo)
 - Make auxtrace_queues__add_buffer() allocate struct buffer, from a patchkit already applied (Adrian Hunter)
 - Fix the --stdio2/TUI annotate output to include group details, be it for a recorded '{a,b,f}' explicit event group or when forcing group display using 'perf report --group' for a set of events not recorded as a group (Arnaldo Carvalho de Melo)
 - Fix display artifacts in the ui browser (base class for the annotate and main report/top TUI browser) related to the extra title lines work (Arnaldo Carvalho de Melo)
 - perf auxtrace refactorings, leftovers from a previously partially processed patchset (Adrian Hunter)
 - Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de Melo)
 - Synchronize i915_drm.h, silencing a perf build warning and in the process automagically adding support for a new ioctl command (Arnaldo Carvalho de Melo)
 - Fix a strncpy issue in uprobe tracing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()
  tracing/uprobe_event: Fix strncpy corner case
  perf/core: Fix perf_uprobe_init()
  perf/core: Fix perf_kprobe_init()
  perf/core: Fix use-after-free in uprobe_perf_close()
  perf tests clang: Fix function name for clang IR test
  perf clang: Add support for recent clang versions
  perf tools: Fix perf builds with clang support
  perf tools: No need to include namespaces.h in util.h
  perf hists browser: Remove leftover from row returned from refresh
  perf hists browser: Show extra_title_lines in the 'D' debug hotkey
  perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering
  tools headers uapi: Synchronize i915_drm.h
  perf report: Remove duplicated 'samples' in lost samples warning
  perf ui browser: Fixup cleaning unused lines at the bottom
  perf annotate browser: Fixup vertical line separating metrics from instructions
  perf annotate: Show group details on the title line
  perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer
  perf/x86/intel: Move regs->flags EXACT bit init
  perf trace: Remove redundant ')'
  ...
2018-04-15 | Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull irq affinity fixes from Thomas Gleixner:
 - Fix error path handling in the affinity spreading code
 - Make affinity spreading smarter to avoid issues on systems which claim to have hotpluggable CPUs while in fact they can't hotplug anything. So instead of trying to spread the vectors (and thereby the associated device queues) to all possible CPUs, spread them on all present CPUs first. If there are left over vectors after that first step they are spread among the possible, but not present CPUs, which keeps the code backwards compatible for virtual devices and NVME which allocate a queue per possible CPU, but makes the spreading smarter for devices which have fewer queues than possible or present CPUs.
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/affinity: Spread irq vectors among present CPUs as far as possible
  genirq/affinity: Allow irq spreading from a given starting point
  genirq/affinity: Move actual irq vector spreading into a helper function
  genirq/affinity: Rename *node_to_possible_cpumask as *node_to_cpumask
  genirq/affinity: Don't return with empty affinity masks on error
2018-04-13 | kernel/kexec_file.c: allow archs to set purgatory load address | Philipp Rudo
For s390, new kernels are loaded to fixed addresses in memory before they are booted. With the current code this is a problem as it assumes the kernel will be loaded to an 'arbitrary' address. In particular, kexec_locate_mem_hole searches for a large enough memory region and sets the load address (kexec_buffer->mem) to it. Luckily there is a simple workaround for this problem. By returning 1 in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This allows the architecture to set kbuf->mem by hand. While the trick works fine for the kernel, it does not for the purgatory, as here the architectures don't have access to its kexec_buffer. Give architectures access to the purgatory's kexec_buffer by changing kexec_load_purgatory to take a pointer to it. With this change architectures have access to the buffer and can edit it as they need. A nice side effect of this change is that we can get rid of the purgatory_info->purgatory_load_address field, as the information stored there can now be accessed directly from kbuf->mem. Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 | kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load | Philipp Rudo
The current code uses the sh_offset field in purgatory_info->sechdrs to store a pointer to the current load address of the section. Depending on whether the section will be loaded or not, this is either a pointer into purgatory_info->purgatory_buf or kexec_purgatory. This is not only a violation of the ELF standard but also makes the code very hard to understand as you cannot tell if the memory you are using is read-only or not. Remove this misuse and store the offset of the section in purgatory_info->purgatory_buf in sh_offset. Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 | kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs | Philipp Rudo
The main loop currently uses quite a lot of variables to update the section headers. Some of them are unnecessary. So clean them up a little. Link: http://lkml.kernel.org/r/20180321112751.22196-9-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 | kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs | Philipp Rudo
To update the entry point there is an extra loop over all section headers although this can be done in the main loop. So move it there and eliminate the extra loop and variable to store the 'entry section index'. Also, in the main loop, move the usual case, i.e. non-bss section, out of the extra if-block. Link: http://lkml.kernel.org/r/20180321112751.22196-8-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 | kernel/kexec_file.c: split up __kexec_load_purgatory | Philipp Rudo
When inspecting __kexec_load_purgatory you find that it has two tasks 1) setting up the kexec_buffer for the new kernel and, 2) setting up pi->sechdrs for the final load address. The two tasks are independent of each other. To improve readability split up __kexec_load_purgatory into two functions, one for each task, and call them directly from kexec_load_purgatory. Link: http://lkml.kernel.org/r/20180321112751.22196-7-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 | kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations* | Philipp Rudo
When the relocations are applied to the purgatory only the section the relocations are applied to is writable. The other sections, i.e. the symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this by marking the corresponding variables as 'const'. While at it also change the signatures of arch_kexec_apply_relocations* to take section pointers instead of just the index of the relocation section. This removes the second lookup and sanity check of the sections in arch code. Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 | kernel/kexec_file.c: search symbols in read-only kexec_purgatory | Philipp Rudo
The stripped purgatory does not contain a symtab. So when looking for symbols this is done in read-only kexec_purgatory. Highlight this by marking the corresponding variables as 'const'. Link: http://lkml.kernel.org/r/20180321112751.22196-5-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kernel/kexec_file.c: make purgatory_info->ehdr constPhilipp Rudo
The kexec_purgatory buffer is read-only. Thus all pointers into kexec_purgatory are read-only, too. Point this out by explicitly marking purgatory_info->ehdr as 'const' and update the comments in purgatory_info. Link: http://lkml.kernel.org/r/20180321112751.22196-4-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kernel/kexec_file.c: remove checks in kexec_purgatory_loadPhilipp Rudo
Before the purgatory is loaded several checks are done whether the ELF file in kexec_purgatory is valid or not. These checks are incomplete. For example they don't check for the total size of the sections defined in the section header table or if the entry point actually points into the purgatory. On the other hand the purgatory, although an ELF file on its own, is part of the kernel. Thus not trusting the purgatory means not trusting the kernel build itself. So remove all validity checks on the purgatory and just trust the kernel build. Link: http://lkml.kernel.org/r/20180321112751.22196-3-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file, x86: move re-factored code to generic sideAKASHI Takahiro
In the previous patches, commonly-used routines, exclude_mem_range() and prepare_elf64_headers(), were carved out. Now place them in kexec common code. A prefix "crash_" is given to each of their names to avoid possible name collisions. Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file,x86,powerpc: factor out kexec_file_ops functionsAKASHI Takahiro
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
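The idea of replacing per-architecture hook functions with an architecture-defined ops array can be sketched in user-space C as below. This is only an illustration under assumptions: struct image_ops, probe_elf(), probe_raw(), arch_ops[] and find_loader() are hypothetical names, and the real struct kexec_file_ops has different members and signatures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ops table; not the kernel's struct kexec_file_ops. */
struct image_ops {
	const char *name;
	int (*probe)(const char *buf, size_t len);	/* 0 means "I can load this" */
};

/* Hypothetical probe: accept buffers that start with the ELF magic. */
static int probe_elf(const char *buf, size_t len)
{
	return (len >= 4 && memcmp(buf, "\177ELF", 4) == 0) ? 0 : -1;
}

/* Hypothetical probe: accept any non-empty buffer. */
static int probe_raw(const char *buf, size_t len)
{
	(void)buf;
	return len > 0 ? 0 : -1;
}

static const struct image_ops elf_ops = { "elf", probe_elf };
static const struct image_ops raw_ops = { "raw", probe_raw };

/* Each architecture would supply its own NULL-terminated list. */
static const struct image_ops *const arch_ops[] = { &elf_ops, &raw_ops, NULL };

/* Generic code: walk the list and return the first loader whose probe succeeds. */
static const struct image_ops *find_loader(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; arch_ops[i] != NULL; i++) {
		if (arch_ops[i]->probe(buf, len) == 0)
			return arch_ops[i];
	}
	return NULL;
}
```

With this shape the probing loop lives in common code once, and an architecture only contributes the table entries.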
2018-04-13kexec_file: make use of purgatory optionalAKASHI Takahiro
Patch series "kexec_file, x86, powerpc: refactoring for other architectures", v2. This is a preparatory patchset for adding kexec_file support on arm64. It was originally included in an arm64 patch set[1], but Philipp is also working on their kexec_file support on s390[2] and some changes are now conflicting. So these common parts were extracted and put into a separate patch set for better integration. What's more, my original patch#4 was split into a few small chunks for easier review after Dave's comment. As such, the resulting code is basically identical with my original, and the only *visible* differences are: - renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup() - change one of types of arguments at prepare_elf64_headers() Those, unfortunately, require a couple of trivial changes on the rest (#1, #6 to #13) of my arm64 kexec_file patch set[1]. Patch #1 allows making use of purgatory optional, particularly useful for arm64. Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load, verify_sig}() and arch_kimage_file_post_load_cleanup() across architectures. Patches #3-#7 are also intended to generalize parse_elf64_headers(), along with exclude_mem_range(), so that they can be re-used as widely as possible. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html This patch (of 7): On arm64, crash dump kernel's usable memory is protected by *unmapping* it from kernel virtual space unlike other architectures where the region is just made read-only. It is highly unlikely that the region is accidentally corrupted and this observation rationalizes that digest check code can also be dropped from purgatory. The resulting code is simple enough that it does not require the somewhat ugly re-linking/relocation code, i.e. arch_kexec_apply_relocations_add(). 
Please see: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html All that the purgatory does is to shuffle arguments and jump into a new kernel, while we still need to have some space for a hash value (purgatory_sha256_digest) which is never checked against. As such, it doesn't make sense to have trampoline code between old kernel and new kernel on arm64. This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and allows related code to be compiled in only if necessary. [takahiro.akashi@linaro.org: fix trivial screwup] Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec: export PG_swapbacked to VMCOREINFOPetr Tesarik
Since commit 6326fec1122c ("mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked"), PG_swapcache is an alias for PG_owner_priv_1, which may be also used for other purposes. To know whether the bit indeed has the PG_swapcache meaning, it is necessary to check PG_swapbacked, hence this bit must be exported. Link: http://lkml.kernel.org/r/20180410161345.142e142d@ezekiel.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Young <dyoung@redhat.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
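For a dump-analysis tool, the consequence of the aliasing described above is that both bits have to be tested together. A minimal sketch, assuming the two bit numbers have already been read from VMCOREINFO; the function name and its parameters are hypothetical and not taken from any existing tool:

```c
#include <stdbool.h>

/*
 * Decide whether a page's flags word marks it as being in the swap cache.
 * owner_priv_bit and swapbacked_bit are bit numbers supplied by the caller
 * (for instance parsed from VMCOREINFO); they are assumptions of this sketch.
 */
static bool page_in_swap_cache(unsigned long flags,
			       unsigned int owner_priv_bit,
			       unsigned int swapbacked_bit)
{
	unsigned long need = (1UL << owner_priv_bit) | (1UL << swapbacked_bit);

	/* The owner_priv bit only means "swap cache" when swapbacked is set too. */
	return (flags & need) == need;
}
```

A page with only the owner_priv bit set is using that bit for some other purpose and is not reported as swap cache.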
2018-04-13resource: fix integer overflow at reallocationTakashi Iwai
We've got a bug report indicating a kernel panic at booting on an x86-32 system, and it turned out to be caused by an invalid PCI resource assigned after reallocation. __find_resource() first aligns the resource start address and resets the end address with start+size-1 accordingly, then checks whether it's contained. Here the end address may overflow the integer, although resource_contains() still returns true because the function validates only start and end address. So this ends up with returning an invalid resource (start > end). There was already an attempt to cover such a problem in the commit 47ea91b4052d ("Resource: fix wrong resource window calculation"), but this case was overlooked. This patch adds a validity check of the newly calculated resource to avoid the integer overflow problem. Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739 Link: http://lkml.kernel.org/r/s5hpo37d5l8.wl-tiwai@suse.de Fixes: 23c570a67448 ("resource: ability to resize an allocated resource") Signed-off-by: Takashi Iwai <tiwai@suse.de> Reported-by: Michael Henders <hendersm@shaw.ca> Tested-by: Michael Henders <hendersm@shaw.ca> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
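The wraparound can be reproduced in user space with 32-bit addresses. The sketch below is not the kernel code: struct range32, range_contains(), align_up() and place_range() are hypothetical stand-ins, and the validate flag exists only so that the behaviour with and without the extra check can be compared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource with 32-bit addresses, as on x86-32. */
struct range32 {
	uint32_t start;
	uint32_t end;	/* inclusive */
};

/* Containment test that compares only the two end points. */
static bool range_contains(const struct range32 *outer,
			   const struct range32 *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* Round addr up to a power-of-two alignment. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Try to place size bytes inside avail. With validate == false the
 * candidate is accepted on containment alone, which lets a wrapped end
 * address slip through; with validate == true it is rejected first.
 */
static bool place_range(const struct range32 *avail, uint32_t size,
			uint32_t align, bool validate, struct range32 *out)
{
	struct range32 cand;

	cand.start = align_up(avail->start, align);
	cand.end = cand.start + size - 1;	/* unsigned arithmetic may wrap */

	if (validate && cand.start > cand.end)
		return false;			/* end wrapped below start */
	if (!range_contains(avail, &cand))
		return false;

	*out = cand;
	return true;
}
```

With avail = [0xF0000000, 0xFFFFFFFF], size 0x20000000 and alignment 0x10000000, the end address wraps to 0x0FFFFFFF. The unvalidated call accepts the candidate even though start > end, while the validated call rejects it.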
2018-04-12Merge tag 'trace-v4.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few clean ups and bug fixes: - replace open coded "ARRAY_SIZE()" with macro - updates to uprobes - bug fix for perf event filter on error path" * tag 'trace-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Enforce passing in filter=NULL to create_filter() trace_uprobe: Simplify probes_seq_show() trace_uprobe: Use %lx to display offset tracing/uprobe: Add support for overlayfs tracing: Use ARRAY_SIZE() macro instead of open coding it
2018-04-12Merge tag 'for_linus-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull kdb updates from Jason Wessel: - fix 2032 time access issues and new compiler warnings - minor regression test cleanup - formatting fixes for end user use of kdb * tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: kdb: use memmove instead of overlapping memcpy kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts() kdb: bl: don't use tab character in output kdb: drop newline in unknown command output kdb: make "mdr" command repeat kdb: use __ktime_get_real_seconds instead of __current_kernel_time misc: kgdbts: Display progress of asynchronous tests
2018-04-12perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()Song Liu
Non-root users cannot create a kprobe or uprobe through the text-based interface (kprobe_events, uprobe_events), so they should not be able to create probes via perf_event_open() either. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Song Liu <songliubraving@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU") Fixes: e12f03d7031a ("perf/core: Implement the 'perf_kprobe' PMU") Link: http://lkml.kernel.org/r/C0B2EFB5-C403-4BDB-9046-C14B3EE66999@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-11Merge tag 'pm-4.17-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These include one big-ticket item which is the rework of the idle loop in order to prevent CPUs from spending too much time in shallow idle states. It reduces idle power on some systems by 10% or more and may improve performance of workloads in which the idle loop overhead matters. This has been in the works for several weeks and it has been tested and reviewed quite thoroughly. Also included are changes that finalize the cpufreq cleanup moving frequency table validation from drivers to the core, a few fixes and cleanups of cpufreq drivers, a cpuidle documentation update and a PM QoS core update to mark the expected switch fall-throughs in it. Specifics: - Rework the idle loop in order to prevent CPUs from spending too much time in shallow idle states by making it stop the scheduler tick before putting the CPU into an idle state only if the idle duration predicted by the idle governor is long enough. That required the code to be reordered to invoke the idle governor before stopping the tick, among other things (Rafael Wysocki, Frederic Weisbecker, Arnd Bergmann). - Add the missing description of the residency sysfs attribute to the cpuidle documentation (Prashanth Prakash). - Finalize the cpufreq cleanup moving frequency table validation from drivers to the core (Viresh Kumar). - Fix a clock leak regression in the armada-37xx cpufreq driver (Gregory Clement). - Fix the initialization of the CPU performance data structures for shared policies in the CPPC cpufreq driver (Shunyong Yang). - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a bit (Viresh Kumar, Rafael Wysocki). 
- Mark the expected switch fall-throughs in the PM QoS core (Gustavo Silva)" * tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) tick-sched: avoid a maybe-uninitialized warning cpufreq: Drop cpufreq_table_validate_and_show() cpufreq: SCMI: Don't validate the frequency table twice cpufreq: CPPC: Initialize shared perf capabilities of CPUs cpufreq: armada-37xx: Fix clock leak cpufreq: CPPC: Don't set transition_latency cpufreq: ti-cpufreq: Use builtin_platform_driver() cpufreq: intel_pstate: Do not include debugfs.h PM / QoS: mark expected switch fall-throughs cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC ...
2018-04-11xarray: add the xa_lock to the radix_tree_rootMatthew Wilcox
This results in no change in structure size on 64-bit machines as it fits in the padding between the gfp_t and the void *. 32-bit machines will grow the structure from 8 to 12 bytes. Almost all radix trees are protected with (at least) a spinlock, so as they are converted from radix trees to xarrays, the data structures will shrink again. Initialising the spinlock requires a name for the benefit of lockdep, so RADIX_TREE_INIT() now needs to know the name of the radix tree it's initialising, and so do IDR_INIT() and IDA_INIT(). Also add the xa_lock() and xa_unlock() family of wrappers to make it easier to use the lock. If we could rely on -fplan9-extensions in the compiler, we could avoid all of this syntactic sugar, but that wasn't added until gcc 4.6. Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_paramWaiman Long
Kdoc comments are added to the do_proc_dointvec_minmax_conv_param and do_proc_douintvec_minmax_conv_param structures that are used internally for range checking. The error codes returned by proc_dointvec_minmax() and proc_douintvec_minmax() are also documented. Link: http://lkml.kernel.org/r/1519926220-7453-3-git-send-email-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11kernel/params.c: downgrade warning for unsafe parametersChris Wilson
As using an unsafe module parameter is, by its very definition, an expected user action, emitting a warning is overkill. Nothing has yet gone wrong, and we add a taint flag for any future oops should something actually go wrong. So instead of having a user controllable pr_warn, downgrade it to a pr_notice for "a normal, but significant condition". We make use of unsafe kernel parameters in igt (https://cgit.freedesktop.org/drm/igt-gpu-tools/) (we have not yet succeeded in removing all such debugging options), which generates a warning and taints the kernel. The warning is unhelpful as we then need to filter it out again when we check that the tests themselves do not provoke any kernel warnings. Link: http://lkml.kernel.org/r/20180226151919.9674-1-chris@chris-wilson.co.uk Fixes: 91f9d330cc14 ("module: make it possible to have unsafe, tainting module params") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jean Delvare <khali@linux-fr.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Petri Latvala <petri.latvala@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11kernel/sysctl.c: fix sizeof argument to match variable nameRandy Dunlap
Fix sizeof argument to be the same as the data variable name. Probably a copy/paste error. Mostly harmless since both variables are unsigned int. Fixes kernel bugzilla #197371: Possible access to unintended variable in "kernel/sysctl.c" line 1339 https://bugzilla.kernel.org/show_bug.cgi?id=197371 Link: http://lkml.kernel.org/r/e0d0531f-361e-ef5f-8499-32743ba907e1@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Petru Mihancea <petrum@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
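This class of copy/paste error comes from naming a variable twice in one table entry, once for the data pointer and once for its size. The commit simply corrects the sizeof argument; the sketch below shows a different approach, one possible way to make such a mismatch impossible by deriving both fields from a single macro argument. struct knob, KNOB(), knob_a, knob_b and find_knob() are all hypothetical and unrelated to the real sysctl table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a table entry with a data pointer and its size. */
struct knob {
	const char *name;
	void *data;
	size_t maxlen;
};

static unsigned int knob_a;	/* hypothetical tunables */
static unsigned int knob_b;

/* Name, pointer and size all come from the same argument, so they cannot drift. */
#define KNOB(var) { #var, &(var), sizeof(var) }

static struct knob knobs[] = {
	KNOB(knob_a),
	KNOB(knob_b),
};

/* Look up an entry by name; returns NULL if there is no such knob. */
static struct knob *find_knob(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
		if (strcmp(knobs[i].name, name) == 0)
			return &knobs[i];
	}
	return NULL;
}
```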
2018-04-11uts: create "struct uts_namespace" from kmem_cacheAlexey Dobriyan
So "struct uts_namespace" can enjoy fine-grained SLAB debugging and usercopy protection. I'd prefer shorter name "utsns" but there is "user_namespace" already. Link: http://lkml.kernel.org/r/20180228215158.GA23146@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11taint: add taint for randstructKees Cook
Since the randstruct plugin can intentionally produce extremely unusual kernel structure layouts (even performance pathological ones), some maintainers want to be able to trivially determine if an Oops is coming from a randstruct-built kernel, so as to keep their sanity when debugging. This adds the new flag and initializes taint_mask immediately when built with randstruct. Link: http://lkml.kernel.org/r/1519084390-43867-4-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11taint: consolidate documentationKees Cook
This consolidates the taint bit documentation into a single place with both numeric and letter values. Additionally adds the missing TAINT_AUX documentation. Link: http://lkml.kernel.org/r/1519084390-43867-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11taint: convert to indexed initializationKees Cook
This converts to using indexed initializers instead of comments, adds a comment on why the taint flags can't be an enum, and make sure that no one forgets to update the taint_flags when adding new bits. Link: http://lkml.kernel.org/r/1519084390-43867-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
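Indexed (designated) array initializers tie each table entry to its index by name rather than by position plus a comment. The following is a minimal sketch with hypothetical flag names and characters, not the kernel's taint_flags table; the static assertion illustrates one way to catch a newly added index that has no table entry.

```c
#include <stddef.h>

/* Hypothetical flag indices for illustration only. */
#define FLAG_PROPRIETARY 0
#define FLAG_FORCED      1
#define FLAG_AUX         2
#define FLAG_COUNT       3

struct flag_info {
	char c_set;	/* character shown when the flag is set */
	char c_clear;	/* character shown when it is clear */
};

/* Each entry names its own index, so reordering lines cannot mislabel a flag. */
static const struct flag_info flag_table[] = {
	[FLAG_PROPRIETARY] = { 'P', 'G' },
	[FLAG_FORCED]      = { 'F', ' ' },
	[FLAG_AUX]         = { 'X', ' ' },
};

/* The array is sized by its highest index; a bumped count with no entry fails here. */
_Static_assert(sizeof(flag_table) / sizeof(flag_table[0]) == FLAG_COUNT,
	       "flag_table must have one entry per flag");

/* Return the display character for flag idx in the given state. */
static char flag_char(int idx, int is_set)
{
	return is_set ? flag_table[idx].c_set : flag_table[idx].c_clear;
}
```

If FLAG_COUNT were raised to 4 without adding a fourth entry, the build would stop at the static assertion instead of silently producing a short table.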