summaryrefslogtreecommitdiff
path: root/init/Kconfig
AgeCommit message (Collapse)Author
2017-11-08Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27sched/isolation: Handle the nohz_full= parameterFrederic Weisbecker
We want to centralize the isolation management, done by the housekeeping subsystem. Therefore we need to handle the nohz_full= parameter from there. Since nohz_full= so far has involved unbound timers, watchdog, RCU and tilegx NAPI isolation, we keep that default behaviour. nohz_full= will be deprecated in the future. We want to control the isolation features from the isolcpus= parameter. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-10-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from ↵Frederic Weisbecker
CONFIG_NO_HZ_FULL Split the housekeeping config from CONFIG_NO_HZ_FULL. This way we finally separate the isolation code from NOHZ. Although a dependency to CONFIG_NO_HZ_FULL remains for now, while the housekeeping code still deals with NOHZ internals. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-8-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-07kbuild: Fix optimization level choice defaultUlf Magnusson
The choice containing the CC_OPTIMIZE_FOR_PERFORMANCE symbol accidentally added a "CONFIG_" prefix when trying to make it the default, selecting an undefined symbol as the default. The mistake is harmless here: Since the default symbol is not visible, the choice falls back on using the visible symbol as the default instead, which is CC_OPTIMIZE_FOR_PERFORMANCE, as intended. A patch that makes Kconfig print a warning in this case has been submitted separately: http://www.spinics.net/lists/linux-kbuild/msg15566.html Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-09-06mm: add SLUB free list pointer obfuscationKees Cook
This SLUB free list pointer obfuscation code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. This adds a per-cache random value to SLUB caches that is XORed with their freelist pointer address and value. This adds nearly zero overhead and frustrates the very common heap overflow exploitation method of overwriting freelist pointers. A recent example of the attack is written up here: http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit and there is a section dedicated to the technique the book "A Guide to Kernel Exploitation: Attacking the Core". This is based on patches by Daniel Micay, and refactored to minimize the use of #ifdef. With 200-count cycles of "hackbench -g 20 -l 1000" I saw the following run times: before: mean 10.11882499999999999995 variance .03320378329145728642 stdev .18221905304181911048 after: mean 10.12654000000000000014 variance .04700556623115577889 stdev .21680767106160192064 The difference gets lost in the noise, but if the above is to be taken literally, using CONFIG_FREELIST_HARDENED is 0.07% slower. Link: http://lkml.kernel.org/r/20170802180609.GA66807@beast Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Daniel Micay <danielmicay@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tycho Andersen <tycho@docker.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-01futex: Allow for compiling out PI supportNicolas Pitre
This makes it possible to preserve basic futex support and compile out the PI support when RT mutexes are not available. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708010024190.5981@knanqh.ubzr
2017-07-06mm: allow slab_nomerge to be set at build timeKees Cook
Some hardened environments want to build kernels with slab_nomerge already set (so that they do not depend on remembering to set the kernel command line option). This is desired to reduce the risk of kernel heap overflows being able to overwrite objects from merged caches and changes the requirements for cache layout control, increasing the difficulty of these attacks. By keeping caches unmerged, these kinds of exploits can usually only damage objects in the same cache (though the risk to metadata exploitation is unchanged). Link: http://lkml.kernel.org/r/20170620230911.GA25238@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Daniel Mack <daniel@zonque.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Rik van Riel <riel@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06Merge branch 'for-4.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: - Waiman made the debug controller work and a lot more useful on cgroup2 - There were a couple issues with cgroup subtree delegation. The documentation on delegating to a non-root user was missing some part and cgroup namespace support wasn't factoring in delegation at all. The documentation is updated and the now there is a mount option to make cgroup namespace fit for delegation * 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: implement "nsdelegate" mount option cgroup: restructure cgroup_procs_write_permission() cgroup: "cgroup.subtree_control" should be writeable by delegatee cgroup: fix lockdep warning in debug controller cgroup: refactor cgroup_masks_read() in the debug controller cgroup: make debug an implicit controller on cgroup2 cgroup: Make debug cgroup support v2 and thread mode cgroup: Make Kconfig prompt of debug cgroup more accurate cgroup: Move debug cgroup to its own file cgroup: Keep accurate count of tasks in each css_set
2017-07-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
2017-06-23sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabledNicolas Pitre
Make CONFIG_CPUSETS=y depend on SMP as this feature makes no sense on UP. This allows for configuring out cpuset_cpumask_can_shrink() and task_can_attach() entirely, which shrinks the kernel a bit. Signed-off-by: Nicolas Pitre <nico@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614171926.8345-2-nicolas.pitre@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-14cgroup: Make Kconfig prompt of debug cgroup more accurateWaiman Long
The Kconfig prompt and description of the debug cgroup controller more accurate by saying that it is for debug purpose only and its interfaces are unstable. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-08rcu: Move RCU non-debug Kconfig options to kernel/rcuPaul E. McKenney
RCU's Kconfig options are scattered, and there are enough of them that it would be good for them to be more centralized. This commit therefore extracts RCU's Kconfig options from init/Kconfig into a new kernel/rcu/Kconfig file. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Eliminate NOCBs CPU-state Kconfig optionsPaul E. McKenney
The CONFIG_RCU_NOCB_CPU_ALL, CONFIG_RCU_NOCB_CPU_NONE, and CONFIG_RCU_NOCB_CPU_ZERO Kconfig options are used only in testing and are redundant with the rcu_nocbs= boot parameter. This commit therefore removes these three Kconfig options and adjusts the rcutorture scripts to use the boot parameter instead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Remove debugfs tracingPaul E. McKenney
RCU's debugfs tracing used to be the only reasonable low-level debug information available, but ftrace and event tracing has since surpassed the RCU debugfs level of usefulness. This commit therefore removes RCU's debugfs tracing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Remove Classic SRCUPaul E. McKenney
Classic SRCU was only ever intended to be a fallback in case of issues with Tree/Tiny SRCU, and the latter two are doing quite well in testing. This commit therefore removes Classic SRCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Remove the RCU_KTHREAD_PRIO Kconfig optionPaul E. McKenney
Anything that can be done with the RCU_KTHREAD_PRIO Kconfig option can also be done with the rcutree.kthread_prio kernel boot parameter. This commit therefore removes this Kconfig option. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com>
2017-06-08srcu: Apply trivial callback lists to shrink Tiny SRCUPaul E. McKenney
The rcu_segcblist structure provides quite a bit of functionality, and Tiny SRCU needs almost none of it. So this commit replaces Tiny SRCU's uses of rcu_segcblist with a simple singly linked list with tail pointer. This change significantly reduces Tiny SRCU's memory footprint, more than making up for the growth caused by the creation of rcu_segcblist.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Make SRCU be once again optionalPaul E. McKenney
Commit d160a727c40e ("srcu: Make SRCU be built by default") in response to build errors, which were caused by code that included srcu.h despite !SRCU. However, srcutiny.o is almost 2K of code, which is not insignificant for those attempting to run the Linux kernel on IoT devices. This commit therefore makes SRCU be once again optional, and adjusts srcu.h to allow error-free inclusion in !SRCU kernel builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Nicolas Pitre <nico@linaro.org>
2017-05-02rcu: Separately compile large rcu_segcblist functionsPaul E. McKenney
This commit creates a new kernel/rcu/rcu_segcblist.c file that contains non-trivial segcblist functions. Trivial functions remain as static inline functions in kernel/rcu/rcu_segcblist.h Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2017-04-24srcu: Make SRCU be built by defaultPaul E. McKenney
SRCU is optional, and included only if there is a "select SRCU" in effect. However, we now have Tiny SRCU, so this commit defaults CONFIG_SRCU=y. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-24srcu: Fix Kconfig botch when SRCU not selectedPaul E. McKenney
If the CONFIG_SRCU option is not selected, for example, when building arch/tile allnoconfig, the following build errors appear: kernel/rcu/tree.o: In function `srcu_online_cpu': tree.c:(.text+0x4248): multiple definition of `srcu_online_cpu' kernel/rcu/srcutree.o:srcutree.c:(.text+0x2120): first defined here kernel/rcu/tree.o: In function `srcu_offline_cpu': tree.c:(.text+0x4250): multiple definition of `srcu_offline_cpu' kernel/rcu/srcutree.o:srcutree.c:(.text+0x2160): first defined here The corresponding .config file shows CONFIG_TREE_SRCU=y, but no sign of CONFIG_SRCU, which fatally confuses SRCU's #ifdefs, resulting in the above errors. The reason this occurs is the folowing line in init/Kconfig's definition for TREE_SRCU: default y if !TINY_RCU && !CLASSIC_SRCU If CONFIG_CLASSIC_SRCU=n, as it will be in for allnoconfig, and if CONFIG_SMP=y, then we will get CONFIG_TREE_SRCU=y but no CONFIG_SRCU, as seen in the .config file, and which will result in the above errors. This error did not show up during rcutorture testing because rcutorture forces CONFIG_SRCU=y, as it must to prevent build errors in rcutorture.c. This commit therefore conditions TREE_SRCU (and TINY_SRCU, while it is at it) with SRCU, like this: default y if SRCU && !TINY_RCU && !CLASSIC_SRCU Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20170423162205.GP3956@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-21Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' ↵Paul E. McKenney
into HEAD doc.2017.04.12a: Documentation updates fixes.2017.04.19a: Miscellaneous fixes srcu.2017.04.21a: Parallelize SRCU callback handling
2017-04-19rcu: Make RCU_FANOUT_LEAF help text more explicit about skew_tickPaul E. McKenney
If you set RCU_FANOUT_LEAF too high, you can get lock contention on the leaf rcu_node, and you should boot with the skew_tick kernel parameter set in order to avoid this lock contention. This commit therefore upgrades the RCU_FANOUT_LEAF help text to explicitly state this. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18srcu: Introduce CLASSIC_SRCU Kconfig optionPaul E. McKenney
The TREE_SRCU rewrite is large and a bit on the non-simple side, so this commit helps reduce risk by allowing the old v4.11 SRCU algorithm to be selected using a new CLASSIC_SRCU Kconfig option that depends on RCU_EXPERT. The default is to use the new TREE_SRCU and TINY_SRCU algorithms, in order to help get these the testing that they need. However, if your users do not require the update-side scalability that is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU to revert back to the old classic SRCU algorithm. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18srcu: Create a tiny SRCUPaul E. McKenney
In response to automated complaints about modifications to SRCU increasing its size, this commit creates a tiny SRCU that is used in SMP=n && PREEMPT=n builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-02-27Merge branch 'for-4.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Several noteworthy changes. - Parav's rdma controller is finally merged. It is very straight forward and can limit the abosolute numbers of common rdma constructs used by different cgroups. - kernel/cgroup.c got too chubby and disorganized. Created kernel/cgroup/ subdirectory and moved all cgroup related files under kernel/ there and reorganized the core code. This hurts for backporting patches but was long overdue. - cgroup v2 process listing reimplemented so that it no longer depends on allocating a buffer large enough to cache the entire result to sort and uniq the output. v2 has always mangled the sort order to ensure that users don't depend on the sorted output, so this shouldn't surprise anybody. This makes the pid listing functions use the same iterators that are used internally, which have to have the same iterating capabilities anyway. - perf cgroup filtering now works automatically on cgroup v2. This patch was posted a long time ago but somehow fell through the cracks. - misc fixes asnd documentation updates" * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits) kernfs: fix locking around kernfs_ops->release() callback cgroup: drop the matching uid requirement on migration for cgroup v2 cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy cgroup: misc cleanups cgroup: call subsys->*attach() only for subsystems which are actually affected by migration cgroup: track migration context in cgroup_mgctx cgroup: cosmetic update to cgroup_taskset_add() rdmacg: Fixed uninitialized current resource usage cgroup: Add missing cgroup-v2 PID controller documentation. rdmacg: Added documentation for rdmacg IB/core: added support to use rdma cgroup controller rdmacg: Added rdma cgroup controller cgroup: fix a comment typo cgroup: fix RCU related sparse warnings cgroup: move namespace code to kernel/cgroup/namespace.c cgroup: rename functions for consistency cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c cgroup: separate out cgroup1_kf_syscall_ops cgroup: refactor mount path and clearly distinguish v1 and v2 paths cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c ...
2017-02-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: "142 patches: - DAX updates - various misc bits - OCFS2 updates - most of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits) mm/z3fold.c: limit first_num to the actual range of possible buddy indexes mm: fix <linux/pagemap.h> stray kernel-doc notation zram: remove obsolete sysfs attrs mm/memblock.c: remove unnecessary log and clean up oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA mm: drop unused argument of zap_page_range() mm: drop zap_details::check_swap_entries mm: drop zap_details::ignore_dirty mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled mm: help __GFP_NOFAIL allocations which do not trigger OOM killer mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically mm: consolidate GFP_NOFAIL checks in the allocator slowpath lib/show_mem.c: teach show_mem to work with the given nodemask arch, mm: remove arch specific show_mem mm, page_alloc: warn_alloc print nodemask mm, page_alloc: do not report all nodes in show_mem Revert "mm: bail out in shrink_inactive_list()" mm, vmscan: consider eligible zones in get_scan_count mm, vmscan: cleanup lru size claculations mm, vmscan: do not count freed pages as PGDEACTIVATE ...
2017-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven Rostedt as the printk reviewer. This idea came up after the discussion about printk issues at Kernel Summit. It was formulated and discussed at lkml[1]. - Extend a lock-less NMI per-cpu buffers idea to handle recursive printk() calls by Sergey Senozhatsky[2]. It is the first step in sanitizing printk as discussed at Kernel Summit. The change allows to see messages that would normally get ignored or would cause a deadlock. Also it allows to enable lockdep in printk(). This already paid off. The testing in linux-next helped to discover two old problems that were hidden before[3][4]. - Remove unused parameter by Sergey Senozhatsky. Clean up after a past change. [1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com [2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com [3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com [4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop call_console_drivers() unused param printk: convert the rest to printk-safe printk: remove zap_locks() function printk: use printk_safe buffers in printk printk: report lost messages in printk safe/nmi contexts printk: always use deferred printk when flush printk_safe lines printk: introduce per-cpu safe_print seq buffer printk: rename nmi.c and exported api printk: use vprintk_func in vprintk() MAINTAINERS: Add printk maintainers
2017-02-22slub: make sysfs directories for memcg sub-caches optionalTejun Heo
SLUB creates a per-cache directory under /sys/kernel/slab which hosts a bunch of debug files. Usually, there aren't that many caches on a system and this doesn't really matter; however, if memcg is in use, each cache can have per-cgroup sub-caches. SLUB creates the same directories for these sub-caches under /sys/kernel/slab/$CACHE/cgroup. Unfortunately, because there can be a lot of cgroups, active or draining, the product of the numbers of caches, cgroups and files in each directory can reach a very high number - hundreds of thousands is commonplace. Millions and beyond aren't difficult to reach either. What's under /sys/kernel/slab is primarily for debugging and the information and control on the a root cache already cover its sub-caches. While having a separate directory for each sub-cache can be helpful for development, it doesn't make much sense to pay this amount of overhead by default. This patch introduces a boot parameter slub_memcg_sysfs which determines whether to create sysfs directories for per-memcg sub-caches. It also adds CONFIG_SLUB_MEMCG_SYSFS_ON which determines the boot parameter's default value and defaults to 0. [akpm@linux-foundation.org: kset_unregister(NULL) is legal] Link: http://lkml.kernel.org/r/20170204145203.GB26958@mtj.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22Merge tag 'char-misc-4.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver patchset for 4.11-rc1. Lots of different driver subsystems updated here: rework for the hyperv subsystem to handle new platforms better, mei and w1 and extcon driver updates, as well as a number of other "minor" driver updates. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits) goldfish: Sanitize the broken interrupt handler x86/platform/goldfish: Prevent unconditional loading vmbus: replace modulus operation with subtraction vmbus: constify parameters where possible vmbus: expose hv_begin/end_read vmbus: remove conditional locking of vmbus_write vmbus: add direct isr callback mode vmbus: change to per channel tasklet vmbus: put related per-cpu variable together vmbus: callback is in softirq not workqueue binder: Add support for file-descriptor arrays binder: Add support for scatter-gather binder: Add extra size to allocator binder: Refactor binder_transact() binder: Support multiple /dev instances binder: Deal with contexts in debugfs binder: Support multiple context managers binder: Split flat_binder_object auxdisplay: ht16k33: remove private workqueue auxdisplay: ht16k33: rework input device initialization ...
2017-02-20Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU changes in this cycle are: - Dynticks updates, consolidating open-coded counter accesses into a well-defined API - SRCU updates: Simplify algorithm, add formal verification - Documentation updates - Miscellaneous fixes - Torture-test updates Most of the diffstat comes from the relatively large documentation update" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) srcu: Reduce probability of SRCU ->unlock_count[] counter overflow rcutorture: Add CBMC-based formal verification for SRCU srcu: Force full grace-period ordering srcu: Implement more-efficient reader counts rcu: Adjust FQS offline checks for exact online-CPU detection rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead rcu: Abstract extended quiescent state determination rcu: Abstract dynticks extended quiescent state enter/exit operations rcu: Add lockdep checks to synchronous expedited primitives rcu: Eliminate unused expedited_normal counter llist: Clarify comments about when locking is needed rcu: Fix comment in rcu_organize_nocb_kthreads() rcu: Enable RCU tracepoints by default to aid in debugging rcu: Make rcu_cpu_starting() use its "cpu" argument rcu: Add comment headers to expedited-grace-period counter functions rcu: Don't wake rcuc/X kthreads on NOCB CPUs rcu: Re-enable TASKS_RCU for User Mode Linux rcu: Once again use NMI-based stack traces in stall warnings rcu: Remove short-term CPU kicking rcu: Add long-term CPU kicking ...
2017-02-08printk: rename nmi.c and exported apiSergey Senozhatsky
A preparation patch for printk_safe work. No functional change. - rename nmi.c to print_safe.c - add `printk_safe' prefix to some (which used both by printk-safe and printk-nmi) of the exported functions. Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-06Merge 4.10-rc7 into char-misc-nextGreg Kroah-Hartman
We want the hv and other fixes in here as well to handle merge and testing issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-03kbuild: modversions: add infrastructure for emitting relative CRCsArd Biesheuvel
This add the kbuild infrastructure that will allow architectures to emit vmlinux symbol CRCs as 32-bit offsets to another location in the kernel where the actual value is stored. This works around problems with CRCs being mistaken for relocatable symbols on kernels that self relocate at runtime (i.e., powerpc with CONFIG_RELOCATABLE=y) For the kbuild side of things, this comes down to the following: - introducing a Kconfig symbol MODULE_REL_CRCS - adding a -R switch to genksyms to instruct it to emit the CRC symbols as references into the .rodata section - making modpost distinguish such references from absolute CRC symbols by the section index (SHN_ABS) - making kallsyms disregard non-absolute symbols with a __crc_ prefix Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-31Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Dynticks updates, consolidating open-coded counter accesses into a well-defined API - SRCU updates: Simplify algorithm, add formal verification - Documentation updates - Miscellaneous fixes - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-23rcu: Re-enable TASKS_RCU for User Mode LinuxPaul E. McKenney
Now that User Mode Linux supports arch_irqs_disabled_flags(), this commit re-enables TASKS_RCU for User Mode Linux. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-19pc104: Introduce the PC104 Kconfig optionWilliam Breathitt Gray
PC/104 form factor devices serve a specific niche of embedded system users; most Linux users will not have PC/104 form factor devices. This patch introduces the PC104 Kconfig option, which should be used to filter PC/104 specific device drivers and options, so that only those users interested in PC/104 related options are exposed to them. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-16rcu: update: Make RCU_EXPEDITE_BOOT be the defaultSebastian Andrzej Siewior
RCU_EXPEDITE_BOOT should speed up the boot process by enforcing synchronize_rcu_expedited() instead of synchronize_rcu() during the boot process. There should be no reason why one does not want this and there is no need worry about real time latency at this point. Therefore make it default. Note that users wishing to avoid expediting entirely, for example when bringing up new hardware possibly having flaky IPIs, can use the rcu_normal boot parameter to override boot-time expediting. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Reworded commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-11cgroup: move CONFIG_SOCK_CGROUP_DATA to init/KconfigArnd Bergmann
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is not right when CONFIG_NET is disabled and there is no socket interface: warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET) I don't know what the correct solution for this is, but simply removing the dependency on NET from SOCK_CGROUP_DATA by moving it out of the 'if NET' section avoids the warning and does not produce other build errors. Fixes: 483c4933ea09 ("cgroup: Fix CGROUP_BPF config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10rdmacg: Added rdma cgroup controllerParav Pandit
Added rdma cgroup controller that does accounting, limit enforcement on rdma/IB resources. Added rdma cgroup header file which defines its APIs to perform charging/uncharging functionality. It also defined APIs for RDMA/IB stack for device registration. Devices which are registered will participate in controller functions of accounting and limit enforcements. It define rdmacg_device structure to bind IB stack and RDMA cgroup controller. RDMA resources are tracked using resource pool. Resource pool is per device, per cgroup entity which allows setting up accounting limits on per device basis. Currently resources are defined by the RDMA cgroup. Resource pool is created/destroyed dynamically whenever charging/uncharging occurs respectively and whenever user configuration is done. Its a tradeoff of memory vs little more code space that creates resource pool object whenever necessary, instead of creating them during cgroup creation and device registration time. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes and cleanups from David Miller: 1) Revert bogus nla_ok() change, from Alexey Dobriyan. 2) Various bpf validator fixes from Daniel Borkmann. 3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04 drivers, from Dongpo Li. 4) Several ethtool ksettings conversions from Philippe Reynes. 5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert. 6) XDP support for virtio_net, from John Fastabend. 7) Fix NAT handling within a vrf, from David Ahern. 8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits) net: mv643xx_eth: fix build failure isdn: Constify some function parameters mlxsw: spectrum: Mark split ports as such cgroup: Fix CGROUP_BPF config qed: fix old-style function definition net: ipv6: check route protocol when deleting routes r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected irda: w83977af_ir: cleanup an indent issue net: sfc: use new api ethtool_{get|set}_link_ksettings net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings bpf: fix mark_reg_unknown_value for spilled regs on map value marking bpf: fix overflow in prog accounting bpf: dynamically allocate digest scratch buffer gtp: Fix initialization of Flags octet in GTPv1 header gtp: gtp_check_src_ms_ipv4() always return success net/x25: use designated initializers isdn: use designated initializers ...
2016-12-17cgroup: Fix CGROUP_BPF configAndy Lutomirski
CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually enabled, making it rather challenging to turn CGROUP_BPF on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-12Merge tag 'docs-4.10' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation update from Jonathan Corbet: "These are the documentation changes for 4.10. It's another busy cycle for the docs tree, as the sphinx conversion continues. Highlights include: - Further work on PDF output, which remains a bit of a pain but should be more solid now. - Five more DocBook template files converted to Sphinx. Only 27 to go... Lots of plain-text files have also been converted and integrated. - Images in binary formats have been replaced with more source-friendly versions. - Various bits of organizational work, including the renaming of various files discussed at the kernel summit. - New documentation for the device_link mechanism. ... and, of course, lots of typo fixes and small updates" * tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits) dma-buf: Extract dma-buf.rst Update Documentation/00-INDEX docs: 00-INDEX: document directories/files with no docs docs: 00-INDEX: remove non-existing entries docs: 00-INDEX: add missing entries for documentation files/dirs docs: 00-INDEX: consolidate process/ and admin-guide/ description scripts: add a script to check if Documentation/00-INDEX is sane Docs: change sh -> awk in REPORTING-BUGS Documentation/core-api/device_link: Add initial documentation core-api: remove an unexpected unident ppc/idle: Add documentation for powersave=off Doc: Correct typo, "Introdution" => "Introduction" Documentation/atomic_ops.txt: convert to ReST markup Documentation/local_ops.txt: convert to ReST markup Documentation/assoc_array.txt: convert to ReST markup docs-rst: parse-headers.pl: cleanup the documentation docs-rst: fix media cleandocs target docs-rst: media/Makefile: reorganize the rules docs-rst: media: build SVG from graphviz files docs-rst: replace bayer.png by a SVG image ...
2016-12-12Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29Re-enable CONFIG_MODVERSIONS in a slightly weaker formLinus Torvalds
This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC information in order to work around the issue that newer binutils versions seem to occasionally drop the CRC on the floor. binutils 2.26 seems to work fine, while binutils 2.27 seems to break MODVERSIONS of symbols that have been defined in assembler files. [ We've had random missing CRC's before - it may be an old problem that just is now reliably triggered with the weak asm symbols and a new version of binutils ] Some day I really do want to remove MODVERSIONS entirely. Sadly, today does not appear to be that day: Debian people apparently do want the option to enable MODVERSIONS to make it easier to have external modules across kernel versions, and this seems to be a fairly minimal fix for the annoying problem. Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: Michal Marek <mmarek@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25Fix subtle CONFIG_MODVERSIONS problemsLinus Torvalds
CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series, and quite frankly, nobody has cared very deeply. We absolutely know how to fix it, and it's not _complicated_, but it's not exactly pretty either. This oneliner fixes it without the ugliness, and allows for further future cleanups. "We've secretly replaced their regular MODVERSIONS with nothing at all, let's see if they notice" Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-25cgroup: add support for eBPF programsDaniel Mack
This patch adds two sets of eBPF program pointers to struct cgroup. One for such that are directly pinned to a cgroup, and one for such that are effective for it. To illustrate the logic behind that, assume the following example cgroup hierarchy. A - B - C \ D - E If only B has a program attached, it will be effective for B, C, D and E. If D then attaches a program itself, that will be effective for both D and E, and the program in B will only affect B and C. Only one program of a given type is effective for a cgroup. Attaching and detaching programs will be done through the bpf(2) syscall. For now, ingress and egress inet socket filtering are the only supported use-cases. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16posix-timers: Make them configurableNicolas Pitre
Some embedded systems have no use for them. This removes about 25KB from the kernel binary size when configured out. Corresponding syscalls are routed to a stub logging the attempt to use those syscalls which should be enough of a clue if they were disabled without proper consideration. They are: timer_create, timer_gettime: timer_getoverrun, timer_settime, timer_delete, clock_adjtime, setitimer, getitimer, alarm. The clock_settime, clock_gettime, clock_getres and clock_nanosleep syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME, CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast majority of use cases with very little code. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: linux-kbuild@vger.kernel.org Cc: netdev@vger.kernel.org Cc: Michal Marek <mmarek@suse.com> Cc: Edward Cree <ecree@solarflare.com> Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>