summaryrefslogtreecommitdiff
path: root/include/linux/sysctl.h
AgeCommit message (Collapse)Author
2023-12-28sysctl: remove struct ctl_pathThomas Weißschuh
All usages of this struct have been removed from the kernel tree. The struct is still referenced by scripts/check-sysctl-docs but that script is broken anyways as it only supports the register_sysctl_paths() API and not the currently used register_sysctl() one. Fixes: 0199849acd07 ("sysctl: remove register_sysctl_paths()") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-28sysctl: delete unused define SYSCTL_PERM_EMPTY_DIRThomas Weißschuh
It seems it was never used. Fixes: 2f2665c13af4 ("sysctl: replace child with an enumeration") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-11-01proc: sysctl: prevent aliased sysctls from getting passed to initKrister Johansen
The code that checks for unknown boot options is unaware of the sysctl alias facility, which maps bootparams to sysctl values. If a user sets an old value that has a valid alias, a message about an invalid parameter will be printed during boot, and the parameter will get passed to init. Fix by checking for the existence of aliased parameters in the unknown boot parameter code. If an alias exists, don't return an error or pass the value to init. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Cc: stable@vger.kernel.org Fixes: 0a477e1ae21b ("kernel/sysctl: support handling command line aliases") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add size arg to __register_sysctl_initJoel Granados
This commit adds table_size to __register_sysctl_init in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by calculating the ctl_table array size with ARRAY_SIZE. We add a table_size argument to __register_sysctl_init and modify the register_sysctl_init macro to calculate the array size with ARRAY_SIZE. The original callers do not need to be updated as they will go through the new macro. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add size to register_sysctlJoel Granados
This commit adds table_size to register_sysctl in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by either passing the table_size explicitly or using ARRAY_SIZE on the ctl_table arrays. We replace the register_syctl function with a macro that will add the ARRAY_SIZE to the new register_sysctl_sz function. In this way the callers that are already using an array of ctl_table structs do not change. For the callers that pass a ctl_table array pointer, we pass the table_size to register_sysctl_sz instead of the macro. Signed-off-by: Joel Granados <j.granados@samsung.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add a size arg to __register_sysctl_tableJoel Granados
We make these changes in order to prepare __register_sysctl_table and its callers for when we remove the sentinel element (empty element at the end of ctl_table arrays). We don't actually remove any sentinels in this commit, but we *do* make sure to use ARRAY_SIZE so the table_size is available when the removal occurs. We add a table_size argument to __register_sysctl_table and adjust callers, all of which pass ctl_table pointers and need an explicit call to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in order to forward the size of the array pointer received from the network register calls. The new table_size argument does not yet have any effect in the init_header call which is still dependent on the sentinel's presence. table_size *does* however drive the `kzalloc` allocation in __register_sysctl_table with no adverse effects as the allocated memory is either one element greater than the calculated ctl_table array (for the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size of the calculated ctl_table array (for the call from sysctl_net.c and register_sysctl). This approach will allows us to "just" remove the sentinel without further changes to __register_sysctl_table as table_size will represent the exact size for all the callers at that point. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-08-15sysctl: Add ctl_table_size to ctl_table_headerJoel Granados
The new ctl_table_size element will hold the size of the ctl_table arrays contained in the ctl_table_header. This value should eventually be passed by the callers to the sysctl register infrastructure. And while this commit introduces the variable, it does not set nor use it because that requires case by case considerations for each caller. It provides two important things: (1) A place to put the result of the ctl_table array calculation when it gets introduced for each caller. And (2) the size that will be used as the additional stopping criteria in the list_for_each_table_entry macro (to be added when all the callers are migrated) Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-06-18sysctl: replace child with an enumerationJoel Granados
This is part of the effort to remove the empty element at the end of ctl_table structs. "child" was a deprecated elem in this struct and was being used to differentiate between two types of ctl_tables: "normal" and "permanently emtpy". What changed?: * Replace "child" with an enumeration that will have two values: the default (0) and the permanently empty (1). The latter is left at zero so when struct ctl_table is created with kzalloc or in a local context, it will have the zero value by default. We document the new enum with kdoc. * Remove the "empty child" check from sysctl_check_table * Remove count_subheaders function as there is no longer a need to calculate how many headers there are for every child * Remove the recursive call to unregister_sysctl_table as there is no need to traverse down the child tree any longer * Add a new SYSCTL_PERM_EMPTY_DIR binary flag * Remove the last remanence of child from partport/procfs.c Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-23sysctl: Refactor base paths registrationsJoel Granados
This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. The old way of doing this through register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a call to register_sysctl_init. The 5 base paths affected are: "kernel", "vm", "debug", "dev" and "fs". We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE macro since they are no longer needed. In order to quickly acertain that the paths did not actually change I executed `find /proc/sys/ | sha1sum` and made sure that the sha was the same before and after the commit. We end up saving 563 bytes with this change: ./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563) Function old new delta sysctl_init_bases 55 111 +56 init_fs_sysctls 12 33 +21 vm_base_table 128 - -128 kernel_base_table 128 - -128 fs_base_table 128 - -128 dev_base_table 128 - -128 debug_base_table 128 - -128 Total: Before=21258215, After=21257652, chg -0.00% [mcgrof: modified to use register_sysctl_init() over register_sysctl() and add bloat-o-meter stats] Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christian Brauner <brauner@kernel.org>
2023-05-23sysctl: stop exporting register_sysctl_tableJoel Granados
We make register_sysctl_table static because the only function calling it is in fs/proc/proc_sysctl.c (__register_sysctl_base). We remove it from the sysctl.h header and modify the documentation in both the header and proc_sysctl.c files to mention "register_sysctl" instead of "register_sysctl_table". This plus the commits that remove register_sysctl_table from parport save 217 bytes: ./scripts/bloat-o-meter .bsysctl/vmlinux.old .bsysctl/vmlinux.new add/remove: 0/1 grow/shrink: 5/1 up/down: 458/-675 (-217) Function old new delta __register_sysctl_base 8 286 +278 parport_proc_register 268 379 +111 parport_device_proc_register 195 247 +52 kzalloc.constprop 598 608 +10 parport_default_proc_register 62 69 +7 register_sysctl_table 291 - -291 parport_sysctl_template 1288 904 -384 Total: Before=8603076, After=8602859, chg -0.00% Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-02sysctl: remove register_sysctl_paths()Luis Chamberlain
The deprecation for register_sysctl_paths() is over. We can rejoice as we nuke register_sysctl_paths(). The routine register_sysctl_table() was the only user left of register_sysctl_paths(), so we can now just open code and move the implementation over to what used to be to __register_sysctl_paths(). The old dynamic struct ctl_table_set *set is now the point to sysctl_table_root.default_set. The old dynamic const struct ctl_path *path was being used in the routine register_sysctl_paths() with a static: static const struct ctl_path null_path[] = { {} }; Since this is a null path we can now just simplfy the old routine and remove its use as its always empty. This saves us a total of 230 bytes. $ ./scripts/bloat-o-meter vmlinux.old vmlinux add/remove: 2/7 grow/shrink: 1/1 up/down: 1015/-1245 (-230) Function old new delta register_leaf_sysctl_tables.constprop - 524 +524 register_sysctl_table 22 497 +475 __pfx_register_leaf_sysctl_tables.constprop - 16 +16 null_path 8 - -8 __pfx_register_sysctl_paths 16 - -16 __pfx_register_leaf_sysctl_tables 16 - -16 __pfx___register_sysctl_paths 16 - -16 __register_sysctl_base 29 12 -17 register_sysctl_paths 18 - -18 register_leaf_sysctl_tables 534 - -534 __register_sysctl_paths 620 - -620 Total: Before=21259666, After=21259436, chg -0.00% Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-08-08mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readabilityMuchun Song
There is a discussion about the name of hugetlb_vmemmap_alloc/free in thread [1]. The suggestion suggested by David is rename "alloc/free" to "optimize/restore" to make functionalities clearer to users, "optimize" means the function will optimize vmemmap pages, while "restore" means restoring its vmemmap pages discared before. This commit does this. Another discussion is the confusion RESERVE_VMEMMAP_NR isn't used explicitly for vmemmap_addr but implicitly for vmemmap_end in hugetlb_vmemmap_alloc/free. David suggested we can compute what hugetlb_vmemmap_init() does now at runtime. We do not need to worry for the overhead of computing at runtime since the calculation is simple enough and those functions are not in a hot path. This commit has the following improvements: 1) The function suffixed name ("optimize/restore") is more expressive. 2) The logic becomes less weird in hugetlb_vmemmap_optimize/restore(). 3) The hugetlb_vmemmap_init() does not need to be exported anymore. 4) A ->optimize_vmemmap_pages field in struct hstate is killed. 5) There is only one place where checks is_power_of_2(sizeof(struct page)) instead of two places. 6) Add more comments for hugetlb_vmemmap_optimize/restore(). 7) For external users, hugetlb_optimize_vmemmap_pages() is used for detecting if the HugeTLB's vmemmap pages is optimizable originally. In this commit, it is killed and we introduce a new helper hugetlb_vmemmap_optimizable() to replace it. The name is more expressive. Link: https://lore.kernel.org/all/20220404074652.68024-2-songmuchun@bytedance.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-7-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-30sysctl: add proc_dointvec_ms_jiffies_minmaxYuwei Wang
add proc_dointvec_ms_jiffies_minmax to fit read msecs value to jiffies with a limited range of values Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-03net: sysctl: introduce sysctl SYSCTL_THREETonghao Zhang
This patch introdues the SYSCTL_THREE. KUnit: [00:10:14] ================ sysctl_test (10 subtests) ================= [00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data [00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset [00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero [00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max [00:10:14] =================== [PASSED] sysctl_test =================== ./run_kselftest.sh -c sysctl ... ok 1 selftests: sysctl: sysctl.sh Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Lorenz Bauer <lmb@cloudflare.com> Cc: Akhmat Karakotov <hmukos@yandex-team.ru> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-30include/linux/sysctl.h: fix register_sysctl_mount_point() return typeAndrew Morton
The CONFIG_SYSCTL=n stub returns the wrong type. Fixes: ee9efac48a082 ("sysctl: add helper to register a sysctl mount point") Reported-by: kernel test robot <lkp@intel.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22kernel/sysctl.c: rename sysctl_init() to sysctl_init_bases()Luis Chamberlain
Rename sysctl_init() to sysctl_init_bases() so to reflect exactly what this is doing. Link: https://lkml.kernel.org/r/20211129211943.640266-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: add and use base directory declarer and registration helperLuis Chamberlain
Patch series "sysctl: add and use base directory declarer and registration helper". In this patch series we start addressing base directories, and so we start with the "fs" sysctls. The end goal is we end up completely moving all "fs" sysctl knobs out from kernel/sysctl. This patch (of 6): Add a set of helpers which can be used to declare and register base directory sysctls on their own. We do this so we can later move each of the base sysctl directories like "fs", "kernel", etc, to their own respective files instead of shoving the declarations and registrations all on kernel/sysctl.c. The lazy approach has caught up and with this, we just end up extending the list of base directories / sysctls on one file and this makes maintenance difficult due to merge conflicts from many developers. The declarations are used first by kernel/sysctl.c for registration its own base which over time we'll try to clean up. It will be used in the next patch to demonstrate how to cleanly deal with base sysctl directories. [mcgrof@kernel.org: null-terminate the ctl_table arrays] Link: https://lkml.kernel.org/r/YafJY3rXDYnjK/gs@bombadil.infradead.org Link: https://lkml.kernel.org/r/20211129211943.640266-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211129211943.640266-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Antti Palosaari <crope@iki.fi> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Eric Biggers <ebiggers@google.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: move pipe sysctls to is own fileLuis Chamberlain
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the pipe sysctls to its own file. Link: https://lkml.kernel.org/r/20211129205548.605569-10-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: move maxolduid as a sysctl specific constLuis Chamberlain
The maxolduid value is only shared for sysctl purposes for use on a max range. Just stuff this into our shared const array. [akpm@linux-foundation.org: fix sysctl_vals[], per Mickaël] Link: https://lkml.kernel.org/r/20211129205548.605569-5-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: share unsigned long const valuesLuis Chamberlain
Provide a way to share unsigned long values. This will allow others to not have to re-invent these values. Link: https://lkml.kernel.org/r/20211124231435.1445213-9-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: add helper to register a sysctl mount pointLuis Chamberlain
The way to create a subdirectory on top of sysctl_mount_point is a bit obscure, and *why* we do that even so more. Provide a helper which makes it clear why we do this. [akpm@linux-foundation.org: export register_sysctl_mount_point() to modules] Link: https://lkml.kernel.org/r/20211124231435.1445213-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22random: move the random sysctl declarations to its own fileXiaoming Ni
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the random sysctls to their own file and use register_sysctl_init(). [mcgrof@kernel.org: commit log update to justify the move] Link: https://lkml.kernel.org/r/20211124231435.1445213-3-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22firmware_loader: move firmware sysctl to its own filesXiaoming Ni
Patch series "sysctl: 3rd set of kernel/sysctl cleanups", v2. This is the third set of patches to help address cleaning the kitchen seink in kernel/sysctl.c and to move sysctls away to where they are actually implemented / used. This patch (of 8): kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the firmware configuration sysctl table to the only place where it is used, and make it clear that if sysctls are disabled this is not used. [akpm@linux-foundation.org: export register_firmware_config_sysctl and unregister_firmware_config_sysctl to modules] [akpm@linux-foundation.org: use EXPORT_SYMBOL_NS_GPL instead] [sfr@canb.auug.org.au: fix that so it compiles] Link: https://lkml.kernel.org/r/20211201160626.401d828d@canb.auug.org.au [mcgrof@kernel.org: major commit log update to justify the move] Link: https://lkml.kernel.org/r/20211124231435.1445213-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211124231435.1445213-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Antti Palosaari <crope@iki.fi> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Mark Fasheh <mark@fasheh.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22eventpoll: simplify sysctl declaration with register_sysctl()Xiaoming Ni
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the epoll_table sysctl to fs/eventpoll.c and use register_sysctl(). Link: https://lkml.kernel.org/r/20211123202422.819032-9-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: move some boundary constants from sysctl.c to sysctl_valsXiaoming Ni
sysctl has helpers which let us specify boundary values for a min or max int value. Since these are used for a boundary check only they don't change, so move these variables to sysctl_vals to avoid adding duplicate variables. This will help with our cleanup of kernel/sysctl.c. [akpm@linux-foundation.org: update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%"] [mcgrof@kernel.org: major rebase] Link: https://lkml.kernel.org/r/20211123202347.818157-3-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Qing Wang <wangqing@vivo.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: add a new register_sysctl_init() interfaceXiaoming Ni
Patch series "sysctl: first set of kernel/sysctl cleanups", v2. Finally had time to respin the series of the work we had started last year on cleaning up the kernel/sysct.c kitchen sink. People keeps stuffing their sysctls in that file and this creates a maintenance burden. So this effort is aimed at placing sysctls where they actually belong. I'm going to split patches up into series as there is quite a bit of work. This first set adds register_sysctl_init() for uses of registerting a sysctl on the init path, adds const where missing to a few places, generalizes common values so to be more easy to share, and starts the move of a few kernel/sysctl.c out where they belong. The majority of rework on v2 in this first patch set is 0-day fixes. Eric Biederman's feedback is later addressed in subsequent patch sets. I'll only post the first two patch sets for now. We can address the rest once the first two patch sets get completely reviewed / Acked. This patch (of 9): The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Today though folks heavily rely on tables on kernel/sysctl.c so they can easily just extend this table with their needed sysctls. In order to help users move their sysctls out we need to provide a helper which can be used during code initialization. We special-case the initialization use of register_sysctl() since it *is* safe to fail, given all that sysctls do is provide a dynamic interface to query or modify at runtime an existing variable. So the use case of register_sysctl() on init should *not* stop if the sysctls don't end up getting registered. It would be counter productive to stop boot if a simple sysctl registration failed. Provide a helper for init then, and document the recommended init levels to use for callers of this routine. We will later use this in subsequent patches to start slimming down kernel/sysctl.c tables and moving sysctl registration to the code which actually needs these sysctls. [mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ] Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Turner <pjt@google.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Qing Wang <wangqing@vivo.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-17sysctl: introduce new proc handler proc_doboolJia He
This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Suggested-by: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Jia He <hejianet@gmail.com> [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-25sysctl: add proc_dou8vec_minmax()Eric Dumazet
Networking has many sysctls that could fit in one u8. This patch adds proc_dou8vec_minmax() for this purpose. Note that the .extra1 and .extra2 fields are pointing to integers, because it makes conversions easier. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14all arch: remove system call sys_sysctlXiaoming Ni
Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"), sys_sysctl is actually unavailable: any input can only return an error. We have been warning about people using the sysctl system call for years and believe there are no more users. Even if there are users of this interface if they have not complained or fixed their code by now they probably are not going to, so there is no point in warning them any longer. So completely remove sys_sysctl on all architectures. [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu] Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Will Deacon <will@kernel.org> [arm/arm64] Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bin Meng <bin.meng@windriver.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: chenzefeng <chenzefeng2@huawei.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christian Brauner <christian@brauner.io> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Diego Elio Pettenò <flameeyes@flameeyes.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kars de Jong <jongk@linux-m68k.org> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Olof Johansson <olof@lixom.net> Cc: Paul Burton <paulburton@kernel.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Sven Schnelle <svens@stackframe.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com> Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08kernel/sysctl: support setting sysctl parameters from kernel command lineVlastimil Babka
Patch series "support setting sysctl parameters from kernel command line", v3. This series adds support for something that seems like many people always wanted but nobody added it yet, so here's the ability to set sysctl parameters via kernel command line options in the form of sysctl.vm.something=1 The important part is Patch 1. The second, not so important part is an attempt to clean up legacy one-off parameters that do the same thing as a sysctl. I don't want to remove them completely for compatibility reasons, but with generic sysctl support the idea is to remove the one-off param handlers and treat the parameters as aliases for the sysctl variants. I have identified several parameters that mention sysctl counterparts in Documentation/admin-guide/kernel-parameters.txt but there might be more. The conversion also has varying level of success: - numa_zonelist_order is converted in Patch 2 together with adding the necessary infrastructure. It's easy as it doesn't really do anything but warn on deprecated value these days. - hung_task_panic is converted in Patch 3, but there's a downside that now it only accepts 0 and 1, while previously it was any integer value - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic, so there's no straighforward conversion possible - traceoff_on_warning is a flag without value and it would be required to handle that somehow in the conversion infractructure, which seems pointless for a single flag This patch (of 5): A recently proposed patch to add vm_swappiness command line parameter in addition to existing sysctl [1] made me wonder why we don't have a general support for passing sysctl parameters via command line. Googling found only somebody else wondering the same [2], but I haven't found any prior discussion with reasons why not to do this. Settings the vm_swappiness issue aside (the underlying issue might be solved in a different way), quick search of kernel-parameters.txt shows there are already some that exist as both sysctl and kernel parameter - hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism would remove the need to add more of those one-offs and might be handy in situations where configuration by e.g. /etc/sysctl.d/ is impractical. Hence, this patch adds a new parse_args() pass that looks for parameters prefixed by 'sysctl.' and tries to interpret them as writes to the corresponding sys/ files using an temporary in-kernel procfs mount. This mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically registered sysctl tables, even though we don't handle modular sysctls. Errors due to e.g. invalid parameter name or value are reported in the kernel log. The processing is hooked right before the init process is loaded, as some handlers might be more complicated than simple setters and might need some subsystems to be initialized. At the moment the init process can be started and eventually execute a process writing to /proc/sys/ then it should be also fine to do that from the kernel. Sysctls registered later on module load time are not set by this mechanism - it's expected that in such scenarios, setting sysctl values from userspace is practical enough. [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/ [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-27sysctl: pass kernel pointers to ->proc_handlerChristoph Hellwig
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27sysctl: remove all extern declaration from sysctl.cChristoph Hellwig
Extern declarations in .c files are a bad style and can lead to mismatches. Use existing definitions in headers where they exist, and otherwise move the external declarations to suitable header files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-04include/linux/sysctl.h: inline braces for ctl_table and ctl_table_headerAlessio Balsini
Fix coding style of "struct ctl_table" and "struct ctl_table_header" to have inline braces. Besides the wide use of this proposed cose style, this change helps to find at a glance the struct definition when navigating the code. Link: http://lkml.kernel.org/r/20190903154906.188651-1-balsini@android.com Signed-off-by: Alessio Balsini <balsini@android.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-14sysctl: define proc_do_static_key()Eric Dumazet
Convert proc_dointvec_minmax_bpf_stats() into a more generic helper, since we are going to use jump labels more often. Note that sysctl_bpf_stats_enabled is removed, since it is no longer needed/used. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06pipe, sysctl: remove pipe_proc_fn()Eric Biggers
pipe_proc_fn() is no longer needed, as it only calls through to proc_dopipe_max_size(). Just put proc_dopipe_max_size() in the ctl_table entry directly, and remove the unneeded EXPORT_SYMBOL() and the ENOSYS stub for it. (The reason the ENOSYS stub isn't needed is that the pipe-max-size ctl_table entry is located directly in 'kern_table' rather than being registered separately. Therefore, the entry is already only defined when the kernel is built with sysctl support.) Link: http://lkml.kernel.org/r/20180111052902.14409-3-ebiggers3@gmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Luis R . Rodriguez" <mcgrof@kernel.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17pipe: add proc_dopipe_max_size() to safely assign pipe_max_sizeJoe Lawrence
pipe_max_size is assigned directly via procfs sysctl: static struct ctl_table fs_table[] = { ... { .procname = "pipe-max-size", .data = &pipe_max_size, .maxlen = sizeof(int), .mode = 0644, .proc_handler = &pipe_proc_fn, .extra1 = &pipe_min_size, }, ... int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, size_t *lenp, loff_t *ppos) { ... ret = proc_dointvec_minmax(table, write, buf, lenp, ppos) ... and then later rounded in-place a few statements later: ... pipe_max_size = round_pipe_size(pipe_max_size); ... This leaves a window of time between initial assignment and rounding that may be visible to other threads. (For example, one thread sets a non-rounded value to pipe_max_size while another reads its value.) Similar reads of pipe_max_size are potentially racy: pipe.c :: alloc_pipe_info() pipe.c :: pipe_set_size() Add a new proc_dopipe_max_size() that consolidates reading the new value from the user buffer, verifying bounds, and calling round_pipe_size() with a single assignment to pipe_max_size. Link: http://lkml.kernel.org/r/1507658689-11669-4-git-send-email-joe.lawrence@redhat.com Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-09sysctl: add register_sysctl() dummy helperArnd Bergmann
register_sysctl() has been around for five years with commit fea478d4101a ("sysctl: Add register_sysctl for normal sysctl users") but now that arm64 started using it, I ran into a compile error: arch/arm64/kernel/armv8_deprecated.c: In function 'register_insn_emulation_sysctl': arch/arm64/kernel/armv8_deprecated.c:257:2: error: implicit declaration of function 'register_sysctl' This adds a inline function like we already have for register_sysctl_paths() and register_sysctl_table(). Link: http://lkml.kernel.org/r/20171106133700.558647-1-arnd@arndb.de Fixes: 38b9aeb32fa7 ("arm64: Port deprecated instruction emulation to new sysctl interface") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Alex Benne <alex.bennee@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-19Merge tag 'gcc-plugins-v4.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull structure randomization updates from Kees Cook: "Now that IPC and other changes have landed, enable manual markings for randstruct plugin, including the task_struct. This is the rest of what was staged in -next for the gcc-plugins, and comes in three patches, largest first: - mark "easy" structs with __randomize_layout - mark task_struct with an optional anonymous struct to isolate the __randomize_layout section - mark structs to opt _out_ of automated marking (which will come later) And, FWIW, this continues to pass allmodconfig (normal and patched to enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and s390 for me" * tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randstruct: opt-out externally exposed function pointer structs task_struct: Allow randomized layout randstruct: Mark various structs for randomization
2017-07-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - various misc things - kexec updates - sysctl core updates - scripts/gdb udpates - checkpoint-restart updates - ipc updates - kernel/watchdog updates - Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature" - "stackprotector: ascii armor the stack canary" - more MM bits - checkpatch updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits) writeback: rework wb_[dec|inc]_stat family of functions ARM: samsung: usb-ohci: move inline before return type video: fbdev: omap: move inline before return type video: fbdev: intelfb: move inline before return type USB: serial: safe_serial: move __inline__ before return type drivers: tty: serial: move inline before return type drivers: s390: move static and inline before return type x86/efi: move asmlinkage before return type sh: move inline before return type MIPS: SMP: move asmlinkage before return type m68k: coldfire: move inline before return type ia64: sn: pci: move inline before type ia64: move inline before return type FRV: tlbflush: move asmlinkage before return type CRIS: gpio: move inline before return type ARM: HP Jornada 7XX: move inline before return type ARM: KVM: move asmlinkage before type checkpatch: improve the STORAGE_CLASS test mm, migration: do not trigger OOM killer when migrating memory drm/i915: use __GFP_RETRY_MAYFAIL ...
2017-07-12sysctl: add unsigned int range supportLuis R. Rodriguez
To keep parity with regular int interfaces provide the an unsigned int proc_douintvec_minmax() which allows you to specify a range of allowed valid numbers. Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an actual user for that. Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-11proc: Fix proc_sys_prune_dcache to hold a sb referenceEric W. Biederman
Andrei Vagin writes: FYI: This bug has been reproduced on 4.11.7 > BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo} still in use (1) [unmount of proc proc] > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80 > CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1 > Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015 > Workqueue: events proc_cleanup_work > Call Trace: > dump_stack+0x63/0x86 > __warn+0xcb/0xf0 > warn_slowpath_null+0x1d/0x20 > umount_check+0x6e/0x80 > d_walk+0xc6/0x270 > ? dentry_free+0x80/0x80 > do_one_tree+0x26/0x40 > shrink_dcache_for_umount+0x2d/0x90 > generic_shutdown_super+0x1f/0xf0 > kill_anon_super+0x12/0x20 > proc_kill_sb+0x40/0x50 > deactivate_locked_super+0x43/0x70 > deactivate_super+0x5a/0x60 > cleanup_mnt+0x3f/0x90 > mntput_no_expire+0x13b/0x190 > kern_unmount+0x3e/0x50 > pid_ns_release_proc+0x15/0x20 > proc_cleanup_work+0x15/0x20 > process_one_work+0x197/0x450 > worker_thread+0x4e/0x4a0 > kthread+0x109/0x140 > ? process_one_work+0x450/0x450 > ? kthread_park+0x90/0x90 > ret_from_fork+0x2c/0x40 > ---[ end trace e1c109611e5d0b41 ]--- > VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds. Have a nice day... > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: _raw_spin_lock+0xc/0x30 > PGD 0 Fix this by taking a reference to the super block in proc_sys_prune_dcache. The superblock reference is the core of the fix however the sysctl_inodes list is converted to a hlist so that hlist_del_init_rcu may be used. This allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while not causing problems for proc_sys_evict_inode when if it later choses to remove the inode from the sysctl_inodes list. Removing inodes from the sysctl_inodes list allows proc_sys_prune_dcache to have a progress guarantee, while still being able to drop all locks. The fact that head->unregistering is set in start_unregistering ensures that no more inodes will be added to the the sysctl_inodes list. Previously the code did a dance where it delayed calling iput until the next entry in the list was being considered to ensure the inode remained on the sysctl_inodes list until the next entry was walked to. The structure of the loop in this patch does not need that so is much easier to understand and maintain. Cc: stable@vger.kernel.org Reported-by: Andrei Vagin <avagin@gmail.com> Tested-by: Andrei Vagin <avagin@openvz.org> Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.") Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-06-30randstruct: Mark various structs for randomizationKees Cook
This marks many critical kernel structures for randomization. These are structures that have been targeted in the past in security exploits, or contain functions pointers, pointers to function pointer tables, lists, workqueues, ref-counters, credentials, permissions, or are otherwise sensitive. This initial list was extracted from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Left out of this list is task_struct, which requires special handling and will be covered in a subsequent patch. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-16sysctl: Remove dead register_sysctl_rootEric W. Biederman
The function no longer does anything. The is only a single caller of register_sysctl_root when semantically there should be two. Remove this function so that if someone decides this functionality is needed again it will be obvious all of the callers of setup_sysctl_set need to be audited and modified appropriately. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-02-13proc/sysctl: prune stale dentries during unregisteringKonstantin Khlebnikov
Currently unregistering sysctl table does not prune its dentries. Stale dentries could slowdown sysctl operations significantly. For example, command: # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done creates a millions of stale denties around sysctls of loopback interface: # sysctl fs.dentry-state fs.dentry-state = 25812579 24724135 45 0 0 0 All of them have matching names thus lookup have to scan though whole hash chain and call d_compare (proc_sys_compare) which checks them under system-wide spinlock (sysctl_lock). # time sysctl -a > /dev/null real 1m12.806s user 0m0.016s sys 1m12.400s Currently only memory reclaimer could remove this garbage. But without significant memory pressure this never happens. This patch collects sysctl inodes into list on sysctl table header and prunes all their dentries once that table unregisters. Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes: > On 10.02.2017 10:47, Al Viro wrote: >> how about >> the matching stats *after* that patch? > > dcache size doesn't grow endlessly, so stats are fine > > # sysctl fs.dentry-state > fs.dentry-state = 92712 58376 45 0 0 0 > > # time sysctl -a &>/dev/null > > real 0m0.013s > user 0m0.004s > sys 0m0.008s Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-10-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This set of changes is a number of smaller things that have been overlooked in other development cycles focused on more fundamental change. The devpts changes are small things that were a distraction until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an trivial regression fix to autofs for the unprivileged mount changes that went in last cycle. A pair of ioctls has been added by Andrey Vagin making it is possible to discover the relationships between namespaces when referring to them through file descriptors. The big user visible change is starting to add simple resource limits to catch programs that misbehave. With namespaces in general and user namespaces in particular allowing users to use more kinds of resources, it has become important to have something to limit errant programs. Because the purpose of these limits is to catch errant programs the code needs to be inexpensive to use as it always on, and the default limits need to be high enough that well behaved programs on well behaved systems don't encounter them. To this end, after some review I have implemented per user per user namespace limits, and use them to limit the number of namespaces. The limits being per user mean that one user can not exhause the limits of another user. The limits being per user namespace allow contexts where the limit is 0 and security conscious folks can remove from their threat anlysis the code used to manage namespaces (as they have historically done as it root only). At the same time the limits being per user namespace allow other parts of the system to use namespaces. Namespaces are increasingly being used in application sand boxing scenarios so an all or nothing disable for the entire system for the security conscious folks makes increasing use of these sandboxes impossible. There is also added a limit on the maximum number of mounts present in a single mount namespace. It is nontrivial to guess what a reasonable system wide limit on the number of mount structure in the kernel would be, especially as it various based on how a system is using containers. A limit on the number of mounts in a mount namespace however is much easier to understand and set. In most cases in practice only about 1000 mounts are used. Given that some autofs scenarious have the potential to be 30,000 to 50,000 mounts I have set the default limit for the number of mounts at 100,000 which is well above every known set of users but low enough that the mount hash tables don't degrade unreaonsably. These limits are a start. I expect this estabilishes a pattern that other limits for resources that namespaces use will follow. There has been interest in making inotify event limits per user per user namespace as well as interest expressed in making details about what is going on in the kernel more visible" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits) autofs: Fix automounts by using current_real_cred()->uid mnt: Add a per mount namespace limit on the number of mounts netns: move {inc,dec}_net_namespaces into #ifdef nsfs: Simplify __ns_get_path tools/testing: add a test to check nsfs ioctl-s nsfs: add ioctl to get a parent namespace nsfs: add ioctl to get an owning user namespace for ns file descriptor kernel: add a helper to get an owning user namespace for a namespace devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts devpts: Remove sync_filesystems devpts: Make devpts_kill_sb safe if fsi is NULL devpts: Simplify devpts_mount by using mount_nodev devpts: Move the creation of /dev/pts/ptmx into fill_super devpts: Move parse_mount_options into fill_super userns: When the per user per user namespace limit is reached return ENOSPC userns; Document per user per user namespace limits. mntns: Add a limit on the number of mount namespaces. netns: Add a limit on the number of net namespaces cgroupns: Add a limit on the number of cgroup namespaces ipcns: Add a limit on the number of ipc namespaces ...
2016-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26sysctl: handle error writing UINT_MAX to u32 fieldsSubash Abhinov Kasiviswanathan
We have scripts which write to certain fields on 3.18 kernels but this seems to be failing on 4.4 kernels. An entry which we write to here is xfrm_aevent_rseqth which is u32. echo 4294967295 > /proc/sys/net/core/xfrm_aevent_rseqth Commit 230633d109e3 ("kernel/sysctl.c: detect overflows when converting to int") prevented writing to sysctl entries when integer overflow occurs. However, this does not apply to unsigned integers. Heinrich suggested that we introduce a new option to handle 64 bit limits and set min as 0 and max as UINT_MAX. This might not work as it leads to issues similar to __do_proc_doulongvec_minmax. Alternatively, we would need to change the datatype of the entry to 64 bit. static int __do_proc_doulongvec_minmax(void *data, struct ctl_table { i = (unsigned long *) data; //This cast is causing to read beyond the size of data (u32) vleft = table->maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64. Introduce a new proc handler proc_douintvec. Individual proc entries will need to be updated to use the new handler. [akpm@linux-foundation.org: coding-style fixes] Fixes: 230633d109e3 ("kernel/sysctl.c:detect overflows when converting to int") Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-14net: make net namespace sysctls belong to container's ownerDmitry Torokhov
If net namespace is attached to a user namespace let's make container's root owner of sysctls affecting said network namespace instead of global root. This also allows us to clean up net_ctl_permissions() because we do not need to fudge permissions anymore for the container's owner since it now owns the objects in question. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>