linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-06-17	open: add close_range()	Christian Brauner
	This adds the close_range() syscall. It allows to efficiently close a range of file descriptors up to all file descriptors of a calling task. I was contacted by FreeBSD as they wanted to have the same close_range() syscall as we proposed here. We've coordinated this and in the meantime, Kyle was fast enough to merge close_range() into FreeBSD already in April: https://reviews.freebsd.org/D21627 https://svnweb.freebsd.org/base?view=revision&revision=359836 and the current plan is to backport close_range() to FreeBSD 12.2 (cf. [2]) once its merged in Linux too. Python is in the process of switching to close_range() on FreeBSD and they are waiting on us to merge this to switch on Linux as well: https://bugs.python.org/issue38061 The syscall came up in a recent discussion around the new mount API and making new file descriptor types cloexec by default. During this discussion, Al suggested the close_range() syscall (cf. [1]). Note, a syscall in this manner has been requested by various people over time. First, it helps to close all file descriptors of an exec()ing task. This can be done safely via (quoting Al's example from [1] verbatim): /* that exec is sensitive / unshare(CLONE_FILES); / we don't want anything past stderr here / close_range(3, ~0U); execve(....); The code snippet above is one way of working around the problem that file descriptors are not cloexec by default. This is aggravated by the fact that we can't just switch them over without massively regressing userspace. For a whole class of programs having an in-kernel method of closing all file descriptors is very helpful (e.g. demons, service managers, programming language standard libraries, container managers etc.). (Please note, unshare(CLONE_FILES) should only be needed if the calling task is multi-threaded and shares the file descriptor table with another thread in which case two threads could race with one thread allocating file descriptors and the other one closing them via close_range(). For the general case close_range() before the execve() is sufficient.) Second, it allows userspace to avoid implementing closing all file descriptors by parsing through /proc/<pid>/fd/ and calling close() on each file descriptor. From looking at various large(ish) userspace code bases this or similar patterns are very common in: - service managers (cf. [4]) - libcs (cf. [6]) - container runtimes (cf. [5]) - programming language runtimes/standard libraries - Python (cf. [2]) - Rust (cf. [7], [8]) As Dmitry pointed out there's even a long-standing glibc bug about missing kernel support for this task (cf. [3]). In addition, the syscall will also work for tasks that do not have procfs mounted and on kernels that do not have procfs support compiled in. In such situations the only way to make sure that all file descriptors are closed is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery (cf. comment [8] on Rust). The performance is striking. For good measure, comparing the following simple close_all_fds() userspace implementation that is essentially just glibc's version in [6]: static int close_all_fds(void) { int dir_fd; DIR dir; struct dirent direntp; dir = opendir("/proc/self/fd"); if (!dir) return -1; dir_fd = dirfd(dir); while ((direntp = readdir(dir))) { int fd; if (strcmp(direntp->d_name, ".") == 0) continue; if (strcmp(direntp->d_name, "..") == 0) continue; fd = atoi(direntp->d_name); if (fd == dir_fd \|\| fd == 0 \|\| fd == 1 \|\| fd == 2) continue; close(fd); } closedir(dir); return 0; } to close_range() yields: 1. closing 4 open files: - close_all_fds(): ~280 us - close_range(): ~24 us 2. closing 1000 open files: - close_all_fds(): ~5000 us - close_range(): ~800 us close_range() is designed to allow for some flexibility. Specifically, it does not simply always close all open file descriptors of a task. Instead, callers can specify an upper bound. This is e.g. useful for scenarios where specific file descriptors are created with well-known numbers that are supposed to be excluded from getting closed. For extra paranoia close_range() comes with a flags argument. This can e.g. be used to implement extension. Once can imagine userspace wanting to stop at the first error instead of ignoring errors under certain circumstances. There might be other valid ideas in the future. In any case, a flag argument doesn't hurt and keeps us on the safe side. From an implementation side this is kept rather dumb. It saw some input from David and Jann but all nonsense is obviously my own! - Errors to close file descriptors are currently ignored. (Could be changed by setting a flag in the future if needed.) - __close_range() is a rather simplistic wrapper around __close_fd(). My reasoning behind this is based on the nature of how __close_fd() needs to release an fd. But maybe I misunderstood specifics: We take the files_lock and rcu-dereference the fdtable of the calling task, we find the entry in the fdtable, get the file and need to release files_lock before calling filp_close(). In the meantime the fdtable might have been altered so we can't just retake the spinlock and keep the old rcu-reference of the fdtable around. Instead we need to grab a fresh reference to the fdtable. If my reasoning is correct then there's really no point in fancyfying __close_range(): We just need to rcu-dereference the fdtable of the calling task once to cap the max_fd value correctly and then go on calling __close_fd() in a loop. /* References */ [1]: https://lore.kernel.org/lkml/20190516165021.GD17978@ZenIV.linux.org.uk/ [2]: https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L220 [3]: https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c7 [4]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/fd-util.c#L217 [5]: https://github.com/lxc/lxc/blob/ddf4b77e11a4d08f09b7b9cd13e593f8c047edc5/src/lxc/start.c#L236 [6]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/grantpt.c;h=2030e07fa6e652aac32c775b8c6e005844c3c4eb;hb=HEAD#l17 Note that this is an internal implementation that is not exported. Currently, libc seems to not provide an exported version of this because of missing kernel support to do this. Note, in a recent patch series Florian made grantpt() a nop thereby removing the code referenced here. [7]: https://github.com/rust-lang/rust/issues/12148 [8]: https://github.com/rust-lang/rust/blob/5f47c0613ed4eb46fca3633c1297364c09e5e451/src/libstd/sys/unix/process2.rs#L303-L308 Rust's solution is slightly different but is equally unperformant. Rust calls getdtablesize() which is a glibc library function that simply returns the current RLIMIT_NOFILE or OPEN_MAX values. Rust then goes on to call close() on each fd. That's obviously overkill for most tasks. Rarely, tasks - especially non-demons - hit RLIMIT_NOFILE or OPEN_MAX. Let's be nice and assume an unprivileged user with RLIMIT_NOFILE set to 1024. Even in this case, there's a very high chance that in the common case Rust is calling the close() syscall 1021 times pointlessly if the task just has 0, 1, and 2 open. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kyle Evans <self@kyle-evans.net> Cc: Jann Horn <jannh@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Dmitry V. Levin <ldv@altlinux.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: linux-api@vger.kernel.org
2020-06-16	EDAC: Remove edac_get_dimm_by_index()	Borislav Petkov
	It is unused now. Signed-off-by: Borislav Petkov <bp@suse.de>
2020-06-16	gpu: host1x: Correct trivial kernel-doc inconsistencies	Colton Lewis
	Silence documentation build warnings by adding kernel-doc fields. ./include/linux/host1x.h:69: warning: Function parameter or member 'parent' not described in 'host1x_client' ./include/linux/host1x.h:69: warning: Function parameter or member 'usecount' not described in 'host1x_client' ./include/linux/host1x.h:69: warning: Function parameter or member 'lock' not described in 'host1x_client' Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-06-16	libceph: move away from global osd_req_flags	Ilya Dryomov
	osd_req_flags is overly general and doesn't suit its only user (read_from_replica option) well: - applying osd_req_flags in account_request() affects all OSD requests, including linger (i.e. watch and notify). However, linger requests should always go to the primary even though some of them are reads (e.g. notify has side effects but it is a read because it doesn't result in mutation on the OSDs). - calls to class methods that are reads are allowed to go to the replica, but most such calls issued for "rbd map" and/or exclusive lock transitions are requested to be resent to the primary via EAGAIN, doubling the latency. Get rid of global osd_req_flags and set read_from_replica flag only on specific OSD requests instead. Fixes: 8ad44d5e0d1e ("libceph: read_from_replica option") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-06-16	interconnect: Mark all dummy functions as static inline	Georgi Djakov
	There are a few dummy stub functions that are not marked as static inline yet. Currently this header file is not included in any other file outside of drivers/interconnect/, but that might not be the case in the future. If this file gets included and the framework is disabled, we will be see warnings. Let's fix this in advance. Link: https://lore.kernel.org/r/20200228145945.13579-1-georgi.djakov@linaro.org Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-06-16	interconnect: Allow inter-provider pairs to be configured	Artur Świgoń
	This patch adds support for a new boolean 'inter_set' field in struct icc_provider. Setting it to 'true' enables calling '->set' for inter-provider node pairs. All existing users of the interconnect framework allocate this structure with kzalloc, and are therefore unaffected by this change. This makes it easier for hierarchies like exynos-bus, where every bus is probed separately and registers a separate interconnect provider, to model constraints between buses. Signed-off-by: Artur Świgoń <a.swigon@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20200521122841.8867-4-s.nawrocki@samsung.com Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-06-16	interconnect: Export of_icc_get_from_provider()	Artur Świgoń
	This patch makes the above function public (for use in exynos-bus devfreq driver). Signed-off-by: Artur Świgoń <a.swigon@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Link: https://lore.kernel.org/r/20200521122841.8867-2-s.nawrocki@samsung.com Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-06-16	compiler_attributes.h: Support no_sanitize_undefined check with GCC 4	Marco Elver
	UBSAN is supported since GCC 4.9, which unfortunately did not yet have __has_attribute(). To work around, the __GCC4_has_attribute workaround requires defining which compiler version supports the given attribute. In the case of no_sanitize_undefined, it is the first version that supports UBSAN, which is GCC 4.9. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Link: https://lkml.kernel.org/r/20200615231529.GA119644@google.com
2020-06-16	reset: simple: Add reset callback	Maxime Ripard
	The reset-simple code lacks a reset callback that is still pretty easy to implement. The only real thing to consider is the delay needed for a device to be reset, so let's expose that as part of the reset-simple driver data. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-06-16	reset: Move reset-simple header out of drivers/reset	Maxime Ripard
	The reset-simple code can be useful for drivers outside of drivers/reset that have a few reset controls as part of their features. Let's move it to include/linux/reset. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2020-06-15	tifm: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	sctp: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	libata: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	kprobes: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	kexec: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	KVM: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	FS-Cache: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	cb710: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	can: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	dmaengine: Replace zero-length array with flexible-array	Gustavo A. R. Silva
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://github.com/KSPP/linux/issues/21 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15	scsi: libata: Provide an ata_scsi_dma_need_drain stub for !CONFIG_ATA	Christoph Hellwig
	SAS drivers can be compiled with ata support disabled. Provide a stub so that the drivers don't have to ifdef around wiring up ata_scsi_dma_need_drain. Link: https://lore.kernel.org/r/20200615064624.37317-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-06-15	spi: altera: add platform data for slave information.	Xu Yilun
	This patch introduces platform data for slave information, it allows spi-altera to add new spi devices once master registration is done. Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/1591845911-10197-4-git-send-email-yilun.xu@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-15	spi: altera: add SPI core parameters support via platform data.	Xu Yilun
	This patch introduced SPI core parameters in platform data, it allows passing these SPI core parameters via platform data. Signed-off-by: Wu Hao <hao.wu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Reviewed-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/1591845911-10197-3-git-send-email-yilun.xu@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-15	Merge series "Add support for voltage regulator on ChromeOS EC." from ↵	Mark Brown
	Pi-Hsun Shih <pihsun@chromium.org>: Add support for controlling voltage regulator that is connected and controlled by ChromeOS EC. Kernel controls these regulators through newly added EC host commands. Changes from v5: * Move new host command to a separate patch. * Use devm_regulator_register. * Address review comments. Changes from v4: * Change compatible name from regulator-cros-ec to cros-ec-regulator. Changes from v3: * Fix dt bindings file name. * Remove check around CONFIG_OF in driver. * Add new host commands to cros_ec_trace. * Address review comments. Changes from v2: * Add 'depends on OF' to Kconfig. * Add Kconfig description about compiling as module. Changes from v1: * Change compatible string to google,regulator-cros-ec. * Use reg property in device tree. * Change license for dt binding according to checkpatch.pl. * Address comments on code styles. Pi-Hsun Shih (3): dt-bindings: regulator: Add DT binding for cros-ec-regulator platform/chrome: cros_ec: Add command for regulator control. regulator: Add driver for cros-ec-regulator .../regulator/google,cros-ec-regulator.yaml \| 51 ++++ drivers/platform/chrome/cros_ec_trace.c \| 5 + drivers/regulator/Kconfig \| 10 + drivers/regulator/Makefile \| 1 + drivers/regulator/cros-ec-regulator.c \| 257 ++++++++++++++++++ .../linux/platform_data/cros_ec_commands.h \| 82 ++++++ 6 files changed, 406 insertions(+) create mode 100644 Documentation/devicetree/bindings/regulator/google,cros-ec-regulator.yaml create mode 100644 drivers/regulator/cros-ec-regulator.c base-commit: b791d1bdf9212d944d749a5c7ff6febdba241771 -- 2.27.0.290.gba653c62da-goog
2020-06-15	platform/chrome: cros_ec: Add command for regulator control.	Pi-Hsun Shih
	Add host commands for voltage regulator control through ChromeOS EC. Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Link: https://lore.kernel.org/r/20200612040526.192878-3-pihsun@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-15	regmap: convert all regmap_update_bits() and co. macros to static inlines	Bartosz Golaszewski
	There's no reason to have these as macros. Let's convert them all to static inlines for better readability and stronger typing. Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Link: https://lore.kernel.org/r/20200615072313.11106-1-brgl@bgdev.pl Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-15	Merge tag 'ext4-for-linus-5.8-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull more ext4 updates from Ted Ts'o: "This is the second round of ext4 commits for 5.8 merge window [1]. It includes the per-inode DAX support, which was dependant on the DAX infrastructure which came in via the XFS tree, and a number of regression and bug fixes; most notably the "BUG: using smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported by syzkaller" [1] The pull request actually came in 15 minutes after I had tagged the rc1 release. Tssk, tssk, late.. - Linus * tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers ext4: support xattr gnu.* namespace for the Hurd ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr ext4: avoid utf8_strncasecmp() with unstable name ext4: stop overwrite the errcode in ext4_setup_super ext4: fix partial cluster initialization when splitting extent ext4: avoid race conditions when remounting with options that change dax Documentation/dax: Update DAX enablement for ext4 fs/ext4: Introduce DAX inode flag fs/ext4: Remove jflag variable fs/ext4: Make DAX mount option a tri-state fs/ext4: Only change S_DAX on inode load fs/ext4: Update ext4_should_use_dax() fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS fs/ext4: Disallow verity if inode is DAX fs/ext4: Narrow scope of DAX check in setflags
2020-06-15	efi: Replace zero-length array and use struct_size() helper	Gustavo A. R. Silva
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. Lastly, make use of the sizeof_field() helper instead of an open-coded version. This issue was found with the help of Coccinelle and audited _manually_. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200527171425.GA4053@embeddedor Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-06-15	efi/tpm: Verify event log header before parsing	Fabian Vogt
	It is possible that the first event in the event log is not actually a log header at all, but rather a normal event. This leads to the cast in __calc_tpm2_event_size being an invalid conversion, which means that the values read are effectively garbage. Depending on the first event's contents, this leads either to apparently normal behaviour, a crash or a freeze. While this behaviour of the firmware is not in accordance with the TCG Client EFI Specification, this happens on a Dell Precision 5510 with the TPM enabled but hidden from the OS ("TPM On" disabled, state otherwise untouched). The EFI firmware claims that the TPM is present and active and that it supports the TCG 2.0 event log format. Fortunately, this can be worked around by simply checking the header of the first event and the event log header signature itself. Commit b4f1874c6216 ("tpm: check event log version before reading final events") addressed a similar issue also found on Dell models. Fixes: 6b0326190205 ("efi: Attempt to get the TCG2 event log in the boot stub") Signed-off-by: Fabian Vogt <fvogt@suse.de> Link: https://lore.kernel.org/r/1927248.evlx2EsYKh@linux-e202.suse.de Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1165773 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-06-15	sched: Remove sched_set_*() return value	Peter Zijlstra
	Ingo suggested that since the new sched_set_*() functions are implemented using the 'nocheck' variants, they really shouldn't ever fail, so remove the return value. Cc: axboe@kernel.dk Cc: daniel.lezcano@linaro.org Cc: sudeep.holla@arm.com Cc: airlied@redhat.com Cc: broonie@kernel.org Cc: paulmck@kernel.org Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2020-06-15	sched: Provide sched_set_fifo()	Peter Zijlstra
	SCHED_FIFO (or any static priority scheduler) is a broken scheduler model; it is fundamentally incapable of resource management, the one thing an OS is actually supposed to do. It is impossible to compose static priority workloads. One cannot take two well designed and functional static priority workloads and mash them together and still expect them to work. Therefore it doesn't make sense to expose the priority field; the kernel is fundamentally incapable of setting a sensible value, it needs systems knowledge that it doesn't have. Take away sched_setschedule() / sched_setattr() from modules and replace them with: - sched_set_fifo(p); create a FIFO task (at prio 50) - sched_set_fifo_low(p); create a task higher than NORMAL, which ends up being a FIFO task at prio 1. - sched_set_normal(p, nice); (re)set the task to normal This stops the proliferation of randomly chosen, and irrelevant, FIFO priorities that dont't really mean anything anyway. The system administrator/integrator, whoever has insight into the actual system design and requirements (userspace) can set-up appropriate priorities if and when needed. Cc: airlied@redhat.com Cc: alexander.deucher@amd.com Cc: awalls@md.metrocast.net Cc: axboe@kernel.dk Cc: broonie@kernel.org Cc: daniel.lezcano@linaro.org Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: herbert@gondor.apana.org.au Cc: hverkuil@xs4all.nl Cc: john.stultz@linaro.org Cc: nico@fluxnic.net Cc: paulmck@kernel.org Cc: rafael.j.wysocki@intel.com Cc: rmk+kernel@arm.linux.org.uk Cc: sudeep.holla@arm.com Cc: tglx@linutronix.de Cc: ulf.hansson@linaro.org Cc: wim@linux-watchdog.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-15	x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*()	Peter Zijlstra
	The UBSAN instrumentation only inserts external CALLs when things go 'BAD', much like WARN(). So treat them similar to WARN()s for noinstr, that is: allow them, at the risk of taking the machine down, to get their message out. Suggested-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Marco Elver <elver@google.com>
2020-06-15	compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr	Marco Elver
	Adds the portable definitions for __no_sanitize_address, and __no_sanitize_undefined, and subsequently changes noinstr to use the attributes to disable instrumentation via KASAN or UBSAN. Reported-by: syzbot+dc1fa714cb070b184db5@syzkaller.appspotmail.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Link: https://lore.kernel.org/lkml/000000000000d2474c05a6c938fe@google.com/
2020-06-15	x86, kcsan: Add __no_kcsan to noinstr	Peter Zijlstra
	The 'noinstr' function attribute means no-instrumentation, this should very much include *SAN. Because lots of that is broken at present, only include KCSAN for now, as that is limited to clang11, which has sane function attribute behaviour. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-06-15	kcsan: Remove __no_kcsan_or_inline	Peter Zijlstra
	There are no more user of this function attribute, also, with us now actively supporting '__no_kcsan inline' it doesn't make sense to have in any case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-06-15	sched/deadline: Impose global limits on sched_attr::sched_period	Peter Zijlstra
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726161357.397880775@infradead.org
2020-06-15	isolcpus: Affine unbound kernel threads to housekeeping cpus	Marcelo Tosatti
	This is a kernel enhancement that configures the cpu affinity of kernel threads via kernel boot option nohz_full=. When this option is specified, the cpumask is immediately applied upon kthread launch. This does not affect kernel threads that specify cpu and node. This allows CPU isolation (that is not allowing certain threads to execute on certain CPUs) without using the isolcpus=domain parameter, making it possible to enable load balancing on such CPUs during runtime (see kernel-parameters.txt). Note-1: this is based off on Wind River's patch at https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch Difference being that this patch is limited to modifying kernel thread cpumask. Behaviour of other threads can be controlled via cgroups or sched_setaffinity. Note-2: Wind River's patch was based off Christoph Lameter's patch at https://lwn.net/Articles/565932/ with the only difference being the kernel parameter changed from kthread to kthread_cpus. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200527142909.23372-3-frederic@kernel.org
2020-06-15	psi: eliminate kthread_worker from psi trigger scheduling mechanism	Suren Baghdasaryan
	Each psi group requires a dedicated kthread_delayed_work and kthread_worker. Since no other work can be performed using psi_group's kthread_worker, the same result can be obtained using a task_struct and a timer directly. This makes psi triggering simpler by removing lists and locks involved with kthread_worker usage and eliminates the need for poll_scheduled atomic use in the hot path. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200528195442.190116-1-surenb@google.com
2020-06-15	sched/cputime: Improve cputime_adjust()	Oleg Nesterov
	People report that utime and stime from /proc/<pid>/stat become very wrong when the numbers are big enough, especially if you watch these counters incrementally. Specifically, the current implementation of: stime*rtime/total, results in a saw-tooth function on top of the desired line, where the teeth grow in size the larger the values become. IOW, it has a relative error. The result is that, when watching incrementally as time progresses (for large values), we'll see periods of pure stime or utime increase, irrespective of the actual ratio we're striving for. Replace scale_stime() with a math64.h helper: mul_u64_u64_div_u64() that is far more accurate. This also allows architectures to override the implementation -- for instance they can opt for the old algorithm if this new one turns out to be too expensive for them. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200519172506.GA317395@hirez.programming.kicks-ass.net
2020-06-15	ftrace: Add symbols for ftrace trampolines	Adrian Hunter
	Symbols are needed for tools to describe instruction addresses. Pages allocated for ftrace's purposes need symbols to be created for them. Add such symbols to be visible via /proc/kallsyms. Example on x86 with CONFIG_DYNAMIC_FTRACE=y # echo function > /sys/kernel/debug/tracing/current_tracer # cat /proc/kallsyms \| grep '\[__builtin__ftrace\]' ffffffffc0238000 t ftrace_trampoline [__builtin__ftrace] Note: This patch adds "__builtin__ftrace" as a module name in /proc/kallsyms for symbols for pages allocated for ftrace's purposes, even though "__builtin__ftrace" is not a module. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200512121922.8997-7-adrian.hunter@intel.com
2020-06-15	kprobes: Add symbols for kprobe insn pages	Adrian Hunter
	Symbols are needed for tools to describe instruction addresses. Pages allocated for kprobe's purposes need symbols to be created for them. Add such symbols to be visible via /proc/kallsyms. Note: kprobe insn pages are not used if ftrace is configured. To see the effect of this patch, the kernel must be configured with: # CONFIG_FUNCTION_TRACER is not set CONFIG_KPROBES=y and for optimised kprobes: CONFIG_OPTPROBES=y Example on x86: # perf probe __schedule Added new event: probe:__schedule (on __schedule) # cat /proc/kallsyms \| grep '\[__builtin__kprobes\]' ffffffffc00d4000 t kprobe_insn_page [__builtin__kprobes] ffffffffc00d6000 t kprobe_optinsn_page [__builtin__kprobes] Note: This patch adds "__builtin__kprobes" as a module name in /proc/kallsyms for symbols for pages allocated for kprobes' purposes, even though "__builtin__kprobes" is not a module. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200528080058.20230-1-adrian.hunter@intel.com
2020-06-15	perf: Add perf text poke event	Adrian Hunter
	Record (single instruction) changes to the kernel text (i.e. self-modifying code) in order to support tracers like Intel PT and ARM CoreSight. A copy of the running kernel code is needed as a reference point (e.g. from /proc/kcore). The text poke event records the old bytes and the new bytes so that the event can be processed forwards or backwards. The basic problem is recording the modified instruction in an unambiguous manner given SMP instruction cache (in)coherence. That is, when modifying an instruction concurrently any solution with one or multiple timestamps is not sufficient: CPU0 CPU1 0 1 write insn A 2 execute insn A 3 sync-I$ 4 Due to I$, CPU1 might execute either the old or new A. No matter where we record tracepoints on CPU0, one simply cannot tell what CPU1 will have observed, except that at 0 it must be the old one and at 4 it must be the new one. To solve this, take inspiration from x86 text poking, which has to solve this exact problem due to variable length instruction encoding and I-fetch windows. 1) overwrite the instruction with a breakpoint and sync I$ This guarantees that that code flow will never hit the target instruction anymore, on any CPU (or rather, it will cause an exception). 2) issue the TEXT_POKE event 3) overwrite the breakpoint with the new instruction and sync I$ Now we know that any execution after the TEXT_POKE event will either observe the breakpoint (and hit the exception) or the new instruction. So by guarding the TEXT_POKE event with an exception on either side; we can now tell, without doubt, which instruction another CPU will have observed. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200512121922.8997-2-adrian.hunter@intel.com
2020-06-15	syscalls: Fix offset type of ksys_ftruncate()	Jiri Slaby
	After the commit below, truncate() on x86 32bit uses ksys_ftruncate(). But ksys_ftruncate() truncates the offset to unsigned long. Switch the type of offset to loff_t which is what do_sys_ftruncate() expects. Fixes: 121b32a58a3a (x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments) Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Brian Gerst <brgerst@gmail.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200610114851.28549-1-jslaby@suse.cz
2020-06-15	gpio: driver.h: fix kernel-doc markup	Mauro Carvalho Chehab
	There is one parameter with a wrong name at kernel-doc macro: ./include/linux/gpio/driver.h:499: warning: Function parameter or member 'gc' not described in 'gpiochip_add_data' ./include/linux/gpio/driver.h:499: warning: Excess function parameter 'chip' description in 'gpiochip_add_data' Fix it. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-06-15	crypto: ccp - Fix sparse warnings in sev-dev	Herbert Xu
	This patch fixes a bunch of sparse warnings in sev-dev where the __user marking is incorrectly handled. Reported-by: kbuild test robot <lkp@intel.com> Fixes: 7360e4b14350 ("crypto: ccp: Implement SEV_PEK_CERT_IMPORT...") Fixes: e799035609e1 ("crypto: ccp: Implement SEV_PEK_CSR ioctl...") Fixes: 76a2b524a4b1 ("crypto: ccp: Implement SEV_PDH_CERT_EXPORT...") Fixes: d6112ea0cb34 ("crypto: ccp - introduce SEV_GET_ID2 command") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-14	Merge tag 'LSM-add-setgid-hook-5.8-author-fix' of ↵	Linus Torvalds
	git://github.com/micah-morton/linux Pull SafeSetID update from Micah Morton: "Add additional LSM hooks for SafeSetID SafeSetID is capable of making allow/deny decisions for setuid calls on a system, and we want to add similar functionality for setgid calls. The work to do that is not yet complete, so probably won't make it in for v5.8, but we are looking to get this simple patch in for v5.8 since we have it ready. We are planning on the rest of the work for extending the SafeSetID LSM being merged during the v5.9 merge window" * tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux: security: Add LSM hooks to set*gid syscalls
2020-06-14	security: Add LSM hooks to set*gid syscalls	Thomas Cedeno
	The SafeSetID LSM uses the security_task_fix_setuid hook to filter setuid() syscalls according to its configured security policy. In preparation for adding analagous support in the LSM for setgid() syscalls, we add the requisite hook here. Tested by putting print statements in the security_task_fix_setgid hook and seeing them get hit during kernel boot. Signed-off-by: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Micah Morton <mortonm@chromium.org>
2020-06-14	Merge tag 'for-5.8-part2-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This reverts the direct io port to iomap infrastructure of btrfs merged in the first pull request. We found problems in invalidate page that don't seem to be fixable as regressions or without changing iomap code that would not affect other filesystems. There are four reverts in total, but three of them are followup cleanups needed to revert a43a67a2d715 cleanly. The result is the buffer head based implementation of direct io. Reverts are not great, but under current circumstances I don't see better options" * tag 'for-5.8-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Revert "btrfs: switch to iomap_dio_rw() for dio" Revert "fs: remove dio_end_io()" Revert "btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK" Revert "btrfs: split btrfs_direct_IO to read and write part"
2020-06-14	iio: core: add iio_device_set_parent() helper	Alexandru Ardelean
	By default, the device allocation will also assign a parent device to the IIO device object. In cases where devm_iio_device_alloc() is used, sometimes the parent device must be different than the device used to manage the allocation. In that case, this helper should be used to change the parent, hence the requirement to call this between allocation & registration. This pattern/requirement is not very common in the IIO space, and it may be cleaned up later. But until then, assigning the parent manually between allocation & registration is slightly easier. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-06-14	iio: core: pass parent device as parameter during allocation	Alexandru Ardelean
	The change passes the parent device to the iio_device_alloc() call. This also updates the devm_iio_device_alloc() call to consider the device object as the parent device by default. Having it passed like this, should ensure that any IIO device object already has a device object as parent, allowing for neater control, like passing the 'indio_dev' object for other stuff [like buffers/triggers/etc], and potentially creating iiom_xxx(indio_dev) functions. With this patch, only the 'drivers/platform/x86/toshiba_acpi.c' needs an update to pass the parent object as a parameter. In the next patch all devm_iio_device_alloc() calls will be handled. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>