linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-02-21	Merge tag 'rcu.2023.02.10a' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
2023-02-20	Merge tag 'sched-core-2023-02-20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve the scalability of the CFS bandwidth unthrottling logic with large number of CPUs. - Fix & rework various cpuidle routines, simplify interaction with the generic scheduler code. Add __cpuidle methods as noinstr to objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks. - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query previously issued registrations. - Limit scheduler slice duration to the sysctl_sched_latency period, to improve scheduling granularity with a large number of SCHED_IDLE tasks. - Debuggability enhancement on sys_exit(): warn about disabled IRQs, but also enable them to prevent a cascade of followup problems and repeat warnings. - Fix the rescheduling logic in prio_changed_dl(). - Micro-optimize cpufreq and sched-util methods. - Micro-optimize ttwu_runnable() - Micro-optimize the idle-scanning in update_numa_stats(), select_idle_capacity() and steal_cookie_task(). - Update the RSEQ code & self-tests - Constify various scheduler methods - Remove unused methods - Refine __init tags - Documentation updates - Misc other cleanups, fixes * tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) sched/rt: pick_next_rt_entity(): check list_entry sched/deadline: Add more reschedule cases to prio_changed_dl() sched/fair: sanitize vruntime of entity being placed sched/fair: Remove capacity inversion detection sched/fair: unlink misfit task from cpu overutilized objtool: mem() are not uaccess safe cpuidle: Fix poll_idle() noinstr annotation sched/clock: Make local_clock() noinstr sched/clock/x86: Mark sched_clock() noinstr x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read() x86/atomics: Always inline arch_atomic64() cpuidle: tracing, preempt: Squash _rcuidle tracing cpuidle: tracing: Warn about !rcu_is_watching() cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG cpuidle: drivers: firmware: psci: Dont instrument suspend code KVM: selftests: Fix build of rseq test exit: Detect and fix irq disabled state in oops cpuidle, arm64: Fix the ARM64 cpuidle logic cpuidle: mvebu: Fix duplicate flags assignment sched/fair: Limit sched slice duration ...
2023-02-20	Merge tag 'fs.idmapped.v6.3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-02-17	Merge remote-tracking branch 'regmap/for-6.3' into regmap-next	Mark Brown

2023-02-17	regmap-irq: Remove unused mask_invert flag	Aidan MacDonald
	mask_invert is deprecated and no longer used; it can now be removed. Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Link: https://lore.kernel.org/r/20230216223200.150679-2-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-02-17	regmap-irq: Remove unused type_invert flag	Aidan MacDonald
	type_invert is deprecated and no longer used; it can now be removed. Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Link: https://lore.kernel.org/r/20230216223200.150679-1-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-02-15	Merge branches 'powercap', 'pm-domains', 'pm-em' and 'pm-opp'	Rafael J. Wysocki
	Merge updates of the powercap framework, generic PM domains, Energy Model and operating performance points for 6.3-rc1: - Fix possible name leak in powercap_register_zone() (Yang Yingliang). - Add Meteor Lake and Emerald Rapids support to the intel_rapl power capping driver (Zhang Rui). - Modify the idle_inject power capping facility to support 100% idle injection (Srinivas Pandruvada). - Fix large time windows handling in the intel_rapl power capping driver (Zhang Rui). - Fix memory leaks with using debugfs_lookup() in the generic PM domains and Energy Model code (Greg Kroah-Hartman). - Add missing 'cache-unified' property in example for kryo OPP bindings (Rob Herring). - Fix error checking in opp_migrate_dentry() (Qi Zheng). - Remove "select SRCU" (Paul E. McKenney). - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad Dybcio). * powercap: powercap: intel_rapl: Fix handling for large time window powercap: idle_inject: Support 100% idle injection powercap: intel_rapl: add support for Emerald Rapids powercap: intel_rapl: add support for Meteor Lake powercap: fix possible name leak in powercap_register_zone() * pm-domains: PM: domains: fix memory leak with using debugfs_lookup() * pm-em: PM: EM: fix memory leak with using debugfs_lookup() * pm-opp: OPP: fix error checking in opp_migrate_dentry() dt-bindings: opp: v2-qcom-level: Let qcom,opp-fuse-level be a 2-long array drivers/opp: Remove "select SRCU" dt-bindings: opp: opp-v2-kryo-cpu: Add missing 'cache-unified' property in example
2023-02-15	Merge branches 'pm-cpuidle', 'pm-core' and 'pm-sleep'	Rafael J. Wysocki
	Merge cpuidle updates, PM core updates and changes related to system sleep handling for 6.3-rc1: - Make the TEO cpuidle governor check CPU utilization in order to refine idle state selection (Kajetan Puchalski). - Make Kconfig select the haltpoll cpuidle governor when the haltpoll cpuidle driver is selected and replace a default_idle() call in that driver with arch_cpu_idle() which allows MWAIT to be used (Li RongQing). - Add Emerald Rapids Xeon support to the intel_idle driver (Artem Bityutskiy). - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to avoid randconfig build failures (Arnd Bergmann). - Make kobj_type structures used in the cpuidle sysfs interface constant (Thomas Weißschuh). - Make the cpuidle driver registration code update microsecond values of idle state parameters in accordance with their nanosecond values if they are provided (Rafael Wysocki). - Make the PSCI cpuidle driver prevent topology CPUs from being suspended on PREEMPT_RT (Krzysztof Kozlowski). - Document that pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald). - Add EXPORT macros for exporting PM functions from drivers (Richard Fitzgerald). - Drop "select SRCU" from system sleep Kconfig (Paul E. McKenney). - Remove /** from non-kernel-doc comments in hibernation code (Randy Dunlap). * pm-cpuidle: cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT cpuidle: driver: Update microsecond values of state parameters as needed cpuidle: sysfs: make kobj_type structures constant cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies intel_idle: add Emerald Rapids Xeon support cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle() cpuidle-haltpoll: select haltpoll governor cpuidle: teo: Introduce util-awareness cpuidle: teo: Optionally skip polling states in teo_find_shallower_state() * pm-core: PM: Add EXPORT macros for exporting PM functions PM: runtime: Document that force_suspend() is incompatible with SMART_SUSPEND * pm-sleep: PM: sleep: Remove "select SRCU" PM: hibernate: swap: don't use /** for non-kernel-doc comments
2023-02-14	driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place	Greg Kroah-Hartman
	For some reason, the drivers/base/class.c file still had the "old style" of exports, at the end of the file. Move the exports to the proper location, right after the function, to be correct. Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230214144117.158956-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-14	Revert "driver core: add error handling for devtmpfs_create_node()"	Greg Kroah-Hartman
	This reverts commit 31b4b6730fd4f5d503c9f23619c920ce7b794754 as it is reported to cause boot regressions. Link: https://lore.kernel.org/r/Y+rSXg14z1Myd8Px@dev-arch.thelio-3990X Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Longlong Xia <xialonglong1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-14	Revert "devtmpfs: add debug info to handle()"	Greg Kroah-Hartman
	This reverts commit 90a9d5ff225267b3376f73c19f21174e3b6d7746 as it is reported to cause boot regressions. Link: https://lore.kernel.org/r/Y+rSXg14z1Myd8Px@dev-arch.thelio-3990X Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Longlong Xia <xialonglong1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-14	Revert "devtmpfs: remove return value of devtmpfs_delete_node()"	Greg Kroah-Hartman
	This reverts commit 9d3fe6aa6b9517408064c7c3134187e8ec77dbf7 as it is reported to cause boot regressions. Link: https://lore.kernel.org/r/Y+rSXg14z1Myd8Px@dev-arch.thelio-3990X Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Longlong Xia <xialonglong1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-11	driver core: cpu: don't hand-override the uevent bus_type callback.	Greg Kroah-Hartman
	Instead of having to change the uevent bus_type callback by hand at runtime, set it at build time based on the build configuration options, making this much simpler to maintain and understand (and allow to make the structure constant.) Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230210102408.1083177-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-11	devtmpfs: remove return value of devtmpfs_delete_node()	Longlong Xia
	The only caller of device_del() does not check the return value. And there's nothing we can do when cleaning things up on a remove path. Let's make it a void function. Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Link: https://lore.kernel.org/r/20230210095444.4067307-4-xialonglong1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10	devtmpfs: add debug info to handle()	Longlong Xia
	Because handle() is the core function for processing devtmpfs requests, Let's add some debug info in handle() to help users know why failed. Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Link: https://lore.kernel.org/r/20230210095444.4067307-3-xialonglong1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10	driver core: add error handling for devtmpfs_create_node()	Longlong Xia
	In some cases, devtmpfs_create_node() can return error value. So, make use of it. Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Link: https://lore.kernel.org/r/20230210095444.4067307-2-xialonglong1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10	driver core: bus: update my copyright notice	Greg Kroah-Hartman
	There's been some work done recently to the drivers/base/bus.c file so update the copyright notice in it to make those who track those types of things have an easier job. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230210091318.733561-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10	driver core: bus: add bus_get_dev_root() function	Greg Kroah-Hartman
	Instead of poking around in the struct bus_type directly for the dev_root pointer, provide a function to return it properly reference counted, if it is present in the bus. This will be needed to move the pointer out of struct bus_type in the future. Use the function in the driver core code at the same time it is introduced to verify that it works properly. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230209093556.19132-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	PM: domains: fix memory leak with using debugfs_lookup()	Greg Kroah-Hartman
	When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09	driver core: bus: constify bus_unregister()	Greg Kroah-Hartman
	The bus_unregister() function can now take a const * to bus_type, not just a * so fix that up. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-22-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: constify some internal functions	Greg Kroah-Hartman
	The functions add_probe_files() and remove_probe_files() should be taking a const * to bus_type, not just a *, so fix that up. These functions should really be removed entirely and an attribute group used instead, but for now, make this change so that other const work can continue. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-21-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: constify bus_get_kset()	Greg Kroah-Hartman
	The bus_get_kset() function should be taking a const * to bus_type, not just a * so fix that up. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-20-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: constify bus_register/unregister_notifier()	Greg Kroah-Hartman
	The bus_register_notifier() and bus_unregister_notifier() functions should be taking a const * to bus_type, not just a * so fix that up. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-19-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: remove private pointer from struct bus_type	Greg Kroah-Hartman
	Now that the driver code has been refactored to not rely on the pointer from a struct bus_type to the private structure it can be safely removed from the structure entirely. This will allow most bus_type structures to now be marked as const. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-18-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: create bus_is_registered()	Greg Kroah-Hartman
	A local function to the driver core to determine if a bus really is registered with the kernel or not. To be used only by the driver core code, as part of the driver registration path as it's not really "safe" because the bus could be unregistered instantly after being called. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-17-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: clean up driver_find()	Greg Kroah-Hartman
	Convert the driver_find() function to use bus_to_subsys() and not use the back-pointer to the private structure. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-16-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: move driver_find() to bus.c	Greg Kroah-Hartman
	This function really is a bus function, not a driver one, so move it from driver.c to bus.c so that we can clean up some internal bus logic easier. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-15-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: clean up bus_sort_breadthfirst()	Greg Kroah-Hartman
	Convert the bus_sort_breadthfirst() function to use bus_to_subsys() and not use the back-pointer to the private structure. This also allows us to get rid of bus_get_device_klist() which was only being used by this one internal function. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-14-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: bus iterator cleanups	Greg Kroah-Hartman
	Convert the bus_for_each_dev(), bus_find_device, and bus_for_each_drv() functions to use bus_to_subsys() and not use the back-pointer to the private structure. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-13-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: bus_add/remove_driver() cleanups	Greg Kroah-Hartman
	Convert the bus_add_driver() and bus_remove_driver() functions to use bus_to_subsys() and not use the back-pointer to the private structure. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-12-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: bus_register/unregister_notifier() cleanups	Greg Kroah-Hartman
	Convert the bus_register_notifier() and bus_unregister_notifier() public functions to use bus_to_subsys() and not use the back-pointer to the private structure as well as the bus_notify() function. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-11-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: bus_get_kset() cleanup	Greg Kroah-Hartman
	Convert the bus_get_kset() function function to use bus_to_subsys() and not use the back-pointer to the private structure. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-10-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: subsys_interface_register/unregister() cleanups	Greg Kroah-Hartman
	Convert the subsys_interface_register and subsys_interface_unregister() functions to use bus_to_subsys() and not use the back-pointer to the private structure. This also requires changing the parameters on subsys_dev_iter_init() to iterate over the list properly. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-9-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: bus_register/unregister() cleanups	Greg Kroah-Hartman
	Convert the bus_register() and bus_unregister() functions to use bus_to_subsys() and not use the back-pointer to the private structure. Because bus_add_groups() and bus_remove_groups() were only called in one place, remove those one-line-wrapper functions and call the real sysfs group function where it is needed instead, saving another layer of indirection. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-8-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: bus_add/probe/remove_device() cleanups	Greg Kroah-Hartman
	Convert the bus_add_device(), bus_probe_device(), and bus_remove_device() functions to use bus_to_subsys() and not use the back-pointer to the private structure. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: sysfs function cleanups	Greg Kroah-Hartman
	Convert the drivers_autoprobe show/store and uevent sysfs callbacks to use bus_to_subsys() and not use the back-pointer to the private structure. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-6-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: convert bus_create/remove_file to be constant	Greg Kroah-Hartman
	bus_create_file() and bus_remove_file() can be made to take a constant bus pointer, as it should not be modifying anything in the bus structure. Make this change and move the functions to use the internal subsys_get/put() logic as well, to prevent the use of the back-pointer in struct bus_type. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-5-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: constantify the bus_find_* functions	Greg Kroah-Hartman
	All of the bus find and iterator functions do not modify the struct bus_type passed to them, so mark them as constant to enforce this rule. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: bus: implement bus_get/put() without the private pointer	Greg Kroah-Hartman
	In the quest to make 'struct bus_type' constant and in read-only memory, we need to stop using the private pointer to the subsys_private structure. First step in doing this is to create a helper function that turns a 'struct bus_type' into 'struct subsys_private' called bus_to_subsys(). bus_to_subsys() walks the list of registered busses in the system and finds the matching one based on the pointer to the bus_type itself. As this is a short list, and this function is not on any fast path, it should not be noticable. Implement bus_get() and bus_put() using this new helper function. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-3-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09	driver core: add local subsys_get and subsys_put functions	Greg Kroah-Hartman
	We need to control the reference count of the subsys private structure instead of directly manipulating the kset reference count of it, so wrap that logic up in a subsys_get() and subsys_put() function to make it more obvious as to what is happening. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230208111330.439504-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: fw_devlink: Make cycle detection more robust	Saravana Kannan
	fw_devlink could only detect a single and simple cycle because it relied mainly on device link cycle detection code that only checked for cycles between devices. The expectation was that the firmware wouldn't have complicated cycles and multiple cycles between devices. That expectation has been proven to be wrong. For example, fw_devlink could handle: +-+ +-+ \|A+------> \|B+ +-+ +++ ^ \| \| \| +----------+ But it couldn't handle even something as "simple" as: +---------------------+ \| \| v \| +-+ +-+ +++ \|A+------> \|B+------> \|C\| +-+ +++ +-+ ^ \| \| \| +----------+ But firmware has even more complicated cycles like: +---------------------+ \| \| v \| +-+ +---+ +++ +--+A+------>\| B +-----> \|C\|<--+ \| +-+ ++--+ +++ \| \| ^ \| ^ \| \| \| \| \| \| \| \| \| +---------+ +---------+ \| \| \| +------------------------------+ And this is without including parent child dependencies or nodes in the cycle that are just firmware nodes that'll never have a struct device created for them. The proper way to treat these devices it to not force any probe ordering between them, while still enforce dependencies between node in the cycles (A, B and C) and their consumers. So this patch goes all out and just deals with all types of cycles. It does this by: 1. Following dependencies across device links, parent-child and fwnode links. 2. When it find cycles, it mark the device links and fwnode links as such instead of just deleting them or making the indistinguishable from proxy SYNC_STATE_ONLY device links. This way, when new nodes get added, we can immediately find and mark any new cycles whether the new node is a device or firmware node. Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-9-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: fw_devlink: Consolidate device link flag computation	Saravana Kannan
	Consolidate the code that computes the flags to be used when creating a device link from a fwnode link. Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-8-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: fw_devlink: Allow marking a fwnode link as being part of a cycle	Saravana Kannan
	To improve detection and handling of dependency cycles, we need to be able to mark fwnode links as being part of cycles. fwnode links marked as being part of a cycle should not block their consumers from probing. Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-7-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links	Saravana Kannan
	fw_devlink uses DL_FLAG_SYNC_STATE_ONLY device link flag for two purposes: 1. To allow a parent device to proxy its child device's dependency on a supplier so that the supplier doesn't get its sync_state() callback before the child device/consumer can be added and probed. In this usage scenario, we need to ignore cycles for ensure correctness of sync_state() callbacks. 2. When there are dependency cycles in firmware, we don't know which of those dependencies are valid. So, we have to ignore them all wrt probe ordering while still making sure the sync_state() callbacks come correctly. However, when detecting dependency cycles, there can be multiple dependency cycles between two devices that we need to detect. For example: A -> B -> A and A -> C -> B -> A. To detect multiple cycles correct, we need to be able to differentiate DL_FLAG_SYNC_STATE_ONLY device links used for (1) vs (2) above. To allow this differentiation, add a DL_FLAG_CYCLE that can be use to mark use case (2). We can then use the DL_FLAG_CYCLE to decide which DL_FLAG_SYNC_STATE_ONLY device links to follow when looking for dependency cycles. Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-6-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: fw_devlink: Improve check for fwnode with no device/driver	Saravana Kannan
	fw_devlink shouldn't defer the probe of a device to wait on a supplier that'll never have a struct device or will never be probed by a driver. We currently check if a supplier falls into this category, but don't check its ancestors. We need to check the ancestors too because if the ancestor will never probe, then the supplier will never probe either. Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: fw_devlink: Don't purge child fwnode's consumer links	Saravana Kannan
	When a device X is bound successfully to a driver, if it has a child firmware node Y that doesn't have a struct device created by then, we delete fwnode links where the child firmware node Y is the supplier. We did this to avoid blocking the consumers of the child firmware node Y from deferring probe indefinitely. While that a step in the right direction, it's better to make the consumers of the child firmware node Y to be consumers of the device X because device X is probably implementing whatever functionality is represented by child firmware node Y. By doing this, we capture the device dependencies more accurately and ensure better probe/suspend/resume ordering. Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	driver core: make kobj_type structures constant	Thomas Weißschuh
	Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230204-kobj_type-driver-core-v1-1-b9f809419f2c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	drivers: base: dd: fix memory leak with using debugfs_lookup()	Greg Kroah-Hartman
	When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230202141621.2296458-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-08	drivers: base: component: fix memory leak with using debugfs_lookup()	Greg Kroah-Hartman
	When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230202141621.2296458-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-02	mm: memory-failure: add memory failure stats to sysfs	Jiaqi Yan
	Patch series "Introduce per NUMA node memory error statistics", v2. Background ========== In the RFC for Kernel Support of Memory Error Detection [1], one advantage of software-based scanning over hardware patrol scrubber is the ability to make statistics visible to system administrators. The statistics include 2 categories: * Memory error statistics, for example, how many memory error are encountered, how many of them are recovered by the kernel. Note these memory errors are non-fatal to kernel: during the machine check exception (MCE) handling kernel already classified MCE's severity to be unnecessary to panic (but either action required or optional). * Scanner statistics, for example how many times the scanner have fully scanned a NUMA node, how many errors are first detected by the scanner. The memory error statistics are useful to userspace and actually not specific to scanner detected memory errors, and are the focus of this patchset. Motivation ========== Memory error stats are important to userspace but insufficient in kernel today. Datacenter administrators can better monitor a machine's memory health with the visible stats. For example, while memory errors are inevitable on servers with 10+ TB memory, starting server maintenance when there are only 1~2 recovered memory errors could be overreacting; in cloud production environment maintenance usually means live migrate all the workload running on the server and this usually causes nontrivial disruption to the customer. Providing insight into the scope of memory errors on a system helps to determine the appropriate follow-up action. In addition, the kernel's existing memory error stats need to be standardized so that userspace can reliably count on their usefulness. Today kernel provides following memory error info to userspace, but they are not sufficient or have disadvantages: * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, not per NUMA node stats though * ras:memory_failure_event: only available after explicitly enabled * /dev/mcelog provides many useful info about the MCEs, but doesn't capture how memory_failure recovered memory MCEs * kernel logs: userspace needs to process log text Exposing memory error stats is also a good start for the in-kernel memory error detector. Today the data source of memory error stats are either direct memory error consumption, or hardware patrol scrubber detection (either signaled as UCNA or SRAO). Once in-kernel memory scanner is implemented, it will be the main source as it is usually configured to scan memory DIMMs constantly and faster than hardware patrol scrubber. How Implemented =============== As Naoya pointed out [2], exposing memory error statistics to userspace is useful independent of software or hardware scanner. Therefore we implement the memory error statistics independent of the in-kernel memory error detector. It exposes the following per NUMA node memory error counters: /sys/devices/system/node/node${X}/memory_failure/total /sys/devices/system/node/node${X}/memory_failure/recovered /sys/devices/system/node/node${X}/memory_failure/ignored /sys/devices/system/node/node${X}/memory_failure/failed /sys/devices/system/node/node${X}/memory_failure/delayed These counters describe how many raw pages are poisoned and after the attempted recoveries by the kernel, their resolutions: how many are recovered, ignored, failed, or delayed respectively. This approach can be easier to extend for future use cases than /proc/meminfo, trace event, and log. The following math holds for the statistics: * total = recovered + ignored + failed + delayed These memory error stats are reset during machine boot. The 1st commit introduces these sysfs entries. The 2nd commit populates memory error stats every time memory_failure attempts memory error recovery. The 3rd commit adds documentations for introduced stats. [1] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#mc22959244f5388891c523882e61163c6e4d703af [2] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#m52d8d7a333d8536bd7ce74253298858b1c0c0ac6 This patch (of 3): Today kernel provides following memory error info to userspace, but each has its own disadvantage * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, not per NUMA node stats though * ras:memory_failure_event: only available after explicitly enabled * /dev/mcelog provides many useful info about the MCEs, but doesn't capture how memory_failure recovered memory MCEs * kernel logs: userspace needs to process log text Exposes per NUMA node memory error stats as sysfs entries: /sys/devices/system/node/node${X}/memory_failure/total /sys/devices/system/node/node${X}/memory_failure/recovered /sys/devices/system/node/node${X}/memory_failure/ignored /sys/devices/system/node/node${X}/memory_failure/failed /sys/devices/system/node/node${X}/memory_failure/delayed These counters describe how many raw pages are poisoned and after the attempted recoveries by the kernel, their resolutions: how many are recovered, ignored, failed, or delayed respectively. The following math holds for the statistics: * total = recovered + ignored + failed + delayed Link: https://lkml.kernel.org/r/20230120034622.2698268-1-jiaqiyan@google.com Link: https://lkml.kernel.org/r/20230120034622.2698268-2-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>