summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-11-19Merge tag 'acpi-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These include a couple of fixes, a new ACPI backlight quirk for Apple MacbookPro11,2 and Air7,2 and a bunch of cleanups: - Fix _CPC register setting issue for registers located in memory in the ACPI CPPC library code (Lifeng Zheng) - Use DEFINE_SIMPLE_DEV_PM_OPS in the ACPI battery driver, make it use devm_ for initializing mutexes and allocating driver data, and make it check the register_pm_notifier() return value (Thomas Weißschuh, Andy Shevchenko) - Make the ACPI EC driver support compile-time conditional and allow ACPI to be built without CONFIG_HAS_IOPORT (Arnd Bergmann) - Remove a redundant error check from the pfr_telemetry driver (Colin Ian King) - Rearrange the processor_perflib code in the ACPI processor driver to avoid compiling x86-specific code on other architectures (Arnd Bergmann) - Add adev NULL check to acpi_quirk_skip_serdev_enumeration() and make UART skip quirks work on PCI UARTs without an UID (Hans de Goede) - Force native backlight handling Apple MacbookPro11,2 and Air7,2 in the ACPI video driver (Jonathan Denose) - Switch several ACPI platform drivers back to using struct platform_driver::remove() (Uwe Kleine-König) - Replace strcpy() with strscpy() in multiple places in the ACPI subsystem (Muhammad Qasim Abdul Majeed, Abdul Rahim)" * tag 'acpi-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) ACPI: video: force native for Apple MacbookPro11,2 and Air7,2 ACPI: CPPC: Fix _CPC register setting issue ACPI: Switch back to struct platform_driver::remove() ACPI: x86: Add adev NULL check to acpi_quirk_skip_serdev_enumeration() ACPI: x86: Make UART skip quirks work on PCI UARTs without an UID ACPI: allow building without CONFIG_HAS_IOPORT ACPI: processor_perflib: extend X86 dependency ACPI: scan: Use strscpy() instead of strcpy() ACPI: SBSHC: Use strscpy() instead of strcpy() ACPI: SBS: Use strscpy() instead of strcpy() ACPI: power: Use strscpy() instead of strcpy() ACPI: pci_root: Use strscpy() instead of strcpy() ACPI: pci_link: Use strscpy() instead of strcpy() ACPI: event: Use strscpy() instead of strcpy() ACPI: EC: Use strscpy() instead of strcpy() ACPI: APD: Use strscpy() instead of strcpy() ACPI: thermal: Use strscpy() instead of strcpy() ACPI: battery: Check for error code from devm_mutex_init() call ACPI: EC: make EC support compile-time conditional ACPI: pfr_telemetry: remove redundant error check on ret ...
2024-11-19Merge tag 'thermal-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These are thermal core changes, including the addition of support for temperature thresholds that can be set from user space, fixes related to thermal zone initialization, suspend/resume and exit, locking rework and rearrangement of the code handling thermal zone temperature updates. Specifics: - Add support for thermal thresholds that can be added and removed from user space via netlink along with a related library update (Daniel Lezcano) - Fix thermal zone initialization, suspend/resume and exit synchronization issues (Rafael Wysocki) - Rearrange locking in the thermal core to use guards (Rafael Wysocki) - Make the code handling thermal zone temperature updates use sorted lists of trip points to reduce the number of trip points table walks in the thermal core (Rafael Wysocki) - Fix and clean up the thermal testing facility code (Rafael Wysocki) - Fix a Power Allocator thermal governor issue (ZhengShaobo)" * tag 'thermal-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (45 commits) thermal: testing: Initialize some variables annoteded with _free() thermal: testing: Use DEFINE_FREE() and __free() to simplify code thermal: testing: Simplify tt_get_tt_zone() thermal: gov_power_allocator: Granted power set to max when nobody request power thermal: core: Relocate thermal zone initialization routine thermal: core: Use trip lists for trip crossing detection thermal: core: Eliminate thermal_zone_trip_down() thermal: core: Relocate functions that update trip points thermal: core: Move some trip processing to thermal_trip_crossed() thermal: core: Pass trip descriptor to thermal_trip_crossed() thermal: core: Rearrange __thermal_zone_device_update() thermal: core: Prepare for moving trips between sorted lists thermal: core: Rename trip list node in struct thermal_trip_desc thermal: core: Build sorted lists instead of sorting them later thermal/lib: Fix memory leak on error in thermal_genl_auto() thermal: thresholds: Fix thermal lock annotation issue tools/thermal/thermal-engine: Take into account the thresholds API tools/lib/thermal: Add the threshold netlink ABI tools/lib/thermal: Make more generic the command encoding function thermal: netlink: Add the commands and the events for the thresholds ...
2024-11-19Merge tag 'pm-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The amd-pstate cpufreq driver gets the majority of changes this time. They are mostly fixes and cleanups, but one of them causes it to become the default cpufreq driver on some AMD server platforms. Apart from that, the menu cpuidle governor is modified to not use iowait any more, the intel_idle gets a custom C-states table for Granite Rapids Xeon D, and the intel_pstate driver will use a more aggressive Balance- performance default EPP value on Granite Rapids now. There are also some fixes, cleanups and tooling updates. Specifics: - Update the amd-pstate driver to set the initial scaling frequency policy lower bound to be the lowest non-linear frequency (Dhananjay Ugwekar) - Enable amd-pstate by default on servers starting with newer AMD Epyc processors (Swapnil Sapkal) - Align more codepaths between shared memory and MSR designs in amd-pstate (Dhananjay Ugwekar) - Clean up amd-pstate code to rename functions and remove redundant calls (Dhananjay Ugwekar, Mario Limonciello) - Do other assorted fixes and cleanups in amd-pstate (Dhananjay Ugwekar and Mario Limonciello) - Change the Balance-performance EPP value for Granite Rapids in the intel_pstate driver to a more performance-biased one (Srinivas Pandruvada) - Simplify MSR read on the boot CPU in the ACPI cpufreq driver (Chang S. Bae) - Ensure sugov_eas_rebuild_sd() is always called when sugov_init() succeeds to always enforce sched domains rebuild in case EAS needs to be enabled (Christian Loehle) - Switch cpufreq back to platform_driver::remove() (Uwe Kleine-König) - Use proper frequency unit names in cpufreq (Marcin Juszkiewicz) - Add a built-in idle states table for Granite Rapids Xeon D to the intel_idle driver (Artem Bityutskiy) - Fix some typos in comments in the cpuidle core and drivers (Shen Lichuan) - Remove iowait influence from the menu cpuidle governor (Christian Loehle) - Add min/max available performance state limits to the Energy Model management code (Lukasz Luba) - Update pm-graph to v5.13 (Todd Brandt) - Add documentation for some recently introduced cpupower utility options (Tor Vic) - Make cpupower inform users where cpufreq-bench.conf should be located when opening it fails (Peng Fan) - Allow overriding cross-compiling env params in cpupower (Peng Fan) - Add compile_commands.json to .gitignore in cpupower (John B. Wyatt IV) - Improve disable c_state block in cpupower bindings and add a test to confirm that CPU state is disabled to it (John B. Wyatt IV) - Add Chinese Simplified translation to cpupower (Kieran Moy) - Add checks for xgettext and msgfmt to cpupower (Siddharth Menon)" * tag 'pm-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (38 commits) cpufreq: intel_pstate: Update Balance-performance EPP for Granite Rapids cpufreq: ACPI: Simplify MSR read on the boot CPU sched/cpufreq: Ensure sd is rebuilt for EAS check intel_idle: add Granite Rapids Xeon D support PM: EM: Add min/max available performance state limits cpufreq/amd-pstate: Move registration after static function call update cpufreq/amd-pstate: Push adjust_perf vfunc init into cpu_init cpufreq/amd-pstate: Align offline flow of shared memory and MSR based systems cpufreq/amd-pstate: Call cppc_set_epp_perf in the reenable function cpufreq/amd-pstate: Do not attempt to clear MSR_AMD_CPPC_ENABLE cpufreq/amd-pstate: Rename functions that enable CPPC cpufreq/amd-pstate-ut: Add fix for min freq unit test amd-pstate: Switch to amd-pstate by default on some Server platforms amd-pstate: Set min_perf to nominal_perf for active mode performance gov cpufreq/amd-pstate: Remove the redundant amd_pstate_set_driver() call cpufreq/amd-pstate: Remove the switch case in amd_pstate_init() cpufreq/amd-pstate: Call amd_pstate_set_driver() in amd_pstate_register_driver() cpufreq/amd-pstate: Call amd_pstate_register() in amd_pstate_init() cpufreq/amd-pstate: Set the initial min_freq to lowest_nonlinear_freq cpufreq/amd-pstate: Remove the redundant verify() function ...
2024-11-19Merge tag 'random-6.13-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This contains a single series from Uros to replace uses of <linux/random.h> with prandom.h or other more specific headers as needed, in order to avoid a circular header issue. Uros' goal is to be able to use percpu.h from prandom.h, which will then allow him to define __percpu in percpu.h rather than in compiler_types.h" * tag 'random-6.13-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: Include <linux/percpu.h> in <linux/prandom.h> random: Do not include <linux/prandom.h> in <linux/random.h> netem: Include <linux/prandom.h> in sch_netem.c lib/test_scanf: Include <linux/prandom.h> instead of <linux/random.h> lib/test_parman: Include <linux/prandom.h> instead of <linux/random.h> bpf/tests: Include <linux/prandom.h> instead of <linux/random.h> lib/rbtree-test: Include <linux/prandom.h> instead of <linux/random.h> random32: Include <linux/prandom.h> instead of <linux/random.h> kunit: string-stream-test: Include <linux/prandom.h> lib/interval_tree_test.c: Include <linux/prandom.h> instead of <linux/random.h> bpf: Include <linux/prandom.h> instead of <linux/random.h> scsi: libfcoe: Include <linux/prandom.h> instead of <linux/random.h> fscrypt: Include <linux/once.h> in fs/crypto/keyring.c mtd: tests: Include <linux/prandom.h> instead of <linux/random.h> media: vivid: Include <linux/prandom.h> in vivid-vid-cap.c drm/lib: Include <linux/prandom.h> instead of <linux/random.h> drm/i915/selftests: Include <linux/prandom.h> instead of <linux/random.h> crypto: testmgr: Include <linux/prandom.h> instead of <linux/random.h> x86/kaslr: Include <linux/prandom.h> instead of <linux/random.h>
2024-11-19Merge tag 'v6.13-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add sig driver API - Remove signing/verification from akcipher API - Move crypto_simd_disabled_for_test to lib/crypto - Add WARN_ON for return values from driver that indicates memory corruption Algorithms: - Provide crc32-arch and crc32c-arch through Crypto API - Optimise crc32c code size on x86 - Optimise crct10dif on arm/arm64 - Optimise p10-aes-gcm on powerpc - Optimise aegis128 on x86 - Output full sample from test interface in jitter RNG - Retry without padata when it fails in pcrypt Drivers: - Add support for Airoha EN7581 TRNG - Add support for STM32MP25x platforms in stm32 - Enable iproc-r200 RNG driver on BCMBCA - Add Broadcom BCM74110 RNG driver" * tag 'v6.13-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits) crypto: marvell/cesa - fix uninit value for struct mv_cesa_op_ctx crypto: cavium - Fix an error handling path in cpt_ucode_load_fw() crypto: aesni - Move back to module_init crypto: lib/mpi - Export mpi_set_bit crypto: aes-gcm-p10 - Use the correct bit to test for P10 hwrng: amd - remove reference to removed PPC_MAPLE config crypto: arm/crct10dif - Implement plain NEON variant crypto: arm/crct10dif - Macroify PMULL asm code crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply crypto: arm64/crct10dif - Remove obsolete chunking logic crypto: bcm - add error check in the ahash_hmac_init function crypto: caam - add error check to caam_rsa_set_priv_key_form hwrng: bcm74110 - Add Broadcom BCM74110 RNG driver dt-bindings: rng: add binding for BCM74110 RNG padata: Clean up in padata_do_multithreaded() crypto: inside-secure - Fix the return value of safexcel_xcbcmac_cra_init() crypto: qat - Fix missing destroy_workqueue in adf_init_aer() crypto: rsassa-pkcs1 - Reinstate support for legacy protocols ...
2024-11-19Merge tag 'chrome-platform-firmware-for-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform firmware updates from Tzung-Bi Shih: - Do not double register "simple-framebuffer" platform device if Generic System Framebuffers (sysfb) already did that - Fix a missing of unregistering platform driver in error handling path * tag 'chrome-platform-firmware-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: firmware: google: Unregister driver_info on failure firmware: coreboot: Don't register a pdev if screen_info data is present firmware: sysfb: Add a sysfb_handles_screen_info() helper function
2024-11-19block: Support atomic writes limits for stacked devicesJohn Garry
Allow stacked devices to support atomic writes by aggregating the minimum capability of all bottom devices. Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which have been enabled to support atomic writes. Some things to note on the implementation: - For simplicity, all bottom devices must have same atomic write boundary value (if any) - The atomic write boundary must be a power-of-2 already, but this restriction could be relaxed. Furthermore, it is now required that the chunk sectors for a top device must be aligned with this boundary. - If a bottom device atomic write unit min/max are not aligned with the top device chunk sectors, the top device atomic write unit min/max are reduced to a value which works for the chunk sectors. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241118105018.1870052-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-19Compiler Attributes: disable __counted_by for clang < 19.1.3Jan Hendrik Farr
This patch disables __counted_by for clang versions < 19.1.3 because of the two issues listed below. It does this by introducing CONFIG_CC_HAS_COUNTED_BY. 1. clang < 19.1.2 has a bug that can lead to __bdos returning 0: https://github.com/llvm/llvm-project/pull/110497 2. clang < 19.1.3 has a bug that can lead to __bdos being off by 4: https://github.com/llvm/llvm-project/pull/112636 Fixes: c8248faf3ca2 ("Compiler Attributes: counted_by: Adjust name and identifier expansion") Cc: stable@vger.kernel.org # 6.6.x: 16c31dd7fdf6: Compiler Attributes: counted_by: bump min gcc version Cc: stable@vger.kernel.org # 6.6.x: 2993eb7a8d34: Compiler Attributes: counted_by: fixup clang URL Cc: stable@vger.kernel.org # 6.6.x: 231dc3f0c936: lkdtm/bugs: Improve warning message for compilers without counted_by support Cc: stable@vger.kernel.org # 6.6.x Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20240913164630.GA4091534@thelio-3990X/ Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409260949.a1254989-oliver.sang@intel.com Link: https://lore.kernel.org/all/Zw8iawAF5W2uzGuh@archlinux/T/#m204c09f63c076586a02d194b87dffc7e81b8de7b Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Jan Hendrik Farr <kernel@jfarr.cc> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241029140036.577804-2-kernel@jfarr.cc Signed-off-by: Kees Cook <kees@kernel.org>
2024-11-19block: return unsigned int from bdev_io_minChristoph Hellwig
The underlying limit is defined as an unsigned int, so return that from bdev_io_min as well. Fixes: ac481c20ef8f ("block: Topology ioctls") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241119072602.1059488-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Merge in late fixes to prepare for the 6.13 net-next PR. Conflicts: include/linux/phy.h 41ffcd95015f net: phy: fix phylib's dual eee_enabled 721aa69e708b net: phy: convert eee_broken_modes to a linkmode bitmap https://lore.kernel.org/all/20241118135512.1039208b@canb.auug.org.au/ drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c 2160428bcb20 net: txgbe: fix null pointer to pcs 2160428bcb20 net: txgbe: remove GPIO interrupt controller Adjacent commits: include/linux/phy.h 41ffcd95015f net: phy: fix phylib's dual eee_enabled 516a5f11eb97 net: phy: respect cached advertising when re-enabling EEE Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-19ceph: correct ceph_mds_cap_peer field namePatrick Donnelly
The peer seq is used as the issue_seq. Use that name for consistency. See also ceph.git commit 1da6ef237fc7 ("include/ceph_fs: correct ceph_mds_cap_peer field name"). Link: https://tracker.ceph.com/issues/66704 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-19ceph: correct ceph_mds_cap_item field namePatrick Donnelly
The issue_seq is sent with bulk cap releases, not the current sequence number. See also ceph.git commit 655cddb7c9f3 ("include/ceph_fs: correct ceph_mds_cap_item field name"). Link: https://tracker.ceph.com/issues/66704 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-18Merge branches 'clk-marvell', 'clk-adi', 'clk-qcom' and 'clk-devm' into clk-nextStephen Boyd
- Add devm_clk_bulk_get_all_enabled() to return number of clks acquired - Marvell PXA1908 SoC clks * clk-marvell: clk: mmp: Add Marvell PXA1908 MPMU driver clk: mmp: Add Marvell PXA1908 APMU driver clk: mmp: Add Marvell PXA1908 APBCP driver clk: mmp: Add Marvell PXA1908 APBC driver dt-bindings: clock: Add Marvell PXA1908 clock bindings clk: mmp: Switch to use struct u32_fract instead of custom one * clk-adi: clk: clk-axi-clkgen: make sure to enable the AXI bus clock dt-bindings: clock: axi-clkgen: include AXI clk * clk-qcom: (43 commits) clk: qcom: remove unused data from gcc-ipq5424.c clk: qcom: Add support for Global Clock Controller on QCS8300 dt-bindings: clock: qcom: Add GCC clocks for QCS8300 clk: qcom: add Global Clock controller (GCC) driver for IPQ5424 SoC clk: qcom: clk-alpha-pll: Add NSS HUAYRA ALPHA PLL support for ipq9574 dt-bindings: clock: Add Qualcomm IPQ5424 GCC binding clk: qcom: add SAR2130P GPU Clock Controller support clk: qcom: dispcc-sm8550: enable support for SAR2130P clk: qcom: tcsrcc-sm8550: add SAR2130P support clk: qcom: add support for GCC on SAR2130P clk: qcom: rpmh: add support for SAR2130P clk: qcom: rcg2: add clk_rcg2_shared_floor_ops dt-bindings: clk: qcom,sm8450-gpucc: add SAR2130P compatibles dt-bindings: clock: qcom,sm8550-dispcc: Add SAR2130P compatible dt-bindings: clock: qcom,sm8550-tcsr: Add SAR2130P compatible dt-bindings: clock: qcom: document SAR2130P Global Clock Controller dt-bindings: clock: qcom,rpmhcc: Add SAR2130P compatible clk: qcom: Make GCC_6125 depend on QCOM_GDSC dt-bindings: clock: qcom: gcc-ipq9574: remove q6 bring up clock macros dt-bindings: clock: qcom: gcc-ipq5332: remove q6 bring up clock macros ... * clk-devm: clk: Provide devm_clk_bulk_get_all_enabled() helper
2024-11-18Merge branches 'clk-mobileye', 'clk-twl', 'clk-nuvoton', 'clk-renesas' and ↵Stephen Boyd
'clk-bindings' into clk-next - Mobileye EyeQ5, EyeQ6L and EyeQ6H clk driver - TWL6030 clk driver - Nuvoton Arbel BMC NPCM8XX SoC clks - Convert more clk bindings to YAML * clk-mobileye: clk: eyeq: add EyeQ6H west fixed factor clocks clk: eyeq: add EyeQ6H central fixed factor clocks clk: eyeq: add EyeQ5 fixed factor clocks clk: eyeq: add fixed factor clocks infrastructure clk: eyeq: require clock index with phandle in all cases clk: fixed-factor: add clk_hw_register_fixed_factor_index() function dt-bindings: clock: eyeq: add more Mobileye EyeQ5/EyeQ6H clocks dt-bindings: soc: mobileye: set `#clock-cells = <1>` for all compatibles clk: eyeq: add driver clk: divider: Introduce CLK_DIVIDER_EVEN_INTEGERS flag dt-bindings: clock: add Mobileye EyeQ6L/EyeQ6H clock indexes Revert "dt-bindings: clock: mobileye,eyeq5-clk: add bindings" * clk-twl: clk: twl: add TWL6030 support clk: twl: remove is_prepared * clk-nuvoton: clk: npcm8xx: add clock controller reset: npcm: register npcm8xx clock auxiliary bus device dt-bindings: reset: npcm: add clock properties * clk-renesas: clk: renesas: vbattb: Add VBATTB clock driver clk: Add devm_clk_hw_register_gate_parent_hw() clk: renesas: rzg2l: Fix FOUTPOSTDIV clk dt-bindings: clock: renesas,r9a08g045-vbattb: Document VBATTB clk: renesas: r9a08g045: Add power domain for RTC clk: renesas: r9a08g045: Mark the watchdog and always-on PM domains as IRQ safe clk: renesas: rzg2l-cpg: Use GENPD_FLAG_* flags instead of local ones clk: renesas: rzg2l-cpg: Move PM domain power on in rzg2l_cpg_pd_setup() dt-bindings: clock: r9a08g045-cpg: Add power domain ID for RTC clk: renesas: r8a779h0: Drop CLK_PLL2_DIV2 to clarify ZCn clocks clk: renesas: r9a09g057: Add clock and reset entries for ICU clk: renesas: r9a09g057: Add CA55 core clocks clk: renesas: Remove duplicate and trailing empty lines * clk-bindings: dt-bindings: clock: actions,owl-cmu: convert to YAML dt-bindings: clock: ti: Convert mux.txt to json-schema dt-bindings: clock: ti: Convert divider.txt to json-schema dt-bindings: clock: ti: Convert interface.txt to json-schema dt-bindings: clock: convert rockchip,rk3328-cru.txt to YAML
2024-11-18netpoll: Use rcu_access_pointer() in netpoll_poll_lockBreno Leitao
The ndev->npinfo pointer in netpoll_poll_lock() is RCU-protected but is being accessed directly for a NULL check. While no RCU read lock is held in this context, we should still use proper RCU primitives for consistency and correctness. Replace the direct NULL check with rcu_access_pointer(), which is the appropriate primitive when only checking for NULL without dereferencing the pointer. This function provides the necessary ordering guarantees without requiring RCU read-side protection. Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-2-a1888dcb4a02@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end ↵Kees Cook
warnings" This reverts commit 3bd9b9abdf1563a22041b7255baea6d449902f1a. We cannot use the new tagged struct group because it throws C++ errors even under "extern C". Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for running Linux in a protected VM under the Arm Confidential Compute Architecture (CCA) - Guarded Control Stack user-space support. Current patches follow the x86 ABI of implicitly creating a shadow stack on clone(). Subsequent patches (already on the list) will add support for clone3() allowing finer-grained control of the shadow stack size and placement from libc - AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are getting close with the upcoming dpISA support) - Other arch features: - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only exposed to user; uaccess support not merged yet) - MTE: hugetlbfs support and the corresponding kselftests - Optimise CRC32 using the PMULL instructions - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG - Optimise the kernel TLB flushing to use the range operations - POE/pkey (permission overlays): further cleanups after bringing the signal handler in line with the x86 behaviour for 6.12 - arm64 perf updates: - Support for the NXP i.MX91 PMU in the existing IMX driver - Support for Ampere SoCs in the Designware PCIe PMU driver - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC - Support for Samsung's 'Mongoose' CPU PMU - Support for PMUv3.9 finer-grained userspace counter access control - Switch back to platform_driver::remove() now that it returns 'void' - Add some missing events for the CXL PMU driver - Miscellaneous arm64 fixes/cleanups: - Page table accessors cleanup: type updates, drop unused macros, reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity check addresses before runtime P4D/PUD folding - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the FEAT_ECV for the generic timers) allowing Linux to boot with firmware deployments that don't set SCTLR_EL3.ECVEn - ACPI/arm64: tighten the check for the array of platform timer structures and adjust the error handling procedure in gtdt_parse_timer_block() - Optimise the cache flush for the uprobes xol slot (skip if no change) and other uprobes/kprobes cleanups - Fix the context switching of tpidrro_el0 when kpti is enabled - Dynamic shadow call stack fixes - Sysreg updates - Various arm64 kselftest improvements * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits) arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled kselftest/arm64: Try harder to generate different keys during PAC tests kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all() arm64/ptrace: Clarify documentation of VL configuration via ptrace kselftest/arm64: Corrupt P0 in the irritator when testing SSVE acpi/arm64: remove unnecessary cast arm64/mm: Change protval as 'pteval_t' in map_range() kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c kselftest/arm64: Add FPMR coverage to fp-ptrace kselftest/arm64: Expand the set of ZA writes fp-ptrace does kselftets/arm64: Use flag bits for features in fp-ptrace assembler code kselftest/arm64: Enable build of PAC tests with LLVM=1 kselftest/arm64: Check that SVCR is 0 in signal handlers selftests/mm: Fix unused function warning for aarch64_write_signal_pkey() kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests kselftest/arm64: Fix build with stricter assemblers arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux() arm64/scs: Deal with 64-bit relative offsets in FDE frames ...
2024-11-18Merge tag 'lsm-pr-20241112' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: "Thirteen patches, all focused on moving away from the current 'secid' LSM identifier to a richer 'lsm_prop' structure. This move will help reduce the translation that is necessary in many LSMs, offering better performance, and make it easier to support different LSMs in the future" * tag 'lsm-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: remove lsm_prop scaffolding netlabel,smack: use lsm_prop for audit data audit: change context data from secid to lsm_prop lsm: create new security_cred_getlsmprop LSM hook audit: use an lsm_prop in audit_names lsm: use lsm_prop in security_inode_getsecid lsm: use lsm_prop in security_current_getsecid audit: update shutdown LSM data lsm: use lsm_prop in security_ipc_getsecid audit: maintain an lsm_prop in audit_context lsm: add lsmprop_to_secctx hook lsm: use lsm_prop in security_audit_rule_match lsm: add the lsm_prop data structure
2024-11-18nfs_common: must not hold RCU while calling nfsd_file_put_localMike Snitzer
Move holding the RCU from nfs_to_nfsd_file_put_local to nfs_to_nfsd_net_put. It is the call to nfs_to->nfsd_serv_put that requires the RCU anyway (the puts for nfsd_file and netns were combined to avoid an extra indirect reference but that micro-optimization isn't possible now). This fixes xfstests generic/013 and it triggering: "Voluntary context switch within RCU read-side critical section!" [ 143.545738] Call Trace: [ 143.546206] <TASK> [ 143.546625] ? show_regs+0x6d/0x80 [ 143.547267] ? __warn+0x91/0x140 [ 143.547951] ? rcu_note_context_switch+0x496/0x5d0 [ 143.548856] ? report_bug+0x193/0x1a0 [ 143.549557] ? handle_bug+0x63/0xa0 [ 143.550214] ? exc_invalid_op+0x1d/0x80 [ 143.550938] ? asm_exc_invalid_op+0x1f/0x30 [ 143.551736] ? rcu_note_context_switch+0x496/0x5d0 [ 143.552634] ? wakeup_preempt+0x62/0x70 [ 143.553358] __schedule+0xaa/0x1380 [ 143.554025] ? _raw_spin_unlock_irqrestore+0x12/0x40 [ 143.554958] ? try_to_wake_up+0x1fe/0x6b0 [ 143.555715] ? wake_up_process+0x19/0x20 [ 143.556452] schedule+0x2e/0x120 [ 143.557066] schedule_preempt_disabled+0x19/0x30 [ 143.557933] rwsem_down_read_slowpath+0x24d/0x4a0 [ 143.558818] ? xfs_efi_item_format+0x50/0xc0 [xfs] [ 143.559894] down_read+0x4e/0xb0 [ 143.560519] xlog_cil_commit+0x1b2/0xbc0 [xfs] [ 143.561460] ? _raw_spin_unlock+0x12/0x30 [ 143.562212] ? xfs_inode_item_precommit+0xc7/0x220 [xfs] [ 143.563309] ? xfs_trans_run_precommits+0x69/0xd0 [xfs] [ 143.564394] __xfs_trans_commit+0xb5/0x330 [xfs] [ 143.565367] xfs_trans_roll+0x48/0xc0 [xfs] [ 143.566262] xfs_defer_trans_roll+0x57/0x100 [xfs] [ 143.567278] xfs_defer_finish_noroll+0x27a/0x490 [xfs] [ 143.568342] xfs_defer_finish+0x1a/0x80 [xfs] [ 143.569267] xfs_bunmapi_range+0x4d/0xb0 [xfs] [ 143.570208] xfs_itruncate_extents_flags+0x13d/0x230 [xfs] [ 143.571353] xfs_free_eofblocks+0x12e/0x190 [xfs] [ 143.572359] xfs_file_release+0x12d/0x140 [xfs] [ 143.573324] __fput+0xe8/0x2d0 [ 143.573922] __fput_sync+0x1d/0x30 [ 143.574574] nfsd_filp_close+0x33/0x60 [nfsd] [ 143.575430] nfsd_file_free+0x96/0x150 [nfsd] [ 143.576274] nfsd_file_put+0xf7/0x1a0 [nfsd] [ 143.577104] nfsd_file_put_local+0x18/0x30 [nfsd] [ 143.578070] nfs_close_local_fh+0x101/0x110 [nfs_localio] [ 143.579079] __put_nfs_open_context+0xc9/0x180 [nfs] [ 143.580031] nfs_file_clear_open_context+0x4a/0x60 [nfs] [ 143.581038] nfs_file_release+0x3e/0x60 [nfs] [ 143.581879] __fput+0xe8/0x2d0 [ 143.582464] __fput_sync+0x1d/0x30 [ 143.583108] __x64_sys_close+0x41/0x80 [ 143.583823] x64_sys_call+0x189a/0x20d0 [ 143.584552] do_syscall_64+0x64/0x170 [ 143.585240] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 143.586185] RIP: 0033:0x7f3c5153efd7 Fixes: 65f2a5c36635 ("nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18lockd: Remove unused parameter to nlmsvc_testlock()Chuck Lever
The nlm_cookie parameter has been unused since commit 09802fd2a8ca ("lockd: rip out deferred lock handling from testlock codepath"). Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18lockd: Remove unused typedefChuck Lever
Clean up: Looks like the last usage of this typedef was removed by commit 026fec7e7c47 ("sunrpc: properly type pc_decode callbacks") in 2017. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18Merge tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: - Cleanups of the eventfd handling code, making it fully private. - Support for sending a sync message to another ring, without having a ring available to send a normal async message. - Get rid of the separate unlocked hash table, unify everything around the single locked one. - Add support for ring resizing. It can be hard to appropriately size the CQ ring upfront, if the application doesn't know how busy it will be. This results in applications sizing rings for the most busy case, which can be wasteful. With ring resizing, they can start small and grow the ring, if needed. - Add support for fixed wait regions, rather than needing to copy the same wait data tons of times for each wait operation. - Rewrite the resource node handling, which before was serialized per ring. This caused issues with particularly fixed files, where one file waiting on IO could hold up putting and freeing of other unrelated files. Now each node is handled separately. New code is much simpler too, and was a net 250 line reduction in code. - Add support for just doing partial buffer clones, rather than always cloning the entire buffer table. - Series adding static NAPI support, where a specific NAPI instance is used rather than having a list of them available that need lookup. - Add support for mapped regions, and also convert the fixed wait support mentioned above to that concept. This avoids doing special mappings for various planned features, and folds the existing registered wait into that too. - Add support for hybrid IO polling, which is a variant of strict IOPOLL but with an initial sleep delay to avoid spinning too early and wasting resources on devices that aren't necessarily in the < 5 usec category wrt latencies. - Various cleanups and little fixes. * tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linux: (79 commits) io_uring/region: fix error codes after failed vmap io_uring: restore back registered wait arguments io_uring: add memory region registration io_uring: introduce concept of memory regions io_uring: temporarily disable registered waits io_uring: disable ENTER_EXT_ARG_REG for IOPOLL io_uring: fortify io_pin_pages with a warning switch io_msg_ring() to CLASS(fd) io_uring: fix invalid hybrid polling ctx leaks io_uring/uring_cmd: fix buffer index retrieval io_uring/rsrc: add & apply io_req_assign_buf_node() io_uring/rsrc: remove '->ctx_ptr' of 'struct io_rsrc_node' io_uring/rsrc: pass 'struct io_ring_ctx' reference to rsrc helpers io_uring: avoid normal tw intermediate fallback io_uring/napi: add static napi tracking strategy io_uring/napi: clean up __io_napi_do_busy_loop io_uring/napi: Use lock guards io_uring/napi: improve __io_napi_add io_uring/napi: fix io_napi_entry RCU accesses io_uring/napi: protect concurrent io_napi_entry timeout accesses ...
2024-11-18Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - NVMe updates via Keith: - Use uring_cmd helper (Pavel) - Host Memory Buffer allocation enhancements (Christoph) - Target persistent reservation support (Guixin) - Persistent reservation tracing (Guixen) - NVMe 2.1 specification support (Keith) - Rotational Meta Support (Matias, Wang, Keith) - Volatile cache detection enhancment (Guixen) - MD updates via Song: - Maintainers update - raid5 sync IO fix - Enhance handling of faulty and blocked devices - raid5-ppl atomic improvement - md-bitmap fix - Support for manually defining embedded partition tables - Zone append fixes and cleanups - Stop sending the queued requests in the plug list to the driver ->queue_rqs() handle in reverse order. - Zoned write plug cleanups - Cleanups disk stats tracking and add support for disk stats for passthrough IO - Add preparatory support for file system atomic writes - Add lockdep support for queue freezing. Already found a bunch of issues, and some fixes for that are in here. More will be coming. - Fix race between queue stopping/quiescing and IO queueing - ublk recovery improvements - Fix ublk mmap for 64k pages - Various fixes and cleanups * tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits) MAINTAINERS: Update git tree for mdraid subsystem block: make struct rq_list available for !CONFIG_BLOCK block/genhd: use seq_put_decimal_ull for diskstats decimal values block: don't reorder requests in blk_mq_add_to_batch block: don't reorder requests in blk_add_rq_to_plug block: add a rq_list type block: remove rq_list_move virtio_blk: reverse request order in virtio_queue_rqs nvme-pci: reverse request order in nvme_queue_rqs btrfs: validate queue limits block: export blk_validate_limits nvmet: add tracing of reservation commands nvme: parse reservation commands's action and rtype to string nvmet: report ns's vwc not present md/raid5: Increase r5conf.cache_name size block: remove the ioprio field from struct request block: remove the write_hint field from struct request nvme: check ns's volatile write cache not present nvme: add rotational support nvme: use command set independent id ns if available ...
2024-11-18Merge tag 'ata-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata updates from Niklas Cassel: - Fix typos in comments (Yan Zhen) - Remove unused macro definitions (Damien Le Moal) - Switch back to the .remove() callback (Uwe Kleine-König) - Make use of the get_unaligned_be24() helper instead of open coding (Andy Shevchenko) - Refactor and cleanup ata_scsi_simulate() command emulation, such that all commands use ata_scsi_rbuf_fill() with its own callback (Damien Le Moal) - Improve ata_scsi_simulate() command emulation by accurately setting the SCSI command residual (number of bytes not filled) in the command reply (Damien Le Moal) - Add missing iommus property in ahci-platform device tree binding (Frank Wunderlich) * tag 'ata-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: dt-bindings: ata: ahci-platform: add missing iommus property ata: libata-scsi: Return residual for emulated SCSI commands ata: libata-scsi: Remove struct ata_scsi_args ata: libata-scsi: Document all VPD page inquiry actors ata: libata-scsi: Refactor ata_scsiop_maint_in() ata: libata-scsi: Refactor ata_scsiop_read_cap() ata: libata-scsi: Refactor ata_scsi_simulate() ata: libata-scsi: Refactor scsi_6_lba_len() with use of get_unaligned_be24() ata: Switch back to struct platform_driver::remove() ata: libata: Remove unused macro definitions ata: Fix typos in the comment
2024-11-18Merge tag 'for-6.13-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Changes outside of btrfs: add io_uring command flag to track a dying task (the rest will go via the block git tree). User visible changes: - wire encoded read (ioctl) to io_uring commands, this can be used on itself, in the future this will allow 'send' to be asynchronous. As a consequence, the encoded read ioctl can also work in non-blocking mode - new ioctl to wait for cleaned subvolumes, no need to use the generic and root-only SEARCH_TREE ioctl, will be used by "btrfs subvol sync" - recognize different paths/symlinks for the same devices and don't report them during rescanning, this can be observed with LVM or DM - seeding device use case change, the sprout device (the one capturing new writes) will not clear the read-only status of the super block; this prevents accumulating space from deleted snapshots Performance improvements: - reduce lock contention when traversing extent buffers - reduce extent tree lock contention when searching for inline backref - switch from rb-trees to xarray for delayed ref tracking, improvements due to better cache locality, branching factors and more compact data structures - enable extent map shrinker again (prevent memory exhaustion under some types of IO load), reworked to run in a single worker thread (there used to be problems causing long stalls under memory pressure) Core changes: - raid-stripe-tree feature updates: - make device replace and scrub work - implement partial deletion of stripe extents - new selftests - split the config option BTRFS_DEBUG and add EXPERIMENTAL for features that are experimental or with known problems so we don't misuse debugging config for that - subpage mode updates (sector < page): - update compression implementations - update writepage, writeback - continued folio API conversions: - buffered writes - make buffered write copy one page at a time, preparatory work for future integration with large folios, may cause performance drop - proper locking of root item regarding starting send - error handling improvements - code cleanups and refactoring: - dead code removal - unused parameter reduction - lockdep assertions" * tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (119 commits) btrfs: send: check for read-only send root under critical section btrfs: send: check for dead send root under critical section btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap() btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl() btrfs: fix a typo in btrfs_use_zone_append btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read() btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot() btrfs: remove hole from struct btrfs_delayed_node btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list btrfs: add new ioctl to wait for cleaned subvolumes btrfs: simplify range tracking in cow_file_range() btrfs: remove conditional path allocation in btrfs_read_locked_inode() btrfs: push cleanup into btrfs_read_locked_inode() io_uring/cmd: let cmds to know about dying task btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu() btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl) btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages() btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set btrfs: change btrfs_encoded_read() so that reading of extent is done by caller btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read() ...
2024-11-18Merge tag 'ext4_for_linus-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most notably in the journaling code, bufered I/O, and compiler warning cleanups" * tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) jbd2: Fix comment describing journal_init_common() ext4: prevent an infinite loop in the lazyinit thread ext4: use struct_size() to improve ext4_htree_store_dirent() ext4: annotate struct fname with __counted_by() jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings ext4: use str_yes_no() helper function ext4: prevent delalloc to nodelalloc on remount jbd2: make b_frozen_data allocation always succeed ext4: cleanup variable name in ext4_fc_del() ext4: use string choices helpers jbd2: remove the 'success' parameter from the jbd2_do_replay() function jbd2: remove useless 'block_error' variable jbd2: factor out jbd2_do_replay() jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass() jbd2: unified release of buffer_head in do_one_pass() jbd2: remove redundant judgments for check v1 checksum ext4: use ERR_CAST to return an error-valued pointer mm: zero range of eof folio exposed by inode size extension ext4: partial zero eof block on unaligned inode size extension ext4: disambiguate the return value of ext4_dio_write_end_io() ...
2024-11-18Merge branch 'for-6.13/bpf' into for-linusJiri Kosina
- improvement of the way hid-bpf coexists with specific drivers (others than hid-generic) that are already bound to devices (Benjamin Tissoires)
2024-11-18Merge branch 'for-6.13/core' into for-linusJiri Kosina
- assorted cleanups and small code fixes (Dmitry Torokhov, Yan Zhen, Nathan Chancellor, Andy Shevchenko)
2024-11-18Merge tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull xattr updates from Al Viro: "Sanitize xattr and io_uring interactions with it, add *xattrat() syscalls, sanitize struct filename handling in there" * tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr: remove redundant check on variable err fs/xattr: add *at family syscalls new helpers: file_removexattr(), filename_removexattr() new helpers: file_listxattr(), filename_listxattr() replace do_getxattr() with saner helpers. replace do_setxattr() with saner helpers. new helper: import_xattr_name() fs: rename struct xattr_ctx to kernel_xattr_ctx xattr: switch to CLASS(fd) io_[gs]etxattr_prep(): just use getname() io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE getname_maybe_null() - the third variant of pathname copy-in teach filename_lookup() to treat NULL filename as ""
2024-11-18Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...
2024-11-18Merge tag 'vfs-6.13.untorn.writes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs untorn write support from Christian Brauner: "An atomic write is a write issed with torn-write protection. This means for a power failure or any hardware failure all or none of the data from the write will be stored, never a mix of old and new data. This work is already supported for block devices. If a block device is opened with O_DIRECT and the block device supports atomic write, then FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block device. This contains the work to expand atomic write support to filesystems, specifically ext4 and XFS. Currently, only support for writing exactly one filesystem block atomically is added. Since it's now possible to have filesystem block size > page size for XFS, it's possible to write 4K+ blocks atomically on x86" * tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: drop an obsolete comment in iomap_dio_bio_iter ext4: Do not fallback to buffered-io for DIO atomic write ext4: Support setting FMODE_CAN_ATOMIC_WRITE ext4: Check for atomic writes support in write iter ext4: Add statx support for atomic writes xfs: Support setting FMODE_CAN_ATOMIC_WRITE xfs: Validate atomic writes xfs: Support atomic write for statx fs: iomap: Atomic write support fs: Export generic_atomic_write_valid() block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()
2024-11-18Merge tag 'vfs-6.13.tmpfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull tmpfs case folding updates from Christian Brauner: "This adds case-insensitive support for tmpfs. The work contained in here adds support for case-insensitive file names lookups in tmpfs. The main difference from other casefold filesystems is that tmpfs has no information on disk, just on RAM, so we can't use mkfs to create a case-insensitive tmpfs. For this implementation, there's a mount option for casefolding. The rest of the patchset follows a similar approach as ext4 and f2fs. The use case for this feature is similar to the use case for ext4, to better support compatibility layers (like Wine), particularly in combination with sandboxing/container tools (like Flatpak). Those containerization tools can share a subset of the host filesystem with an application. In the container, the root directory and any parent directories required for a shared directory are on tmpfs, with the shared directories bind-mounted into the container's view of the filesystem. If the host filesystem is using case-insensitive directories, then the application can do lookups inside those directories in a case-insensitive way, without this needing to be implemented in user-space. However, if the host is only sharing a subset of a case-insensitive directory with the application, then the parent directories of the mount point will be part of the container's root tmpfs. When the application tries to do case-insensitive lookups of those parent directories on a case-sensitive tmpfs, the lookup will fail" * tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs: Initialize sysfs during tmpfs init tmpfs: Fix type for sysfs' casefold attribute libfs: Fix kernel-doc warning in generic_ci_validate_strict_name docs: tmpfs: Add casefold options tmpfs: Expose filesystem features via sysfs tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs tmpfs: Add casefold lookup support libfs: Export generic_ci_ dentry functions unicode: Recreate utf8_parse_version() unicode: Export latest available UTF-8 version number ext4: Use generic_ci_validate_strict_name helper libfs: Create the helper function generic_ci_validate_strict_name()
2024-11-18Merge tag 'vfs-6.13.usercopy' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull copy_struct_to_user helper from Christian Brauner: "This adds a copy_struct_to_user() helper which is a companion helper to the already widely used copy_struct_from_user(). It copies a struct from kernel space to userspace, in a way that guarantees backwards-compatibility for struct syscall arguments as long as future struct extensions are made such that all new fields are appended to the old struct, and zeroed-out new fields have the same meaning as the old struct. The first user is sched_getattr() system call but the new extensible pidfs ioctl will be ported to it as well" * tag 'vfs-6.13.usercopy' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: sched_getattr: port to copy_struct_to_user uaccess: add copy_struct_to_user helper
2024-11-18Merge tag 'vfs-6.13.ovl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull overlayfs updates from Christian Brauner: "Make overlayfs support specifying layers through file descriptors. Currently overlayfs only allows specifying layers through path names. This is inconvenient for users that want to assemble an overlayfs mount purely based on file descriptors: This enables user to specify both: fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper); fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work); fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1); fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2); in addition to: fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0); fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0); fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0); fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0); There's also a large set of new overlayfs selftests to test new features and some older properties" * tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: add test for specifying 500 lower layers selftests: add overlayfs fd mounting selftests selftests: use shared header Documentation,ovl: document new file descriptor based layers ovl: specify layers via file descriptors fs: add helper to use mount option as path or fd
2024-11-18Merge tag 'vfs-6.13.file' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs file updates from Christian Brauner: "This contains changes the changes for files for this cycle: - Introduce a new reference counting mechanism for files. As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop it has O(N^2) behaviour under contention with N concurrent operations and it is in a hot path in __fget_files_rcu(). The rcuref infrastructures remedies this problem by using an unconditional increment relying on safe- and dead zones to make this work and requiring rcu protection for the data structure in question. This not just scales better it also introduces overflow protection. However, in contrast to generic rcuref, files require a memory barrier and thus cannot rely on *_relaxed() atomic operations and also require to be built on atomic_long_t as having massive amounts of reference isn't unheard of even if it is just an attack. This adds a file specific variant instead of making this a generic library. This has been tested by various people and it gives consistent improvement up to 3-5% on workloads with loads of threads. - Add a fastpath for find_next_zero_bit(). Skip 2-levels searching via find_next_zero_bit() when there is a free slot in the word that contains the next fd. This improves pts/blogbench-1.1.0 read by 8% and write by 4% on Intel ICX 160. - Conditionally clear full_fds_bits since it's very likely that a bit in full_fds_bits has been cleared during __clear_open_fds(). This improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on Intel ICX 160. - Get rid of all lookup_*_fdget_rcu() variants. They were used to lookup files without taking a reference count. That became invalid once files were switched to SLAB_TYPESAFE_BY_RCU and now we're always taking a reference count. Switch to an already existing helper and remove the legacy variants. - Remove pointless includes of <linux/fdtable.h>. - Avoid cmpxchg() in close_files() as nobody else has a reference to the files_struct at that point. - Move close_range() into fs/file.c and fold __close_range() into it. - Cleanup calling conventions of alloc_fdtable() and expand_files(). - Merge __{set,clear}_close_on_exec() into one. - Make __set_open_fd() set cloexec as well instead of doing it in two separate steps" * tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor fs: port files to file_ref fs: add file_ref expand_files(): simplify calling conventions make __set_open_fd() set cloexec state as well fs: protect backing files with rcu file.c: merge __{set,clear}_close_on_exec() alloc_fdtable(): change calling conventions. fs/file.c: add fast path in find_next_fd() fs/file.c: conditionally clear full_fds fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd() move close_range(2) into fs/file.c, fold __close_range() into it close_files(): don't bother with xchg() remove pointless includes of <linux/fdtable.h> get rid of ...lookup...fdget_rcu() family
2024-11-18Merge tag 'vfs-6.13.pagecache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs pagecache updates from Christian Brauner: "Cleanup filesystem page flag usage: This continues the work to make the mappedtodisk/owner_2 flag available to filesystems which don't use buffer heads. Further patches remove uses of Private2. This brings us very close to being rid of it entirely" * tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: migrate: Remove references to Private2 ceph: Remove call to PagePrivate2() btrfs: Switch from using the private_2 flag to owner_2 mm: Remove PageMappedToDisk nilfs2: Convert nilfs_copy_buffer() to use folios fs: Move clearing of mappedtodisk to buffer.c
2024-11-18Merge tag 'vfs-6.13.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Fixup and improve NLM and kNFSD file lock callbacks Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock It's important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS Fix this by using an fop_flag which greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike - Add a sysctl to delete the dentry when a file is removed instead of making it a negative dentry Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a file") introduced an unconditional deletion of the associated dentry when a file is removed. However, this led to performance regressions in specific benchmarks, such as ilebench.sum_operations/s, prompting a revert in commit 4a4be1ad3a6e ("Revert "vfs: Delete the associated dentry when deleting a file""). This reintroduces the concept conditionally through a sysctl - Expand the statmount() system call: * Report the filesystem subtype in a new fs_subtype field to e.g., report fuse filesystem subtypes * Report the superblock source in a new sb_source field * Add a new way to return filesystem specific mount options in an option array that returns filesystem specific mount options separated by zero bytes and unescaped. This allows caller's to retrieve filesystem specific mount options and immediately pass them to e.g., fsconfig() without having to unescape or split them * Report security (LSM) specific mount options in a separate security option array. We don't lump them together with filesystem specific mount options as security mount options are generic and most users aren't interested in them The format is the same as for the filesystem specific mount option array - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command - Optimize acl_permission_check() to avoid costly {g,u}id ownership checks if possible - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict() - Add synchronous wakeup support for ep_poll_callback. Currently, epoll only uses wake_up() to wake up task. But sometimes there are epoll users which want to use the synchronous wakeup flag to give a hint to the scheduler, e.g., the Android binder driver. So add a wake_up_sync() define, and use wake_up_sync() when sync is true in ep_poll_callback() Fixes: - Fix kernel documentation for inode_insert5() and iget5_locked() - Annotate racy epoll check on file->f_ep - Make F_DUPFD_QUERY associative - Avoid filename buffer overrun in initramfs - Don't let statmount() return empty strings - Add a cond_resched() to dump_user_range() to avoid hogging the CPU - Don't query the device logical blocksize multiple times for hfsplus - Make filemap_read() check that the offset is positive or zero Cleanups: - Various typo fixes - Cleanup wbc_attach_fdatawrite_inode() - Add __releases annotation to wbc_attach_and_unlock_inode() - Add hugetlbfs tracepoints - Fix various vfs kernel doc parameters - Remove obsolete TODO comment from io_cancel() - Convert wbc_account_cgroup_owner() to take a folio - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add() - Reorder struct posix_acl to save 8 bytes - Annotate struct posix_acl with __counted_by() - Replace one-element array with flexible array member in freevxfs - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()" * tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) statmount: retrieve security mount options vfs: make evict() use smp_mb__after_spinlock instead of smp_mb statmount: add flag to retrieve unescaped options fs: add the ability for statmount() to report the sb_source writeback: wbc_attach_fdatawrite_inode out of line writeback: add a __releases annoation to wbc_attach_and_unlock_inode fs: add the ability for statmount() to report the fs_subtype fs: don't let statmount return empty strings fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel() hfsplus: don't query the device logical block size multiple times freevxfs: Replace one-element array with flexible array member fs: optimize acl_permission_check() initramfs: avoid filename buffer overrun fs/writeback: convert wbc_account_cgroup_owner to take a folio acl: Annotate struct posix_acl with __counted_by() acl: Realign struct posix_acl to save 8 bytes epoll: Add synchronous wakeup support for ep_poll_callback coredump: add cond_resched() to dump_user_range mm/page-writeback.c: Fix comment of wb_domain_writeout_add() mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL ...
2024-11-18PCI: endpoint: Fix pci_epc_map map_size kerneldoc stringRick Wertenbroek
Because some endpoint controllers have requirements on the alignment of the controller physical memory address that must be used to map a RC PCI address region, the map PCI start address is not necessarily the desired PCI base address to be mapped. This can result in map_pci_addr being lower than pci_addr as documented. This results in map_size covering the range map_pci_addr..pci_addr+pci_size. The old text had the pci_addr twice instead of map_pci_addr..pci_addr, so replace the erroneous kerneldoc string to reflect the actual range. Link: https://lore.kernel.org/r/20241114161032.3046202-1-rick.wertenbroek@gmail.com Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2024-11-18nvme: define the remaining used sgls constantsKeith Busch
This provides a little more context when reading the code than hardcoded magic numbers. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-18nvme-pci: add support for sgl metadataKeith Busch
Supporting this mode allows creating and merging multi-segment metadata requests that wouldn't be possible otherwise. It also allows directly using user space requests that straddle physically discontiguous pages. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-18Merge tag 'vfs-6.13.mgtime' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs multigrain timestamps from Christian Brauner: "This is another try at implementing multigrain timestamps. This time with significant help from the timekeeping maintainers to reduce the performance impact. Thomas provided a base branch that contains the required timekeeping interfaces for the VFS. It serves as the base for the multi-grain timestamp work: - Multigrain timestamps allow the kernel to use fine-grained timestamps when an inode's attributes is being actively observed via ->getattr(). With this support, it's possible for a file to get a fine-grained timestamp, and another modified after it to get a coarse-grained stamp that is earlier than the fine-grained time. If this happens then the files can appear to have been modified in reverse order, which breaks VFS ordering guarantees. To prevent this, a floor value is maintained for multigrain timestamps. Whenever a fine-grained timestamp is handed out, record it, and when later coarse-grained stamps are handed out, ensure they are not earlier than that value. If the coarse-grained timestamp is earlier than the fine-grained floor, return the floor value instead. The timekeeper changes add a static singleton atomic64_t into timekeeper.c that is used to keep track of the latest fine-grained time ever handed out. This is tracked as a monotonic ktime_t value to ensure that it isn't affected by clock jumps. Because it is updated at different times than the rest of the timekeeper object, the floor value is managed independently of the timekeeper via a cmpxchg() operation, and sits on its own cacheline. Two new public timekeeper interfaces are added: (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the coarse-grained clock and the floor time (2) ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries to swap it into the floor. A timespec64 is filled with the result. - The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This adds a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. This is where the earlier mentioned timkeeping interfaces help. A global monotonic atomic64_t value is kept that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems)" * tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reduce pointer chasing in is_mgtime() test tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime timekeeping: Add percpu counter for tracking floor swap events timekeeping: Add interfaces for handling timestamps with a floor value fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps
2024-11-18libceph: Remove unused ceph_osdc_watch_checkDr. David Alan Gilbert
ceph_osdc_watch_check() has been unused since it was added in commit b07d3c4bd727 ("libceph: support for checking on status of watch") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-18libceph: Remove unused pagevec functionsDr. David Alan Gilbert
ceph_copy_user_to_page_vector() has been unused since 2013's commit e8344e668915 ("ceph: Implement writev/pwritev for sync operation.") ceph_copy_to_page_vector() has been unused since 2012's commit 913d2fdcf605 ("rbd: always pass ops array to rbd_req_sync_op()") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-18libceph: Remove unused ceph_pagelist functionsDr. David Alan Gilbert
ceph_pagelist_truncate() and ceph_pagelist_set_cursor() have been unused since commit 39be95e9c8c0 ("ceph: ceph_pagelist_append might sleep while atomic") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-18rtc: m48t59: Use platform_data struct for year offset valueFinn Thain
Instead of hard-coded values and ifdefs, store the year offset in the platform_data struct. Tested-by: Daniel Palmer <daniel@0x0f.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Tested-by: Andreas Larsson <andreas@gaisler.com> Acked-by: Andreas Larsson <andreas@gaisler.com> Link: https://lore.kernel.org/r/665c3526184a8d0c4a6373297d8e7d9a12591d8b.1731450735.git.fthain@linux-m68k.org Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-11-18net/udp: Add 4-tuple hash list basisPhilo Lu
Add a new hash list, hash4, in udp table. It will be used to implement 4-tuple hash for connected udp sockets. This patch adds the hlist to table, and implements helpers and the initialization. 4-tuple hash is implemented in the following patch. hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to concurrent rehash, because rehash() can happen with lookup(). Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com> Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com> Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-16compiler.h: Fix undefined BUILD_BUG_ON_ZERO()Philipp Reisner
<linux/compiler.h> defines __must_be_array() and __must_be_cstr() and both expand to BUILD_BUG_ON_ZERO(), but <linux/build_bug.h> defines BUILD_BUG_ON_ZERO(). Including <linux/build_bug.h> in <linux/compiler.h> would create a cyclic dependency as <linux/build_bug.h> already includes <linux/compiler.h>. Fix that by defining __BUILD_BUG_ON_ZERO_MSG() in <linux/compiler.h> and using that for __must_be_array() and __must_be_cstr(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Link: https://lore.kernel.org/r/20241115204602.249590-1-philipp.reisner@linbit.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-11-16Merge tag 'mm-hotfixes-stable-2024-11-16-15-33' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "10 hotfixes, 7 of which are cc:stable. All singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: revert "mm: shmem: fix data-race in shmem_getattr()" ocfs2: uncache inode which has failed entering the group mm: fix NULL pointer dereference in alloc_pages_bulk_noprof mm, doc: update read_ahead_kb for MADV_HUGEPAGE fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args() sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 mm/mremap: fix address wraparound in move_page_tables() tools/mm: fix compile error mm, swap: fix allocation and scanning race with swapoff
2024-11-16thermal: Add PCIe cooling driverIlpo Järvinen
Add a thermal cooling driver to provide path to access PCIe bandwidth controller using the usual thermal interfaces. A cooling device is instantiated for controllable PCIe Ports from the bwctrl service driver. If registering the cooling device fails, allow bwctrl's probe to succeed regardless. As cdev in that case contains IS_ERR() pseudo "pointer", clean that up inside the probe function so the remove side doesn't need to suddenly make an odd looking IS_ERR() check. The thermal side state 0 means no throttling, i.e., maximum supported PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: dropped data->cdev test per https://lore.kernel.org/r/ZzRm1SJTwEMRsAr8@wunner.de] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> # From the cooling device interface perspective
2024-11-16PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link SpeedIlpo Järvinen
Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>