summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-05-15SUNRPC: Remove svc_fill_write_vector()Chuck Lever
Clean up: This API is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: De-duplicate the svc_fill_write_vector() call sitesChuck Lever
All three call sites do the same thing. I'm struggling with this a bit, however. struct xdr_buf is an XDR layer object and unmarshaling a WRITE payload is clearly a task intended to be done by the proc and xdr functions, not by VFS. This feels vaguely like a layering violation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15sunrpc: Replace the rq_bvec array with dynamically-allocated memoryChuck Lever
As a step towards making NFSD's maximum rsize and wsize variable at run-time, replace the fixed-size rq_bvec[] array in struct svc_rqst with a chunk of dynamically-allocated memory. The rq_bvec[] array contains enough bio_vecs to handle each page in a maximum size RPC message. On a system with 8-byte pointers and 4KB pages, pahole reports that the rq_bvec[] array is 4144 bytes. This patch replaces that array with a single 8-byte pointer field. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15sunrpc: Replace the rq_pages array with dynamically-allocated memoryChuck Lever
As a step towards making NFSD's maximum rsize and wsize variable at run-time, replace the fixed-size rq_vec[] array in struct svc_rqst with a chunk of dynamically-allocated memory. On a system with 8-byte pointers and 4KB pages, pahole reports that the rq_pages[] array is 2080 bytes. This patch replaces that with a single 8-byte pointer field. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15sunrpc: Add a helper to derive maxpages from sv_max_mesgChuck Lever
This page count is to be used to allocate various arrays of pages and bio_vecs, replacing the fixed RPCSVC_MAXPAGES value. The documenting comment is somewhat stale -- of course NFSv4 COMPOUND procedures may have multiple payloads. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11sunrpc: add info about xprt queue times to svc_xprt_dequeue tracepointJeff Layton
I've been looking at a problem where we see increased RPC timeouts in clients when the nfs_layout_flexfiles dataserver_timeo value is tuned very low (6s). This is necessary to ensure quick failover to a different mirror if a server goes down, but it causes a lot more major RPC timeouts. Ultimately, the problem is server-side however. It's sometimes doesn't respond to connection attempts. My theory is that the interrupt handler runs when a connection comes in, the xprt ends up being enqueued, but it takes a significant amount of time for the nfsd thread to pick it up. Currently, the svc_xprt_dequeue tracepoint displays "wakeup-us". This is the time between the wake_up() call, and the thread dequeueing the xprt. If no thread was woken, or the thread ended up picking up a different xprt than intended, then this value won't tell us how long the xprt was waiting. Add a new xpt_qtime field to struct svc_xprt and set it in svc_xprt_enqueue(). When the dequeue tracepoint fires, also store the time that the xprt sat on the queue in total. Display it as "qtime-us". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11Merge tag 'timers-urgent-2025-05-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc timers fixes from Ingo Molnar: - Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks - Work around absolute relocations into vDSO code that GCC erroneously emits in certain arm64 build environments - Fix a false positive lockdep warning in the i8253 clocksource driver * tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable() arm64: vdso: Work around invalid absolute relocations from GCC timekeeping: Prevent coarse clocks going backwards
2025-05-10Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates" * tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: fix folio_pte_batch() on XEN PV nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs() mm/hugetlb: copy the CMA flag when demoting mm, swap: fix false warning for large allocation with !THP_SWAP selftests/mm: fix a build failure on powerpc selftests/mm: fix build break when compiling pkey_util.c mm: vmalloc: support more granular vrealloc() sizing tools/testing/selftests: fix guard region test tmpfs assumption ocfs2: stop quota recovery before disabling quotas ocfs2: implement handshaking with ocfs2 recovery thread ocfs2: switch osb->disable_recovery to enum mailmap: map Uwe's BayLibre addresses to a single one MAINTAINERS: add mm THP section mm/userfaultfd: fix uninitialized output field for -EAGAIN race selftests/mm: compaction_test: support platform with huge mount of memory MAINTAINERS: add core mm section ocfs2: fix panic in failed foilio allocation mm/huge_memory: fix dereferencing invalid pmd migration entry MAINTAINERS: add reverse mapping section x86: disable image size check for test builds ...
2025-05-10Merge tag 'char-misc-6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/IIO driver fixes from Greg KH: "Here are a bunch of small driver fixes (mostly all IIO) for 6.15-rc6. Included in here are: - loads of tiny IIO driver fixes for reported issues - hyperv driver fix for a much-reported and worked on sysfs ring buffer creation bug All of these have been in linux-next for over a week (the IIO ones for many weeks now), with no reported issues" * tag 'char-misc-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits) Drivers: hv: Make the sysfs node size for the ring buffer dynamic uio_hv_generic: Fix sysfs creation path for ring buffer iio: adis16201: Correct inclinometer channel resolution iio: adc: ad7606: fix serial register access iio: pressure: mprls0025pa: use aligned_s64 for timestamp iio: imu: adis16550: align buffers for timestamp staging: iio: adc: ad7816: Correct conditional logic for store mode iio: adc: ad7266: Fix potential timestamp alignment issue. iio: adc: ad7768-1: Fix insufficient alignment of timestamp. iio: adc: dln2: Use aligned_s64 for timestamp iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64 iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer. iio: chemical: pms7003: use aligned_s64 for timestamp iio: chemical: sps30: use aligned_s64 for timestamp iio: imu: inv_mpu6050: align buffer for timestamp iio: imu: st_lsm6dsx: Fix wakeup source leaks on device unbind iio: adc: qcom-spmi-iadc: Fix wakeup source leaks on device unbind iio: accel: fxls8962af: Fix wakeup source leaks on device unbind iio: adc: ad7380: fix event threshold shift iio: hid-sensor-prox: Fix incorrect OFFSET calculation ...
2025-05-08Merge tag 'net-6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN, WiFi and netfilter. We have still a comple of regressions open due to the recent drivers locking refactor. The patches are in-flight, but not ready yet. Current release - regressions: - core: lock netdevices during dev_shutdown - sch_htb: make htb_deactivate() idempotent - eth: virtio-net: don't re-enable refill work too early Current release - new code bugs: - eth: icssg-prueth: fix kernel panic during concurrent Tx queue access Previous releases - regressions: - gre: fix again IPv6 link-local address generation. - eth: b53: fix learning on VLAN unaware bridges Previous releases - always broken: - wifi: fix out-of-bounds access during multi-link element defragmentation - can: - initialize spin lock on device probe - fix order of unregistration calls - openvswitch: fix unsafe attribute parsing in output_userspace() - eth: - virtio-net: fix total qstat values - mtk_eth_soc: reset all TX queues on DMA free - fbnic: firmware IPC mailbox fixes" * tag 'net-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) virtio-net: fix total qstat values net: export a helper for adding up queue stats fbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready fbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context fbnic: Improve responsiveness of fbnic_mbx_poll_tx_ready fbnic: Cleanup handling of completions fbnic: Actually flush_tx instead of stalling out fbnic: Add additional handling of IRQs fbnic: Gate AXI read/write enabling on FW mailbox fbnic: Fix initialization of mailbox descriptor rings net: dsa: b53: do not set learning and unicast/multicast on up net: dsa: b53: fix learning on VLAN unaware bridges net: dsa: b53: fix toggling vlan_filtering net: dsa: b53: do not program vlans when vlan filtering is off net: dsa: b53: do not allow to configure VLAN 0 net: dsa: b53: always rejoin default untagged VLAN on bridge leave net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave net: dsa: b53: fix flushing old pvid VLAN on pvid change net: dsa: b53: fix clearing PVID of a port net: dsa: b53: keep CPU port always tagged again ...
2025-05-07mm: vmalloc: support more granular vrealloc() sizingKees Cook
Introduce struct vm_struct::requested_size so that the requested (re)allocation size is retained separately from the allocated area size. This means that KASAN will correctly poison the correct spans of requested bytes. This also means we can support growing the usable portion of an allocation that can already be supported by the existing area's existing allocation. Link: https://lkml.kernel.org/r/20250426001105.it.679-kees@kernel.org Fixes: 3ddc2fefe6f3 ("mm: vmalloc: implement vrealloc()") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20250408192503.6149a816@outsider.home/ Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-06Merge tag 'wireless-2025-05-06' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of fixes: * iwlwifi: add two missing device entries * cfg80211: fix a potential out-of-bounds access * mac80211: fix format of TID to link mapping action frames * tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: add support for Killer on MTL wifi: mac80211: fix the type of status_code for negotiated TID to Link Mapping wifi: cfg80211: fix out-of-bounds access during multi-link element defragmentation ==================== Link: https://patch.msgid.link/20250506203506.158818-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06net: add missing instance lock to dev_set_promiscuityStanislav Fomichev
Accidentally spotted while trying to understand what else needs to be renamed to netif_ prefix. Most of the calls to dev_set_promiscuity are adjacent to dev_set_allmulti or dev_disable_lro so it should be safe to add the lock. Note that new netif_set_promiscuity is currently unused, the locked paths call __dev_set_promiscuity directly. Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250506011919.2882313-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06wifi: mac80211: fix the type of status_code for negotiated TID to Link MappingMichael-CY Lee
The status code should be type of __le16. Fixes: 83e897a961b8 ("wifi: ieee80211: add definitions for negotiated TID to Link map") Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20250505081946.3927214-1-michael-cy.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-02Merge tag 'pm-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix three recent regressions, two in cpufreq and one in the Intel Soundwire driver, and an unchecked MSR access in the intel_pstate driver: - Fix a recent regression causing systems where frequency tables are used by cpufreq to have issues with setting frequency limits (Rafael Wysocki) - Fix a recent regressions causing frequency boost settings to become out-of-sync if platform firmware updates the registers associated with frequency boost during system resume (Viresh Kumar) - Fix a recent regression causing resume failures to occur in the Intel Soundwire driver if the device handled by it is in runtime suspend before a system-wide suspend (Rafael Wysocki) - Fix an unchecked MSR aceess in the intel_pstate driver occurring when CPUID indicates no turbo, but the driver attempts to enable turbo frequencies due to a misleading value read from an MSR (Srinivas Pandruvada)" * tag 'pm-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode soundwire: intel_auxdevice: Fix system suspend/resume handling cpufreq: Fix setting policy limits when frequency tables are used cpufreq: ACPI: Re-sync CPU boost state on system resume
2025-05-02Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq fixes for 6.15-rc5: - Fix a recent regression causing systems where frequency tables are used by cpufreq to have issues with setting frequency limits (Rafael Wysocki). - Fix a recent regressions causing frequency boost settings to become out-of-sync if platform firmware updates the registers associated with them during system resume (Viresh Kumar). - Fix an unchecked MSR aceess in the intel_pstate driver occurring when CPUID indicates no turbo, but the driver attempts to enable turbo frequencies due to a misleading value read from an MSR (Srinivas Pandruvada). * pm-cpufreq: cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode cpufreq: Fix setting policy limits when frequency tables are used cpufreq: ACPI: Re-sync CPU boost state on system resume
2025-05-02Merge tag 'iommu-fixes-v6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: "ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure AMD IOMMU: - Fix potential buffer overflow Core: - Fix for iommu_copy_struct_from_user()" * tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) iommu/vt-d: Revert ATS timing change to fix boot failure iommu: Fix two issues in iommu_copy_struct_from_user() iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids iommu/arm-smmu-v3: Fix pgsize_bit for sva domains iommu/arm-smmu-v3: Add missing S2FWB feature detection
2025-05-02uio_hv_generic: Fix sysfs creation path for ring bufferNaman Jain
On regular bootup, devices get registered to VMBus first, so when uio_hv_generic driver for a particular device type is probed, the device is already initialized and added, so sysfs creation in hv_uio_probe() works fine. However, when the device is removed and brought back, the channel gets rescinded and the device again gets registered to VMBus. However this time, the uio_hv_generic driver is already registered to probe for that device and in this case sysfs creation is tried before the device's kobject gets initialized completely. Fix this by moving the core logic of sysfs creation of ring buffer, from uio_hv_generic to HyperV's VMBus driver, where the rest of the sysfs attributes for the channels are defined. While doing that, make use of attribute groups and macros, instead of creating sysfs directly, to ensure better error handling and code flow. Problematic path: vmbus_process_offer (A new offer comes for the VMBus device) vmbus_add_channel_work vmbus_device_register |-> device_register | |... | |-> hv_uio_probe | |... | |-> sysfs_create_bin_file (leads to a warning as | the primary channel's kobject, which is used to | create the sysfs file, is not yet initialized) |-> kset_create_and_add |-> vmbus_add_channel_kobj (initialization of the primary channel's kobject happens later) Above code flow is sequential and the warning is always reproducible in this path. Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel") Cc: stable@kernel.org Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Suggested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-30Merge tag 'modules-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules fixes from Petr Pavlu: "A single series to properly handle the module_kobject creation. This fixes a problem with missing /sys/module/<module>/drivers for built-in modules" * tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: drivers: base: handle module_kobject creation kernel: globalize lookup_or_create_module_kobject() kernel: refactor lookup_or_create_module_kobject() kernel: param: rename locate_module_kobject
2025-04-29Merge tag 'fsnotify_for_v6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "A fix for the recently merged mount notification support" * tag 'fsnotify_for_v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: selftests/fs/mount-notify: test also remove/flush of mntns marks fanotify: fix flush of mntns marks
2025-04-28cpufreq: Fix setting policy limits when frequency tables are usedRafael J. Wysocki
Commit 7491cdf46b5c ("cpufreq: Avoid using inconsistent policy->min and policy->max") overlooked the fact that policy->min and policy->max were accessed directly in cpufreq_frequency_table_target() and in the functions called by it. Consequently, the changes made by that commit led to problems with setting policy limits. Address this by passing the target frequency limits to __resolve_freq() and cpufreq_frequency_table_target() and propagating them to the functions called by the latter. Fixes: 7491cdf46b5c ("cpufreq: Avoid using inconsistent policy->min and policy->max") Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Closes: https://lore.kernel.org/linux-pm/aAplED3IA_J0eZN0@linaro.org/ Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/5896780.DvuYhMxLoT@rjwysocki.net
2025-04-28timekeeping: Prevent coarse clocks going backwardsThomas Gleixner
Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment: tk->tkr_mono.xtime_nsec -= offset; which is made to compensate for the unaccumulated cycles in offset when the multiplicator is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies. However, the _COARSE clockid getter functions use the adjusted xtime_nsec value directly and do not compensate the negative offset via the clocksource delta multiplied with the new multiplicator. In that case the caller can observe time going backwards in consecutive calls. By design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after it accumulated approximately multiplicator * interval_cycles into xtime_nsec. The accumulated value is always larger then the mult_adj * offset value, which is subtracted from xtime_nsec. Both operations are done together under the tk_core.lock, so the net change to xtime_nsec is always always be positive. However, do_adjtimex() calls into timekeeping_advance() as well, to apply the NTP frequency adjustment immediately. In this case, timekeeping_advance() does not return early when the offset is smaller then interval_cycles. In that case there is no time accumulated into xtime_nsec. But the subsequent call into timekeeping_adjust(), which modifies the multiplicator, subtracts from xtime_nsec to correct for the new multiplicator. Here because there was no accumulation, xtime_nsec becomes smaller than before, which opens a window up to the next accumulation, where the _COARSE clockid getters, which don't compensate for the offset, can observe the inconsistency. This has been tried to be fixed by forwarding the timekeeper in the case that adjtimex() adjusts the multiplier, which resets the offset to zero: 757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids") That works correctly, but unfortunately causes a regression on the adjtimex() side. There are two issues: 1) The forwarding of the base time moves the update out of the original period and establishes a new one. 2) The clearing of the accumulated NTP error is changing the behaviour as well. User-space expects that multiplier/frequency updates are in effect, when the syscall returns, so delaying the update to the next tick is not solving the problem either. Commit 757b000f7b93 was reverted so that the established expectations of user space implementations (ntpd, chronyd) are restored, but that obviously brought the inconsistencies back. One of the initial approaches to fix this was to establish a separate storage for the coarse time getter nanoseconds part by calculating it from the offset. That was dropped on the floor because not having yet another state to maintain was simpler. But given the result of the above exercise, this solution turns out to be the right one. Bring it back in a slightly modified form. Thus introduce timekeeper::coarse_nsec and store that nanoseconds part in it, switch the time getter functions and the VDSO update to use that value. coarse_nsec is set on operations which forward or initialize the timekeeper and after time was accumulated during a tick. If there is no accumulation the timestamp is unchanged. This leaves the adjtimex() behaviour unmodified and prevents coarse time from going backwards. [ jstultz: Simplified the coarse_nsec calculation and kept behavior so coarse clockids aren't adjusted on each inter-tick adjtimex call, slightly reworked the comments and commit message ] Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE") Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/20250419054706.2319105-1-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
2025-04-25Merge tag 'vfs-6.15-rc4.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - For some reason we went from zero to three maintainers for HFS/HFS+ in a matter of days. The lesson to learn from this might just be that we need to threaten code removal more often!? - Fix a regression introduced by enabling large folios for lage logical block sizes. This has caused issues for noref migration with large folios due to sleeping while in an atomic context. New sleeping variants of pagecache lookup helpers are introduced. These helpers take the folio lock instead of the mapping's private spinlock. The problematic users are converted to the sleeping variants and serialize against noref migration. Atomic users will bail on seeing the new BH_Migrate flag. This also shrinks the critical region of the mapping's private lock and the new blocking callers reduce contention on the spinlock for bdev mappings. - Fix two bugs in do_move_mount() when with MOVE_MOUNT_BENEATH. The first bug is using a mountpoint that is located on a mount we're not holding a reference to. The second bug is putting the mountpoint after we've called namespace_unlock() as it's no longer guaranteed that it does stay a mountpoint. - Remove a pointless call to vfs_getattr_nosec() in the devtmpfs code just to query i_mode instead of simply querying the inode directly. This also avoids lifetime issues for the dm code by an earlier bugfix this cycle that moved bdev_statx() handling into vfs_getattr_nosec(). - Fix AT_FDCWD handling with getname_maybe_null() in the xattr code. - Fix a performance regression for files when multiple callers issue a close when it's not the last reference. - Remove a duplicate noinline annotation from pipe_clear_nowait(). * tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2) MAINTAINERS: hfs/hfsplus: add myself as maintainer splice: remove duplicate noinline from pipe_clear_nowait devtmpfs: don't use vfs_getattr_nosec to query i_mode fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount() fs: fall back to file_ref_put() for non-last reference mm/migrate: fix sleep in atomic for large folios and buffer heads fs/ext4: use sleeping version of sb_find_get_block() fs/jbd2: use sleeping version of __find_get_block() fs/ocfs2: use sleeping version of __find_get_block() fs/buffer: use sleeping version of __find_get_block() fs/buffer: introduce sleeping flavors for pagecache lookups MAINTAINERS: add HFS/HFS+ maintainers fs/buffer: split locking for pagecache lookups
2025-04-25Merge tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A small CephFS encryption-related fix and a dead code cleanup" * tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client: ceph: Fix incorrect flush end position calculation ceph: Remove osd_client deadcode
2025-04-25Merge tag 'block-6.15-20250424' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix autoloading of drivers from stat*(2) - Fix losing read-ahead setting one suspend/resume, when a device is re-probed. - Fix race between setting the block size and page cache updates. Includes a helper that a coming XFS fix will use as well. - ublk cancelation fixes. - ublk selftest additions and fixes. - NVMe pull via Christoph: - fix an out-of-bounds access in nvmet_enable_port (Richard Weinberger) * tag 'block-6.15-20250424' of git://git.kernel.dk/linux: ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA block: don't autoload drivers on blk-cgroup configuration block: don't autoload drivers on stat block: remove the backing_inode variable in bdev_statx block: move blkdev_{get,put} _no_open prototypes out of blkdev.h block: never reduce ra_pages in blk_apply_bdi_limits selftests: ublk: common: fix _get_disk_dev_t for pre-9.0 coreutils selftests: ublk: remove useless 'delay_us' from 'struct dev_ctx' selftests: ublk: fix recover test block: hoist block size validation code to a separate function block: fix race between set_blocksize and read paths nvmet: fix out-of-bounds access in nvmet_enable_port
2025-04-25Merge tag 'driver-core-6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core fixes to resolve a number of reported problems. Included in here are: - driver core sync fix revert to resolve a much reported problem, hopefully this is finally resolved - MAINTAINERS file update, documenting that the driver-core tree is now under a "shared" maintainership model, thanks to Rafael and Danilo for offering to do this! - auxbus documentation and MAINTAINERS file update - MAINTAINERS file update for Rust PCI code - firmware rust binding fixup - software node link fix All of these have been in linux-next for over a week with no reported issues" * tag 'driver-core-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: drivers/base/memory: Avoid overhead from for_each_present_section_nr() software node: Prevent link creation failure from causing kobj reference count imbalance device property: Add a note to the fwnode.h drivers/base: Add myself as auxiliary bus reviewer drivers/base: Extend documentation with preferred way to use auxbus driver core: fix potential NULL pointer dereference in dev_uevent() driver core: introduce device_set_driver() helper Revert "drivers: core: synchronize really_probe() and dev_uevent()" MAINTAINERS: update the location of the driver-core git tree rust: firmware: Use `ffi::c_char` type in `FwFunc` MAINTAINERS: pci: add entry for Rust PCI code
2025-04-25Merge tag 'dma-mapping-6.15-2025-04-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-maping fixes from Marek Szyprowski: - avoid unused variable warnings (Arnd Bergmann, Marek Szyprowski) - add runtume warnings and debug messages for devices with limited DMA capabilities (Balbir Singh, Chen-Yu Tsai) * tag 'dma-mapping-6.15-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-coherent: Warn if OF reserved memory is beyond current coherent DMA mask dma-mapping: Fix warning reported for missing prototype dma-mapping: avoid potential unused data compilation warning dma/mapping.c: dev_dbg support for dma_addressing_limited dma/contiguous: avoid warning about unused size_bytes
2025-04-24Merge tag 'net-6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "No fixes from any subtree. Current release - regressions: - net: fix the missing unlock for detached devices Previous releases - regressions: - sched: fix UAF vulnerability in HFSC qdisc - lwtunnel: disable BHs when required - mptcp: pm: defer freeing of MPTCP userspace path manager entries - tipc: fix NULL pointer dereference in tipc_mon_reinit_self() - eth: virtio-net: disable delayed refill when pausing rx Previous releases - always broken: - phylink: fix suspend/resume with WoL enabled and link down - eth: - mlx5: fix null-ptr-deref in mlx5_create_{inner_,}ttc_table() - xen-netfront: handle NULL returned by xdp_convert_buff_to_frame() - enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASS - stmmac: fix dwmac1000 ptp timestamp status offset - pds_core: prevent possible adminq overflow/stuck condition Misc: - a bunch of MAINTAINERS updates" * tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (32 commits) net: stmmac: fix multiplication overflow when reading timestamp net: stmmac: fix dwmac1000 ptp timestamp status offset net: dp83822: Fix OF_MDIO config check pds_core: make wait_context part of q_info pds_core: Remove unnecessary check in pds_client_adminq_cmd() pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result pds_core: Prevent possible adminq overflow/stuck condition net: dsa: mt7530: sync driver-specific behavior of MT7531 variants selftests/tc-testing: Add test for HFSC queue emptying during peek operation net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too net_sched: hfsc: Fix a UAF vulnerability in class handling selftests: mptcp: diag: use mptcp_lib_get_info_value mptcp: pm: Defer freeing of MPTCP userspace path manager entries net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configuration tipc: fix NULL pointer dereference in tipc_mon_reinit_self() virtio-net: disable delayed refill when pausing rx net: phy: leds: fix memory leak net: phylink: mac_link_(up|down)() clarifications net: phylink: fix suspend/resume with WoL enabled and link down net: lwtunnel: disable BHs when required ...
2025-04-24block: move blkdev_{get,put} _no_open prototypes out of blkdev.hChristoph Hellwig
These are only to be used by block internal code. Remove the comment as we grew more users due to reworking block device node opening. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250423053810.1683309-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24fanotify: fix flush of mntns marksAmir Goldstein
fanotify_mark(fd, FAN_MARK_FLUSH | FAN_MARK_MNTNS, ...) incorrectly ends up causing removal inode marks. Fixes: 0f46d81f2bce ("fanotify: notify on mount attach and detach") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250418193903.2607617-2-amir73il@gmail.com
2025-04-23block: hoist block size validation code to a separate functionDarrick J. Wong
Hoist the block size validation code to bdev_validate_blocksize so that we can call it from filesystems that don't care about the bdev pagecache manipulations of set_blocksize. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/174543795720.4139148.840349813093799165.stgit@frogsfrogsfrogs Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-23locking/local_lock: fix _Generic() matching of local_trylock_tVlastimil Babka
Michael Larabel reported [1] a nginx performance regression in v6.15-rc3 and bisected it to commit 51339d99c013 ("locking/local_lock, mm: replace localtry_ helpers with local_trylock_t type") The problem is the _Generic() usage with a default association that masks the fact that "local_trylock_t *" association is not being selected as expected. Replacing the default with the only other expected type "local_lock_t *" reveals the underlying problem: include/linux/local_lock_internal.h:174:26: error: ‘_Generic’ selector of type ‘__seg_gs local_lock_t *’ is not compatible with any association The local_locki's are part of __percpu structures and thus the __percpu attribute is needed to associate the type properly. Add the attribute and keep the default replaced to turn any further mismatches into compile errors. The failure to recognize local_try_lock_t in __local_lock_release() means that a local_trylock[_irqsave]() operation will set tl->acquired to 1 (there's no _Generic() part in the trylock code), but then local_unlock[_irqrestore]() will not set tl->acquired back to 0, so further trylock operations will always fail on the same cpu+lock, while non-trylock operations continue to work - a lockdep_assert() is also not being executed in the _Generic() part of local_lock() code. This means consume_stock() and refill_stock() operations will fail deterministically, resulting in taking the slow paths and worse performance. Fixes: 51339d99c013 ("locking/local_lock, mm: replace localtry_ helpers with local_trylock_t type") Reported-by: Michael Larabel <Michael@phoronix.com> Closes: https://www.phoronix.com/review/linux-615-nginx-regression/2 [1] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "A small number of fixes: - virtgpu is exempt from reset shutdown fow now - a more complete fix is in the works - spec compliance fixes in: - virtio-pci cap commands - vhost_scsi_send_bad_target - virtio console resize - missing locking fix in vhost-scsi - virtio ring - a KCSAN false positive fix - VHOST_*_OWNER documentation fix" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-scsi: Fix vhost_scsi_send_status() vhost-scsi: Fix vhost_scsi_send_bad_target() vhost-scsi: protect vq->log_used with vq->mutex vhost_task: fix vhost_task_create() documentation virtio_console: fix order of fields cols and rows virtio_console: fix missing byte order handling for cols and rows virtgpu: don't reset on shutdown virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN vhost: fix VHOST_*_OWNER documentation virtio_pci: Use self group type for cap commands
2025-04-22net: phylink: mac_link_(up|down)() clarificationsRussell King (Oracle)
As a result of an email from the fbnic author, I reviewed the phylink documentation, and I have decided to clarify the wording in the mac_link_(up|down)() kernel documentation as this was written from the point of view of mvneta/mvpp2 and is misleading. The documentation talks about forcing the link - indeed, this is what is done in the mvneta and mvpp2 drivers but not at the physical layer but the MACs idea, which has the effect of only allowing or stopping packet flow at the MAC. This "link" needs to be controlled when using a PHY or fixed link to start or stop packet flow at the MAC. However, as the MAC and PCS are tightly integrated, if the MACs idea of the link is forced down, it has the side effect that there is no way to determine that the media link has come up - in this mode, the MAC must be allowed to follow its built-in PCS so we can read the link state. Frame the documentation in more generic terms, to avoid the thought that the physical media link to the partner needs in some way to be forced up or down with these calls; it does not. If that were to be done, it would be a self-fulfilling prophecy - e.g. if the media link goes down, then mac_link_down() will be called, and if the media link is then placed into a forced down state, there is no possibility that the media link will ever come up again - clearly this is a wrong interpretation. These methods are notifications to the MAC about what has happened to the media link state - either from the PHY, or a PCS, or whatever mechanism fixed-link is using. Thus, reword them to get away from talking about changing link state to avoid confusion with media link state. This is not a change of any requirements of these methods. Also, remove the obsolete references to EEE for these methods, we now have the LPI functions for configuring the EEE parameters which renders this redundant, and also makes the passing of "phy" to the mac_link_up() function obsolete. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1u5Ah5-001GO1-7E@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge tag 'for-6.15-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - subpage mode fixes: - access correct object (folio) when looking up bit offset - fix assertion condition for number of blocks per folio - fix upper boundary of locking range in hole punch - zoned fixes: - fix potential deadlock caught by lockdep when zone reporting and device freeze run in parallel - fix zone write pointer mismatch and NULL pointer dereference when metadata are converted from DUP to RAID1 - fix error handling when reloc inode creation fails - in tree-checker, unify error code for header level check - block layer: add helpers to read zone capacity * tag 'for-6.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: skip reporting zone for new block group block: introduce zone capacity helper btrfs: tree-checker: adjust error code for header level check btrfs: fix invalid inode pointer after failure to create reloc inode btrfs: zoned: return EIO on RAID1 block group write pointer mismatch btrfs: fix the ASSERT() inside GET_SUBPAGE_BITMAP() btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range() btrfs: subpage: access correct object when reading bitmap start in subpage_calc_start_bit()
2025-04-22fs: fall back to file_ref_put() for non-last referenceMateusz Guzik
This reduces the slowdown in face of multiple callers issuing close on what turns out to not be the last reference. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250418125756.59677-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202504171513.6d6f8a16-lkp@intel.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22mm/migrate: fix sleep in atomic for large folios and buffer headsDavidlohr Bueso
The large folio + buffer head noref migration scenarios are being naughty and blocking while holding a spinlock. As a consequence of the pagecache lookup path taking the folio lock this serializes against migration paths, so they can wait for each other. For the private_lock atomic case, a new BH_Migrate flag is introduced which enables the lookup to bail. This allows the critical region of the private_lock on the migration path to be reduced to the way it was before ebdf4de5642fb6 ("mm: migrate: fix reference check race between __find_get_block() and migration"), that is covering the count checks. The scope is always noref migration. Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com Closes: https://lore.kernel.org/oe-lkp/202503101536.27099c77-lkp@intel.com Fixes: 3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes") Reviewed-by: Jan Kara <jack@suse.cz> Co-developed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-8-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev # [0] [1] Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-22fs/buffer: introduce sleeping flavors for pagecache lookupsDavidlohr Bueso
Add __find_get_block_nonatomic() and sb_find_get_block_nonatomic() calls for which users will be converted where safe. These versions will take the folio lock instead of the mapping's private_lock. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-3-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-19Merge tag 'mm-hotfixes-stable-2025-04-19-21-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "16 hotfixes. 2 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. All patches are basically for MM although five are alterations to MAINTAINERS" [ Basic counting skills are clearly not a strictly necessary requirement for kernel maintainers. - Linus ] * tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add section for locking of mm's and VMAs mm: vmscan: fix kswapd exit condition in defrag_mode mm: vmscan: restore high-cpu watermark safety in kswapd MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization mm, hugetlb: increment the number of pages to be reset on HVO writeback: fix false warning in inode_to_wb() docs: ABI: replace mcroce@microsoft.com with new Meta address mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable() MAINTAINERS: add memory advice section MAINTAINERS: add mmap trace events to MEMORY MAPPING mm: memcontrol: fix swap counter leak from offline cgroup MAINTAINERS: add MM subsection for the page allocator MAINTAINERS: update SLAB ALLOCATOR maintainers fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
2025-04-19Merge tag 'vfs-6.15-rc3.fixes.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Revert the hfs{plus} deprecation warning that's also included in this pull request. The commit introducing the deprecation warning resides rather early in this branch. So simply dropping it would've rebased all other commits which I decided to avoid. Hence the revert in the same branch [ Background - the deprecation warning discussion resulted in people stepping up, and so hfs{plus} will have a maintainer taking care of it after all.. - Linus ] - Switch CONFIG_SYSFS_SYCALL default to n and decouple from CONFIG_EXPERT - Fix an audit bug caused by changes to our kernel path lookup helpers this cycle. Audit needs the parent path even if the dentry it tried to look up is negative - Ensure that the kernel path lookup helpers leave the passed in path argument clean when they return an error. This is consistent with all our other helpers - Ensure that vfs_getattr_nosec() calls bdev_statx() so the relevant information is available to kernel consumers as well - Don't set a timer and call schedule() if the timer will expire immediately in epoll - Make netfs lookup tables with __nonstring * tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Revert "hfs{plus}: add deprecation warning" fs: move the bdex_statx call to vfs_getattr_nosec netfs: Mark __nonstring lookup tables eventpoll: Set epoll timeout if it's in the future fs: ensure that *path_locked*() helpers leave passed path pristine fs: add kern_path_locked_negative() hfs{plus}: add deprecation warning Kconfig: switch CONFIG_SYSFS_SYCALL default to n
2025-04-19Merge tag 'nfsd-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - v6.15 libcrc clean-up makes invalid configurations possible - Fix a potential deadlock introduced during the v6.15 merge window * tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: decrease sc_count directly if fail to queue dl_recall nfs: add missing selections of CONFIG_CRC32
2025-04-18Merge tag 'irq-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc irq fixes from Ingo Molnar: - Fix BCM2712 irqchip driver Kconfig dependencies required on the Raspberry PI5 - Fix spurious interrupts on RZ/G3E SMARC EVK systems - Fix crash regression on Sun/NIU hardware - Apply MSI driver quirk for Sun Neptune chips * tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled irqchip/renesas-rzv2h: Prevent TINT spurious interrupt net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
2025-04-18virtgpu: don't reset on shutdownMichael S. Tsirkin
It looks like GPUs are used after shutdown is invoked. Thus, breaking virtio gpu in the shutdown callback is not a good idea - guest hangs attempting to finish console drawing, with these warnings: [ 20.504464] WARNING: CPU: 0 PID: 568 at drivers/gpu/drm/virtio/virtgpu_vq.c:358 virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.505685] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common nfit libnvdimm kvm_intel kvm rapl iTCO_wdt iTCO_vendor_support virtio_gpu virtio_dma_buf pcspkr drm_shmem_helper i2c_i801 drm_kms_helper lpc_ich i2c_smbus virtio_balloon joydev drm fuse xfs libcrc32c ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw dm_mirror dm_region_hash dm_log dm_mod [ 20.511847] CPU: 0 PID: 568 Comm: kworker/0:3 Kdump: loaded Tainted: G W ------- --- 5.14.0-578.6675_1757216455.el9.x86_64 #1 [ 20.513157] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20241117-3.el9 11/17/2024 [ 20.513918] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper] [ 20.514626] RIP: 0010:virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.515332] Code: 00 00 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 51 50 00 00 65 ff 0d 42 e3 74 3f 0f 85 69 ff ff ff 0f 1f 44 00 00 e9 5f ff ff ff <0f> 0b e9 3f ff ff ff 48 83 3c 24 00 74 0e 49 8b 7f 40 48 85 ff 74 [ 20.517272] RSP: 0018:ff34f0a8c0787ad8 EFLAGS: 00010282 [ 20.517820] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: 0000000000000820 [ 20.518565] RDX: 0000000000000000 RSI: ff34f0a8c0787be0 RDI: ff218bef03a26300 [ 20.519308] RBP: ff218bef03a26300 R08: 0000000000000001 R09: ff218bef07224360 [ 20.520059] R10: 0000000000008dc0 R11: 0000000000000002 R12: ff218bef02630028 [ 20.520806] R13: ff218bef0263fb48 R14: ff218bef00cb8000 R15: ff218bef07224360 [ 20.521555] FS: 0000000000000000(0000) GS:ff218bef7ba00000(0000) knlGS:0000000000000000 [ 20.522397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 20.522996] CR2: 000055ac4f7871c0 CR3: 000000010b9f2002 CR4: 0000000000771ef0 [ 20.523740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 20.524477] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 20.525223] PKRU: 55555554 [ 20.525515] Call Trace: [ 20.525777] <TASK> [ 20.526003] ? show_trace_log_lvl+0x1c4/0x2df [ 20.526464] ? show_trace_log_lvl+0x1c4/0x2df [ 20.526925] ? virtio_gpu_queue_fenced_ctrl_buffer+0x82/0x2c0 [virtio_gpu] [ 20.527643] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.528282] ? __warn+0x7e/0xd0 [ 20.528621] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.529256] ? report_bug+0x100/0x140 [ 20.529643] ? handle_bug+0x3c/0x70 [ 20.530010] ? exc_invalid_op+0x14/0x70 [ 20.530421] ? asm_exc_invalid_op+0x16/0x20 [ 20.530862] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.531506] ? virtio_gpu_queue_ctrl_sgs+0x174/0x290 [virtio_gpu] [ 20.532148] virtio_gpu_queue_fenced_ctrl_buffer+0x82/0x2c0 [virtio_gpu] [ 20.532843] virtio_gpu_primary_plane_update+0x3e2/0x460 [virtio_gpu] [ 20.533520] drm_atomic_helper_commit_planes+0x108/0x320 [drm_kms_helper] [ 20.534233] drm_atomic_helper_commit_tail+0x45/0x80 [drm_kms_helper] [ 20.534914] commit_tail+0xd2/0x130 [drm_kms_helper] [ 20.535446] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper] [ 20.536097] drm_atomic_commit+0xa4/0xe0 [drm] [ 20.536588] ? __pfx___drm_printfn_info+0x10/0x10 [drm] [ 20.537162] drm_atomic_helper_dirtyfb+0x192/0x270 [drm_kms_helper] [ 20.537823] drm_fbdev_shmem_helper_fb_dirty+0x43/0xa0 [drm_shmem_helper] [ 20.538536] drm_fb_helper_damage_work+0x87/0x160 [drm_kms_helper] [ 20.539188] process_one_work+0x194/0x380 [ 20.539612] worker_thread+0x2fe/0x410 [ 20.540007] ? __pfx_worker_thread+0x10/0x10 [ 20.540456] kthread+0xdd/0x100 [ 20.540791] ? __pfx_kthread+0x10/0x10 [ 20.541190] ret_from_fork+0x29/0x50 [ 20.541566] </TASK> [ 20.541802] ---[ end trace 0000000000000000 ]--- It looks like the shutdown is called in the middle of console drawing, so we should either wait for it to finish, or let drm handle the shutdown. This patch implements this second option: Add an option for drivers to bypass the common break+reset handling. As DRM is careful to flush/synchronize outstanding buffers, it looks like GPU can just have a NOP there. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on device_shutdown()") Cc: Eric Auger <eauger@redhat.com> Cc: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <8490dbeb6f79ed039e6c11d121002618972538a3.1744293540.git.mst@redhat.com>
2025-04-17mm: vmscan: restore high-cpu watermark safety in kswapdJohannes Weiner
Vlastimil points out that commit a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") switched kswapd from zone_watermark_ok_safe() to the standard, percpu-cached version of reading free pages, thus dropping the watermark safety precautions for systems with high CPU counts (e.g. >212 cpus on 64G). Restore them. Since zone_watermark_ok_safe() is no longer the right interface, and this was the last caller of the function anyway, open-code the zone_page_state_snapshot() conditional and delete the function. Link: https://lkml.kernel.org/r/20250416135142.778933-2-hannes@cmpxchg.org Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17writeback: fix false warning in inode_to_wb()Andreas Gruenbacher
inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17fs/dax: fix folio splitting issue by resetting old folio order + _nr_pagesDavid Hildenbrand
Alison reports an issue with fsdax when large extends end up using large ZONE_DEVICE folios: [ 417.796271] BUG: kernel NULL pointer dereference, address: 0000000000000b00 [ 417.796982] #PF: supervisor read access in kernel mode [ 417.797540] #PF: error_code(0x0000) - not-present page [ 417.798123] PGD 2a5c5067 P4D 2a5c5067 PUD 2a5c6067 PMD 0 [ 417.798690] Oops: Oops: 0000 [#1] SMP NOPTI [ 417.799178] CPU: 5 UID: 0 PID: 1515 Comm: mmap Tainted: ... [ 417.800150] Tainted: [O]=OOT_MODULE [ 417.800583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 417.801358] RIP: 0010:__lruvec_stat_mod_folio+0x7e/0x250 [ 417.801948] Code: ... [ 417.803662] RSP: 0000:ffffc90002be3a08 EFLAGS: 00010206 [ 417.804234] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000002 [ 417.804984] RDX: ffffffff815652d7 RSI: 0000000000000000 RDI: ffffffff82a2beae [ 417.805689] RBP: ffffc90002be3a28 R08: 0000000000000000 R09: 0000000000000000 [ 417.806384] R10: ffffea0007000040 R11: ffff888376ffe000 R12: 0000000000000001 [ 417.807099] R13: 0000000000000012 R14: ffff88807fe4ab40 R15: ffff888029210580 [ 417.807801] FS: 00007f339fa7a740(0000) GS:ffff8881fa9b9000(0000) knlGS:0000000000000000 [ 417.808570] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 417.809193] CR2: 0000000000000b00 CR3: 000000002a4f0004 CR4: 0000000000370ef0 [ 417.809925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 417.810622] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 417.811353] Call Trace: [ 417.811709] <TASK> [ 417.812038] folio_add_file_rmap_ptes+0x143/0x230 [ 417.812566] insert_page_into_pte_locked+0x1ee/0x3c0 [ 417.813132] insert_page+0x78/0xf0 [ 417.813558] vmf_insert_page_mkwrite+0x55/0xa0 [ 417.814088] dax_fault_iter+0x484/0x7b0 [ 417.814542] dax_iomap_pte_fault+0x1ca/0x620 [ 417.815055] dax_iomap_fault+0x39/0x40 [ 417.815499] __xfs_write_fault+0x139/0x380 [ 417.815995] ? __handle_mm_fault+0x5e5/0x1a60 [ 417.816483] xfs_write_fault+0x41/0x50 [ 417.816966] xfs_filemap_fault+0x3b/0xe0 [ 417.817424] __do_fault+0x31/0x180 [ 417.817859] __handle_mm_fault+0xee1/0x1a60 [ 417.818325] ? debug_smp_processor_id+0x17/0x20 [ 417.818844] handle_mm_fault+0xe1/0x2b0 [...] The issue is that when we split a large ZONE_DEVICE folio to order-0 ones, we don't reset the order/_nr_pages. As folio->_nr_pages overlays page[1]->memcg_data, once page[1] is a folio, it suddenly looks like it has folio->memcg_data set. And we never manually initialize folio->memcg_data in fsdax code, because we never expect it to be set at all. When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries to use folio->memcg_data (because it's non-NULL) but it does not actually point at a memcg, resulting in the problem. Alison also observed that these folios sometimes have "locked" set, which is rather concerning (folios locked from the beginning ...). The reason is that the order for large folios is stored in page[1]->flags, which become the folio->flags of a new small folio. Let's fix it by adding a folio helper to clear order/_nr_pages for splitting purposes. Maybe we should reinitialize other large folio flags / folio members as well when splitting, because they might similarly cause harm once page[1] becomes a folio? At least other flags in PAGE_FLAGS_SECOND should not be set for fsdax, so at least page[1]->flags might be as expected with this fix. From a quick glimpse, initializing ->mapping, ->pgmap and ->share should re-initialize most things from a previous page[1] used by large folios that fsdax cares about. For example folio->private might not get reinitialized, but maybe that's not relevant -- no traces of it's use in fsdax code. Needs a closer look. Another thing that should be considered in the future is performing similar checks as we perform in free_tail_page_prepare() -- checking pincount etc. -- when freeing a large fsdax folio. Link: https://lkml.kernel.org/r/20250410091020.119116-1-david@redhat.com Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page") Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Alison Schofield <alison.schofield@intel.com> Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: "Darrick J. Wong" <djwong@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()Kirill A. Shutemov
When the last page in the zone is accepted, __accept_page() calls static_branch_dec(). This function takes cpu_hotplug_lock, which can lead to a deadlock if the allocation occurs during CPU bringup path as _cpu_up() also takes the lock. To prevent this deadlock, defer static_branch_dec() to a workqueue. Call static_branch_dec() only when the workqueue is not yet initialized. Workqueues are initialized before CPU bring up, so this will not conflict with the first scenario. Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com Fixes: 55ad43e8ba0f ("mm: add a helper to accept page") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17Merge tag 'sound-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. All are device-specific like quirks, new IDs, and other safe (or rather boring) changes" * tag 'sound-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: firmware: cs_dsp: test_bin_error: Fix uninitialized data used as fw version ASoC: codecs: Add of_match_table for aw888081 driver ASoC: fsl: fsl_qmc_audio: Reset audio data pointers on TRIGGER_START event mailmap: Add entry for Srinivas Kandagatla MAINTAINERS: use kernel.org alias ASoC: cs42l43: Reset clamp override on jack removal ALSA: hda/realtek - Fixed ASUS platform headset Mic issue ALSA: hda/cirrus_scodec_test: Don't select dependencies ALSA: azt2320: Replace deprecated strcpy() with strscpy() ASoC: hdmi-codec: use RTD ID instead of DAI ID for ELD entry ASoC: Intel: avs: Constrain path based on BE capabilities ALSA: hda/tas2781: Remove unnecessary NULL check before release_firmware() ASoC: Intel: avs: Fix null-ptr-deref in avs_component_probe() ASoC: fsl_asrc_dma: get codec or cpu dai from backend ASoC: qcom: Fix sc7280 lpass potential buffer overflow ASoC: dwc: always enable/disable i2s irqs ASoC: Intel: sof_sdw: Add quirk for Asus Zenbook S16 ASoC: codecs:lpass-wsa-macro: Fix logic of enabling vi channels ASoC: codecs:lpass-wsa-macro: Fix vi feedback rate
2025-04-17Merge tag 'platform-drivers-x86-v6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: "Fixes: - amd/pmf: Fix STT limits - asus-laptop: Fix an uninitialized variable - intel_pmc_ipc: Allow building without ACPI - mlxbf-bootctl: Use sysfs_emit_at() in secure_boot_fuse_state_show() - msi-wmi-platform: Add locking to workaround ACPI firmware bug New HW support: - alienware-wmi-wmax: - Extended thermal control support to: - Alienware Area-51m R2 - Alienware m16 R1 - Alienware m16 R2 - Dell G16 7630 - Dell G5 5505 SE - G-Mode support to Alienware m16 R1 - x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data" * tag 'platform-drivers-x86-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: msi-wmi-platform: Workaround a ACPI firmware bug platform/x86: msi-wmi-platform: Rename "data" variable platform/x86: alienware-wmi-wmax: Extend support to more laptops platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1 platform/x86: amd: pmf: Fix STT limits mlxbf-bootctl: use sysfs_emit_at() in secure_boot_fuse_state_show() platform/x86: x86-android-tablets: Add Vexia Edu Atla 10 tablet 5V data platform/x86: x86-android-tablets: Add "9v" to Vexia EDU ATLA 10 tablet symbols asus-laptop: Fix an uninitialized variable platform/x86: intel_pmc_ipc: add option to build without ACPI
2025-04-17iommu: Fix two issues in iommu_copy_struct_from_user()Nicolin Chen
In the review for iommu_copy_struct_to_user() helper, Matt pointed out that a NULL pointer should be rejected prior to dereferencing it: https://lore.kernel.org/all/86881827-8E2D-461C-BDA3-FA8FD14C343C@nvidia.com And Alok pointed out a typo at the same time: https://lore.kernel.org/all/480536af-6830-43ce-a327-adbd13dc3f1d@oracle.com Since both issues were copied from iommu_copy_struct_from_user(), fix them first in the current header. Fixes: e9d36c07bb78 ("iommu: Add iommu_copy_struct_from_user helper") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com> Link: https://lore.kernel.org/r/20250414191635.450472-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>