summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-11-05sysfs: treewide: constify attribute callback of bin_attribute::mmap()Thomas Weißschuh
The mmap() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-6-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh
The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05sysfs: introduce callback attribute_group::bin_sizeThomas Weißschuh
Several drivers need to dynamically calculate the size of an binary attribute. Currently this is done by assigning attr->size from the is_bin_visible() callback. This has drawbacks: * It is not documented. * A single attribute can be instantiated multiple times, overwriting the shared size field. * It prevents the structure to be moved to read-only memory. Introduce a new dedicated callback to calculate the size of the attribute. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-2-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05iommu/vt-d: Remove unused dmar_msi_readDr. David Alan Gilbert
dmar_msi_read() has been unused since 2022 in commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture") Remove it. (dmar_msi_write still exists and is used once). Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20241022002702.302728-1-linux@treblig.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-11-05perf/core: Add aux_pause, aux_resume, aux_start_pausedAdrian Hunter
Hardware traces, such as instruction traces, can produce a vast amount of trace data, so being able to reduce tracing to more specific circumstances can be useful. The ability to pause or resume tracing when another event happens, can do that. Add ability for an event to "pause" or "resume" AUX area tracing. Add aux_pause bit to perf_event_attr to indicate that, if the event happens, the associated AUX area tracing should be paused. Ditto aux_resume. Do not allow aux_pause and aux_resume to be set together. Add aux_start_paused bit to perf_event_attr to indicate to an AUX area event that it should start in a "paused" state. Add aux_paused to struct hw_perf_event for AUX area events to keep track of the "paused" state. aux_paused is initialized to aux_start_paused. Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start() callbacks. Call as needed, during __perf_event_output(). Add aux_in_pause_resume to struct perf_buffer to prevent races with the NMI handler. Pause/resume in NMI context will miss out if it coincides with another pause/resume. To use aux_pause or aux_resume, an event must be in a group with the AUX area event as the group leader. Example (requires Intel PT and tools patches also): $ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.043 MB perf.data ] $ perf script --call-trace uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0 uname 30805 [000] 24001.058784424: psb offs: 0 uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%) uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr uname 30805 [000] 24001.058785639: 0x0 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20241022155920.17511-3-adrian.hunter@intel.com
2024-11-05sched: Add Lazy preemption modelPeter Zijlstra
Change fair to use resched_curr_lazy(), which, when the lazy preemption model is selected, will set TIF_NEED_RESCHED_LAZY. This LAZY bit will be promoted to the full NEED_RESCHED bit on tick. As such, the average delay between setting LAZY and actually rescheduling will be TICK_NSEC/2. In short, Lazy preemption will delay preemption for fair class but will function as Full preemption for all the other classes, most notably the realtime (RR/FIFO/DEADLINE) classes. The goal is to bridge the performance gap with Voluntary, such that we might eventually remove that option entirely. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20241007075055.331243614@infradead.org
2024-11-05sched: Add TIF_NEED_RESCHED_LAZY infrastructurePeter Zijlstra
Add the basic infrastructure to split the TIF_NEED_RESCHED bit in two. Either bit will cause a resched on return-to-user, but only TIF_NEED_RESCHED will drive IRQ preemption. No behavioural change intended. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20241007075055.219540785@infradead.org
2024-11-05sched/ext: Remove sched_fork() hackThomas Gleixner
Instead of solving the underlying problem of the double invocation of __sched_fork() for idle tasks, sched-ext decided to hack around the issue by partially clearing out the entity struct to preserve the already enqueued node. A provided analysis and solution has been ignored for four months. Now that someone else has taken care of cleaning it up, remove the disgusting hack and clear out the full structure. Remove the comment in the structure declaration as well, as there is no requirement for @node being the last element anymore. Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/87ldy82wkc.ffs@tglx
2024-11-05kcsan, seqlock: Fix incorrect assumption in read_seqbegin()Marco Elver
During testing of the preceding changes, I noticed that in some cases, current->kcsan_ctx.in_flat_atomic remained true until task exit. This is obviously wrong, because _all_ accesses for the given task will be treated as atomic, resulting in false negatives i.e. missed data races. Debugging led to fs/dcache.c, where we can see this usage of seqlock: struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name) { struct dentry *dentry; unsigned seq; do { seq = read_seqbegin(&rename_lock); dentry = __d_lookup(parent, name); if (dentry) break; } while (read_seqretry(&rename_lock, seq)); [...] As can be seen, read_seqretry() is never called if dentry != NULL; consequently, current->kcsan_ctx.in_flat_atomic will never be reset to false by read_seqretry(). Give up on the wrong assumption of "assume closing read_seqretry()", and rely on the already-present annotations in read_seqcount_begin/retry(). Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN") Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241104161910.780003-6-elver@google.com
2024-11-05seqlock, treewide: Switch to non-raw seqcount_latch interfaceMarco Elver
Switch all instrumentable users of the seqcount_latch interface over to the non-raw interface. Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241104161910.780003-5-elver@google.com
2024-11-05kcsan, seqlock: Support seqcount_latch_tMarco Elver
While fuzzing an arm64 kernel, Alexander Potapenko reported: | BUG: KCSAN: data-race in ktime_get_mono_fast_ns / timekeeping_update | | write to 0xffffffc082e74248 of 56 bytes by interrupt on cpu 0: | update_fast_timekeeper kernel/time/timekeeping.c:430 [inline] | timekeeping_update+0x1d8/0x2d8 kernel/time/timekeeping.c:768 | timekeeping_advance+0x9e8/0xb78 kernel/time/timekeeping.c:2344 | update_wall_time+0x18/0x38 kernel/time/timekeeping.c:2360 | [...] | | read to 0xffffffc082e74258 of 8 bytes by task 5260 on cpu 1: | __ktime_get_fast_ns kernel/time/timekeeping.c:372 [inline] | ktime_get_mono_fast_ns+0x88/0x174 kernel/time/timekeeping.c:489 | init_srcu_struct_fields+0x40c/0x530 kernel/rcu/srcutree.c:263 | init_srcu_struct+0x14/0x20 kernel/rcu/srcutree.c:311 | [...] | | value changed: 0x000002f875d33266 -> 0x000002f877416866 | | Reported by Kernel Concurrency Sanitizer on: | CPU: 1 UID: 0 PID: 5260 Comm: syz.2.7483 Not tainted 6.12.0-rc3-dirty #78 This is a false positive data race between a seqcount latch writer and a reader accessing stale data. Since its introduction, KCSAN has never understood the seqcount_latch interface (due to being unannotated). Unlike the regular seqlock interface, the seqcount_latch interface for latch writers never has had a well-defined critical section, making it difficult to teach tooling where the critical section starts and ends. Introduce an instrumentable (non-raw) seqcount_latch interface, with which we can clearly denote writer critical sections. This both helps readability and tooling like KCSAN to understand when the writer is done updating all latch copies. Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN") Reported-by: Alexander Potapenko <glider@google.com> Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241104161910.780003-4-elver@google.com
2024-11-05ACPI/IORT: Support CANWBS memory access flagNicolin Chen
The IORT spec, Issue E.f (April 2024), adds a new CANWBS bit to the Memory Access Flag field in the Memory Access Properties table, mainly for a PCI Root Complex. This CANWBS defines the coherency of memory accesses to be not marked IOWB cacheable/shareable. Its value further implies the coherency impact from a pair of mismatched memory attributes (e.g. in a nested translation case): 0x0: Use of mismatched memory attributes for accesses made by this device may lead to a loss of coherency. 0x1: Coherency of accesses made by this device to locations in Conventional memory are ensured as follows, even if the memory attributes for the accesses presented by the device or provided by the SMMU are different from Inner and Outer Write-back cacheable, Shareable. Note that the loss of coherency on a CANWBS-unsupported HW typically could occur to an SMMU that doesn't implement the S2FWB feature where additional cache flush operations would be required to prevent that from happening. Add a new ACPI_IORT_MF_CANWBS flag and set IOMMU_FWSPEC_PCI_RC_CANWBS upon the presence of this new flag. CANWBS and S2FWB are similar features, in that they both guarantee the VM can not violate coherency, however S2FWB can be bypassed by PCI No Snoop TLPs, while CANWBS cannot. Thus CANWBS meets the requirements to set IOMMU_CAP_ENFORCE_CACHE_COHERENCY. Architecturally ARM has expected that VFIO would disable No Snoop through PCI Config space, if this is done then the two would have the same protections. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-11-05vfio: Remove VFIO_TYPE1_NESTING_IOMMUJason Gunthorpe
This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the stage 1 usual default), however this choice has no significant visible impact to the VFIO user. Further qemu never implemented this and no other userspace user is known. The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 and manage invalidation was never completed, or at least never upstreamed, rendering this part useless dead code. Upstream has now settled on iommufd as the uAPI for controlling nested translation. Choosing the stage 2 implementation should be done by through the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation. Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op. Just in-case there is some userspace using this continue to treat requesting it as a NOP, but do not advertise support any more. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-11-05mm/writeback: add folio_mark_dirty_lock()Joanne Koong
Add a new convenience helper folio_mark_dirty_lock() that grabs the folio lock before calling folio_mark_dirty(). Refactor set_page_dirty_lock() to directly use folio_mark_dirty_lock(). Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-11-05Merge 6.12-rc6 into driver-core-nextGreg Kroah-Hartman
We need the driver-core fix/revert in here as well to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05watchdog: da9063: Do not use a global variableFabio Estevam
Using the 'use_sw_pm' variable as global is not recommended as it prevents multi instances of the driver to run. Make it a member of the da9063 structure instead. Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20241018135821.274376-1-festevam@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2024-11-05Merge v6.12-rc6 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well, and this resolves a merge conflict in: drivers/usb/typec/tcpm/tcpm.c Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241101150730.090dc30f@canb.auug.org.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05Merge 6.12-rc6 into char-misc-nextGreg Kroah-Hartman
We need the char/misc/iio fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05misc: rtsx: Cleanup on DRV_NAME cardreader variablesDesnes Nunes
The rtsx_pci_ms memstick driver has been dropped, thus there is no more need for DRV_NAME_RTSX_PCI_MS variable. Additionally, this also stand- arizes DRV_NAME variables on alcor_pci and rtsx_usb drivers. Fixes: d0f459259c13 ("memstick: rtsx_pci_ms: Remove Realtek PCI memstick driver") Signed-off-by: Desnes Nunes <desnesn@redhat.com> Link: https://lore.kernel.org/r/20241031142801.1141680-1-desnesn@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-04net: tcp: replace the document for "lsndtime" in tcp_sockMenglong Dong
Commit d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables") moved the fields around and misplaced the documentation for "lsndtime". So, let's replace it in the proper place. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241104070041.64302-1-dongml2@chinatelecom.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04Merge tag 'arm-fixes-6.12-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "Where the last set of fixes was mostly drivers, this time the devicetree changes all come at once, targeting mostly the Rockchips, Qualcomm and NXP platforms. The Qualcomm bugfixes target the Snapdragon X Elite laptops, specifically problems with PCIe and NVMe support to improve reliability, and a boot regresion on msm8939. Also for Snapdragon platforms, there are a number of correctness changes in the several platform specific device drivers, but none of these are as impactful. On the NXP i.MX platform, the fixes are all for 64-bit i.MX8 variants, correcting individual entries in the devicetree that were incorrect and causing the media, video, mmc and spi drivers to misbehave in minor ways. The Arm SCMI firmware driver gets fixes for a use-after-free bug and for correctly parsing firmware information. On the RISC-V side, there are three minor devicetree fixes for starfive and sophgo, again addressing only minor mistakes. One device driver patch fixes a problem with spurious interrupt handling" * tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (63 commits) firmware: arm_scmi: Use vendor string in max-rx-timeout-ms dt-bindings: firmware: arm,scmi: Add missing vendor string riscv: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices arm64: dts: rockchip: Correct GPIO polarity on brcm BT nodes arm64: dts: rockchip: Drop invalid clock-names from es8388 codec nodes ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin ARM: dts: rockchip: Fix the spi controller on rk3036 ARM: dts: rockchip: drop grf reference from rk3036 hdmi ARM: dts: rockchip: fix rk3036 acodec node arm64: dts: rockchip: remove orphaned pinctrl-names from pinephone pro soc: qcom: pmic_glink: Handle GLINK intent allocation rejections rpmsg: glink: Handle rejected intent request better arm64: dts: qcom: x1e80100: fix PCIe5 interconnect arm64: dts: qcom: x1e80100: fix PCIe4 interconnect arm64: dts: qcom: x1e80100: Fix up BAR spaces MAINTAINERS: invert Misc RISC-V SoC Support's pattern soc: qcom: socinfo: fix revision check in qcom_socinfo_probe() arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-microsoft-romulus: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch ...
2024-11-04fs: iomap: Atomic write supportJohn Garry
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC flag set. Initially FSes (XFS) should only support writing a single FS block atomically. As with any atomic write, we should produce a single bio which covers the complete write length. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> [djwong: clarify a couple of things in the docs] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-04PM: EM: Add min/max available performance state limitsLukasz Luba
On some devices there are HW dependencies for shared frequency and voltage between devices. It will impact Energy Aware Scheduler (EAS) decision, where CPUs share the voltage & frequency domain with other CPUs or devices e.g. - Mid CPUs + Big CPU - Little CPU + L3 cache in DSU - some other device + Little CPUs Detailed explanation of one example: When the L3 cache frequency is increased, the affected Little CPUs might run at higher voltage and frequency. That higher voltage causes higher CPU power and thus more energy is used for running the tasks. This is important for background running tasks, which try to run on energy efficient CPUs. Therefore, add performance state limits which are applied for the device (in this case CPU). This is important on SoCs with HW dependencies mentioned above so that the Energy Aware Scheduler (EAS) does not use performance states outside the valid min-max range for energy calculation. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20241030164126.1263793-2-lukasz.luba@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-04Merge tag 'mtk-soc-for-v6.13' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into arm/drivers MediaTek soc driver updates for v6.13 This adds support for the MT8188 SoC in the MediaTek Regulator Coupler driver, allowing stable GPU DVFS on this chip; Moreover, this adds a new MediaTek DVFS Resource Collector (DVFSRC) driver, allowing to enable other drivers (interconnect, regulator) which can now communicate with the DVFSRC hardware. Last but not least, this includes some cleanups for the CMDQ Helper and MediaTek SVS drivers. * tag 'mtk-soc-for-v6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: soc: mediatek: mtk-svs: Call of_node_put(np) only once in svs_get_subsys_device() soc: mediatek: mediatek-regulator-coupler: Support mt8188 soc: mediatek: mtk-cmdq: Move cmdq_instruction init to declaration soc: mediatek: mtk-cmdq: Move mask build and append to function soc: mediatek: Add MediaTek DVFS Resource Collector (DVFSRC) driver dt-bindings: soc: mediatek: Add DVFSRC bindings for MT8183 and MT8195 Link: https://lore.kernel.org/r/20241104112625.161365-2-angelogioacchino.delregno@collabora.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-04rust: add tracepoint supportAlice Ryhl
Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Instead of calling __DO_TRACE directly, the exported rust_do_trace_ function calls an inline helper function. This is because the `cond` argument does not exist at the callsite of DEFINE_RUST_DO_TRACE. __DECLARE_TRACE always emits an inline static and an extern declaration that is only used when CREATE_RUST_TRACE_POINTS is set. These should not end up in the final binary so it is not a problem that they sometimes are emitted without a user. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: " =?utf-8?q?Bj=C3=B6rn_Roy_Baron?= " <bjorn3_gh@protonmail.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anup Patel <apatel@ventanamicro.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tianrui Zhao <zhaotianrui@loongson.cn> Link: https://lore.kernel.org/20241030-tracepoint-v12-2-eec7f0f8ad22@google.com Reviewed-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-04bpf: Mark raw_tp arguments with PTR_MAYBE_NULLKumar Kartikeya Dwivedi
Arguments to a raw tracepoint are tagged as trusted, which carries the semantics that the pointer will be non-NULL. However, in certain cases, a raw tracepoint argument may end up being NULL. More context about this issue is available in [0]. Thus, there is a discrepancy between the reality, that raw_tp arguments can actually be NULL, and the verifier's knowledge, that they are never NULL, causing explicit NULL checks to be deleted, and accesses to such pointers potentially crashing the kernel. To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special case the dereference and pointer arithmetic to permit it, and allow passing them into helpers/kfuncs; these exceptions are made for raw_tp programs only. Ensure that we don't do this when ref_obj_id > 0, as in that case this is an acquired object and doesn't need such adjustment. The reason we do mask_raw_tp_trusted_reg logic is because other will recheck in places whether the register is a trusted_reg, and then consider our register as untrusted when detecting the presence of the PTR_MAYBE_NULL flag. To allow safe dereference, we enable PROBE_MEM marking when we see loads into trusted pointers with PTR_MAYBE_NULL. While trusted raw_tp arguments can also be passed into helpers or kfuncs where such broken assumption may cause issues, a future patch set will tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can already be passed into helpers and causes similar problems. Thus, they are left alone for now. It is possible that these checks also permit passing non-raw_tp args that are trusted PTR_TO_BTF_ID with null marking. In such a case, allowing dereference when pointer is NULL expands allowed behavior, so won't regress existing programs, and the case of passing these into helpers is the same as above and will be dealt with later. Also update the failure case in tp_btf_nullable selftest to capture the new behavior, as the verifier will no longer cause an error when directly dereference a raw tracepoint argument marked as __nullable. [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb Reviewed-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241104171959.2938862-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-04bpf: Move btf_type_is_struct_ptr() under CONFIG_BPF_SYSCALLAlistair Francis
The static inline btf_type_is_struct_ptr() function calls btf_type_skip_modifiers() which is guarded by CONFIG_BPF_SYSCALL. btf_type_is_struct_ptr() is also only called by CONFIG_BPF_SYSCALL ifdef code, so let's only expose btf_type_is_struct_ptr() if CONFIG_BPF_SYSCALL is defined. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Link: https://lore.kernel.org/r/20241104060300.421403-1-alistair.francis@wdc.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-04block: pre-calculate max_zone_append_sectorsChristoph Hellwig
max_zone_append_sectors differs from all other queue limits in that the final value used is not stored in the queue_limits but needs to be obtained using queue_limits_max_zone_append_sectors helper. This not only adds (tiny) extra overhead to the I/O path, but also can be easily forgotten in file system code. Add a new max_hw_zone_append_sectors value to queue_limits which is set by the driver, and calculate max_zone_append_sectors from that and the other inputs in blk_validate_zoned_limits, similar to how max_sectors is calculated to fix this. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-04nfs_common: fix localio to cope with racing nfs_local_probe()Mike Snitzer
Fix the possibility of racing nfs_local_probe() resulting in: list_add double add: new=ffff8b99707f9f58, prev=ffff8b99707f9f58, next=ffffffffc0f30000. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:35! Add nfs_uuid_init() to properly initialize all nfs_uuid_t members (particularly its list_head). Switch to returning bool from nfs_uuid_begin(), returns false if nfs_uuid_t is already in-use (its list_head is on a list). Update nfs_local_probe() to return early if the nfs_client's cl_uuid (nfs_uuid_t) is in-use. Also, switch nfs_uuid_begin() from using list_add_tail_rcu() to list_add_tail() -- rculist was used in an earlier version of the localio code that had a lockless nfs_uuid_lookup interface. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-11-04Merge tag 'zynqmp-soc-for-6.13' of https://github.com/Xilinx/linux-xlnx into ↵Arnd Bergmann
arm/drivers arm64: ZynqMP SoC changes for 6.13 event_manager: - cleanup error path firmware: - add support for new SMC layout - fix feature check logic - extend debug interface - update reset ID format - report about unsupported feature in pinctrl * tag 'zynqmp-soc-for-6.13' of https://github.com/Xilinx/linux-xlnx: firmware: xilinx: fix feature check logic for TF-A specific APIs firmware: xilinx: add support for new SMC call format firmware: xilinx: add a warning print for unsupported feature firmware: xilinx: use u32 for reset ID in reset APIs firmware: xilinx: Add missing debug firmware interfaces drivers: soc: xilinx: add the missing kfree in xlnx_add_cb_for_suspend() Link: https://lore.kernel.org/r/CAHTX3dK9PKmG_UG4MW=x5KmZCrd5PkcAZiNVgPFQ_zsPRgu+dg@mail.gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-04Merge tag 'qcom-drivers-fixes-for-6.12' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm driver fixes for v6.12 The Qualcomm EDAC driver's configuration of interrupts is made optional, to avoid violating security constriants on X Elite platform . The SCM drivers' detection mechanism for the presence of SHM bridge in QTEE, is corrected to handle the case where firmware successfully returns that the interface isn't supported. The GLINK driver and the PMIC GLINK interface is updated to handle buffer allocation issues during initialization of the communication channel. Allocation error handling in the socinfo dirver is corrected, and then the fix is corrected. * tag 'qcom-drivers-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: pmic_glink: Handle GLINK intent allocation rejections rpmsg: glink: Handle rejected intent request better soc: qcom: socinfo: fix revision check in qcom_socinfo_probe() firmware: qcom: scm: Return -EOPNOTSUPP for unsupported SHM bridge enabling EDAC/qcom: Make irq configuration optional firmware: qcom: scm: fix a NULL-pointer dereference firmware: qcom: scm: suppress download mode error soc: qcom: Add check devm_kasprintf() returned value MAINTAINERS: Qualcomm SoC: Match reserved-memory bindings Link: https://lore.kernel.org/r/20241101161455.746290-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-04net: enetc: add initial netc-blk-ctrl driver supportWei Fang
The netc-blk-ctrl driver is used to configure Integrated Endpoint Register Block (IERB) and Privileged Register Block (PRB) of NETC. For i.MX platforms, it is also used to configure the NETCMIX block. The IERB contains registers that are used for pre-boot initialization, debug, and non-customer configuration. The PRB controls global reset and global error handling for NETC. The NETCMIX block is mainly used to set MII protocol and PCS protocol of the links, it also contains settings for some other functions. Note the IERB configuration registers can only be written after being unlocked by PRB, otherwise, all write operations are inhibited. A warm reset is performed when the IERB is unlocked, and it results in an FLR to all NETC devices. Therefore, all NETC device drivers must be probed or initialized after the warm reset is finished. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net/mlx5: Introduce data placement ordering bitsEdward Srouji
Introduce out-of-order (OOO) data placement (DP) IFC related bits to support OOO DP QP. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/f30e5cbb5459fd02f27f35909bb545cab346b58b.1725362773.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04Merge tag 'v6.12-rc6' of ↵Bartosz Golaszewski
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into gpio/for-next Linux 6.12-rc6
2024-11-04Backmerge v6.12-rc6 of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Backmerge Linus tree for some drm-fixes needed for msm and xe merges. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-11-03soc: qcom: llcc: add support for SAR2130P and SAR1130PDmitry Baryshkov
Implement necessary support for the LLCC control on the SAR1130P and SAR2130P platforms. These two platforms use different ATTR1_MAX_CAP shift and also require manual override for num_banks. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20241026-sar2130p-llcc-v3-3-2a58fa1b4d12@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-11-03dim: pass dim_sample to net_dim() by referenceCaleb Sander Mateos
net_dim() is currently passed a struct dim_sample argument by value. struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64 passes it on the stack. All callers have already initialized dim_sample on the stack, so passing it by value requires pushing a duplicated copy to the stack. Either witing to the stack and immediately reading it, or perhaps dereferencing addresses relative to the stack pointer in a chain of push instructions, seems to perform quite poorly. In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time, 94% of which is attributed to the first push instruction to copy dim_sample on the stack for the call to net_dim(): // Call ktime_get() 0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47> // Pass the address of struct dim in %rdi |4ead7: lea 0x3d0(%rbx),%rdi // Set dim_sample.pkt_ctr |4eade: mov %r13d,0x8(%rsp) // Set dim_sample.byte_ctr |4eae3: mov %r12d,0xc(%rsp) // Set dim_sample.event_ctr 0.15 |4eae8: mov %bp,0x10(%rsp) // Duplicate dim_sample on the stack 94.16 |4eaed: push 0x10(%rsp) 2.79 |4eaf1: push 0x10(%rsp) 0.07 |4eaf5: push %rax // Call net_dim() 0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b> To allow the caller to reuse the struct dim_sample already on the stack, pass the struct dim_sample by reference to net_dim(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03dim: make dim_calc_stats() inputs const pointersCaleb Sander Mateos
Make the start and end arguments to dim_calc_stats() const pointers to clarify that the function does not modify their values. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Link: https://patch.msgid.link/20241031002326.3426181-1-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03iio: events: make IIO_EVENT_CODE macro privateDavid Lechner
Make IIO_EVENT_CODE "private" by adding a leading underscore. There are no more users of this macro in the kernel so we can make it "private" and encourage developers to use the specialized versions of the macro instead. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20241101-iio-fix-event-macro-use-v1-3-0000c5d09f6d@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: events.h: add event identifier macros for differential channelJulien Stephan
Currently, there are 3 helper macros in iio/events.h to create event identifiers: - IIO_EVENT_CODE : create generic event identifier for differential and non differential channels - IIO_MOD_EVENT_CODE : create event identifier for modified (non differential) channels - IIO_UNMOD_EVENT_CODE : create event identifier for unmodified (non differential) channels For differential channels, drivers are expected to use IIO_EVENT_CODE. However, only one driver in drivers/iio currently uses it correctly, leading to inconsistent event identifiers for differential channels that don’t match the intended attributes (such as max1363.c that supports differential channels, but only uses IIO_UNMOD_EVENT_CODE). To prevent such issues in future drivers, a new helper macro, IIO_DIFF_EVENT_CODE, is introduced to specifically create event identifiers for differential channels. Only one helper is needed for differential channels since they cannot have modifiers. Additionally, the descriptions for IIO_MOD_EVENT_CODE and IIO_UNMOD_EVENT_CODE have been updated to clarify that they are intended for non-differential channels, Signed-off-by: Julien Stephan <jstephan@baylibre.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20241028-iio-add-macro-for-even-identifier-for-differential-channels-v1-1-b452c90f7ea6@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: fix write_event_config signatureJulien Stephan
write_event_config callback use an int for state, but it is actually a boolean. iio_ev_state_store is actually using kstrtobool to check user input, then gives the converted boolean value to write_event_config. Fix signature and update all iio drivers to use the new signature. This patch has been partially written using coccinelle with the following script: $ cat iio-bool.cocci // Options: --all-includes virtual patch @c1@ identifier iioinfo; identifier wecfunc; @@ static const struct iio_info iioinfo = { ..., .write_event_config = ( wecfunc | &wecfunc ), ..., }; @@ identifier c1.wecfunc; identifier indio_dev, chan, type, dir, state; @@ int wecfunc(struct iio_dev *indio_dev, const struct iio_chan_spec *chan, enum iio_event_type type, enum iio_event_direction dir, -int +bool state) { ... } make coccicheck MODE=patch COCCI=iio-bool.cocci M=drivers/iio Unfortunately, this script didn't match all files: * all write_event_config callbacks using iio_device_claim_direct_scoped were not detected and not patched. * all files that do not assign and declare the write_event_config callback in the same file. iio.h was also manually updated. The patch was build tested using allmodconfig config. cc: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Julien Stephan <jstephan@baylibre.com> Link: https://patch.msgid.link/20241031-iio-fix-write-event-config-signature-v2-7-2bcacbb517a2@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: hid-sensors: Add proximity and attention IDsRicardo Ribalda
The HID Usage Table at https://usb.org/sites/default/files/hut1_5.pdf reserves: - 0x4b2 for Human Proximity Range Distance between a human and the computer. Default unit of measure is meters; https://www.usb.org/sites/default/files/hutrr39b_0.pdf - 0x4bd for Human Attention Detected Human-Presence sensors detect the presence of humans in the sensor’s field-of-view using diverse and evolving technologies. Some presence sensors are implemented with low resolution video cameras, which can additionally track a subject’s attention (i.e. if the user is ‘looking’ at the system with the integrated sensor). A Human-Presence sensor, providing a Host with the user’s attention state, allows the Host to optimize its behavior. For example, to brighten/dim the system display, based on the user’s attention to the system (potentially prolonging battery life). Default unit is true/false; https://www.usb.org/sites/default/files/hutrr107-humanpresenceattention_1.pdf Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://patch.msgid.link/20241101-hpd-v3-1-e9c80b7c7164@chromium.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: Mark iio_dev::priv member with __privateAndy Shevchenko
The member is not supposed to be accessed directly, mark it with __private to catch the misuses up. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20241101105342.3645018-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM. The usual collection of singletons - please see the changelogs" * tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes mm: shrinker: avoid memleak in alloc_shrinker_info .mailmap: update e-mail address for Eugen Hristev vmscan,migrate: fix page count imbalance on node stats when demoting pages mailmap: update Jarkko's email addresses mm: allow set/clear page_type again nilfs2: fix potential deadlock with newly created symlinks Squashfs: fix variable overflow in squashfs_readpage_block kasan: remove vmalloc_percpu test tools/mm: -Werror fixes in page-types/slabinfo mm, swap: avoid over reclaim of full clusters mm: fix PSWPIN counter for large folios swap-in mm: avoid VM_BUG_ON when try to map an anon large folio to zero page. mm/codetag: fix null pointer check logic for ref and tag mm/gup: stop leaking pinned pages in low memory conditions
2024-11-03pwm: core: export pwm_get_state_hw()David Lechner
Export the pwm_get_state_hw() function. This is useful in cases where we want to know what the hardware is actually doing, rather than what what we requested it should do. Locking had to be rearranged to ensure that the chip is still operational before trying to access ops now that this can be called from outside the pwm core. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20241029-pwm-export-pwm_get_state_hw-v2-1-03ba063a3230@baylibre.com [ukleinek: Add dummy for !CONFIG_PWM] Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2024-11-03net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Change the type of the middle struct member currently causing trouble from `struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`. Additionally, update the type of some variables in various functions that don't access the flexible-array member, changing them to the newly created `struct ethtool_link_settings_hdr`. These changes are needed because the type of the conflicting middle members changed. So, those instances that expect the type to be `struct ethtool_link_settings` should be adjusted to the newly created type `struct ethtool_link_settings_hdr`. Also, adjust variable declarations to follow the reverse xmas tree convention. Fix 3338 of the following -Wflex-array-member-not-at-end warnings: include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()Yu Zhao
When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton@google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Reported-by: David Stevens <stevensd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-03mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL statsYu Zhao
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google.com/ This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: David Stevens <stevensd@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-03Merge tag 'input-for-v6.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - a fix for regression in input core introduced in 6.11 preventing re-registering input handlers - a fix for adp5588-keys driver tyring to disable interrupt 0 at suspend when devices is used without interrupt - a fix for edt-ft5x06 to stop leaking regmap structure when probing fails and to make sure it is not released too early on removal. * tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fix regression when re-registering input handlers Input: adp5588-keys - do not try to disable interrupt 0 Input: edt-ft5x06 - fix regmap leak when probe fails
2024-11-03Merge tag 'timers-urgent-2024-11-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for posix CPU timers. When a thread is cloned, the posix CPU timers are not inherited. If the parent has a CPU timer armed the corresponding tick dependency in the tasks tick_dep_mask is set and copied to the new thread, which means the new thread and all decendants will prevent the system to go into full NOHZ operation. Clear the tick dependency mask in copy_process() to fix this" * tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone