linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-06-28	irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs	James Morse
	To support virtual CPU hotplug, ACPI has added an 'online capable' bit to the MADT GICC entries. This indicates a disabled CPU entry may not be possible to online via PSCI until firmware has set enabled bit in _STA. This means that a "usable" GIC redistributor is one that is marked as either enabled, or online capable. The meaning of the acpi_gicc_is_usable() would become less clear than just checking the pair of flags at call sites. As such, drop that helper function. The test in gic_acpi_match_gicc() remains as testing just the enabled bit so the count of enabled distributors is correct. What about the redistributor in the GICC entry? ACPI doesn't want to say. Assume the worst: When a redistributor is described in the GICC entry, but the entry is marked as disabled at boot, assume the redistributor is inaccessible. The GICv3 driver doesn't support late online of redistributors, so this means the corresponding CPU can't be brought online either. Rather than modifying cpu masks that may already have been used, register a new cpuhp callback to fail this case. This must run earlier than the main gic_starting_cpu() so that this case can be rejected before the section of cpuhp that runs on the CPU that is coming up as that is not allowed to fail. This solution keeps the handling of this broken firmware corner case local to the GIC driver. As precise ordering of this callback doesn't need to be controlled as long as it is in that initial prepare phase, use CPUHP_BP_PREPARE_DYN. Systems that want CPU hotplug in a VM can ensure their redistributors are always-on, and describe them that way with a GICR entry in the MADT. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Miguel Luis <miguel.luis@oracle.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240529133446.28446-15-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()	James Morse
	gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions(). It should only count the number of enabled redistributors, but it also tries to sanity check the GICC entry, currently returning an error if the Enabled bit is set, but the gicr_base_address is zero. Adding support for the online-capable bit to the sanity check will complicate it, for no benefit. The existing check implicitly depends on gic_acpi_count_gicr_regions() previous failing to find any GICR regions (as it is valid to have gicr_base_address of zero if the redistributors are described via a GICR entry). Instead of complicating the check, remove it. Failures that happen at this point cause the irqchip not to register, meaning no irqs can be requested. The kernel grinds to a panic() pretty quickly. Without the check, MADT tables that exhibit this problem are still caught by gic_populate_rdist(), which helpfully also prints what went wrong: \| CPU4: mpidr 100 has no re-distributor! Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240529133446.28446-14-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	arm64: acpi: Harden get_cpu_for_acpi_id() against missing CPU entry	Jonathan Cameron
	In a review discussion of the changes to support vCPU hotplug where a check was added on the GICC being enabled if was online, it was noted that there is need to map back to the cpu and use that to index into a cpumask. As such, a valid ID is needed. If an MPIDR check fails in acpi_map_gic_cpu_interface() it is possible for the entry in cpu_madt_gicc[cpu] == NULL. This function would then cause a NULL pointer dereference. Whilst a path to trigger this has not been established, harden this caller against the possibility. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-13-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	arm64: acpi: Move get_cpu_for_acpi_id() to a header	James Morse
	ACPI identifies CPUs by UID. get_cpu_for_acpi_id() maps the ACPI UID to the Linux CPU number. The helper to retrieve this mapping is only available in arm64's NUMA code. Move it to live next to get_acpi_id_for_cpu(). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Link: https://lore.kernel.org/r/20240529133446.28446-12-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug	James Morse
	struct acpi_scan_handler has a detach callback that is used to remove a driver when a bus is changed. When interacting with an eject-request, the detach callback is called before _EJ0. This means the ACPI processor driver can't use _STA to determine if a CPU has been made not-present, or some of the other _STA bits have been changed. acpi_processor_remove() needs to know the value of _STA after _EJ0 has been called. Add a post_eject callback to struct acpi_scan_handler. This is called after acpi_scan_hot_remove() has successfully called _EJ0. Because acpi_scan_check_and_detach() also clears the handler pointer, it needs to be told if the caller will go on to call acpi_bus_post_eject(), so that acpi_device_clear_enumerated() and clearing the handler pointer can be deferred. An extra flag is added to flags field introduced in the previous patch to achieve this. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Jianyong Wu <jianyong.wu@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-11-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: scan: switch to flags for acpi_scan_check_and_detach()	Jonathan Cameron
	Precursor patch adds the ability to pass a uintptr_t of flags into acpi_scan_check_and detach() so that additional flags can be added to indicate whether to defer portions of the eject flow. The new flag follows in the next patch. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-10-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Register deferred CPUs from acpi_processor_get_info()	James Morse
	The arm64 specific arch_register_cpu() call may defer CPU registration until the ACPI interpreter is available and the _STA method can be evaluated. If this occurs, then a second attempt is made in acpi_processor_get_info(). Note that the arm64 specific call has not yet been added so for now this will be called for the original hotplug case. For architectures that do not defer until the ACPI Processor driver loads (e.g. x86), for initially present CPUs there will already be a CPU device. If present do not try to register again. Systems can still be booted with 'acpi=off', or not include an ACPI description at all as in these cases arch_register_cpu() will not have deferred registration when first called. This moves the CPU register logic back to a subsys_initcall(), while the memory nodes will have been registered earlier. Note this is where the call was prior to the cleanup series so there should be no side effects of moving it back again for this specific case. [PATCH 00/21] Initial cleanups for vCPU HP. https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/ commit 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Jianyong Wu <jianyong.wu@arm.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-9-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Add acpi_get_processor_handle() helper	Jonathan Cameron
	If CONFIG_ACPI_PROCESSOR provide a helper to retrieve the acpi_handle for a given CPU allowing access to methods in DSDT. Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-8-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Move checks and availability of acpi_processor earlier	Jonathan Cameron
	Make the per_cpu(processors, cpu) entries available earlier so that they are available in arch_register_cpu() as ARM64 will need access to the acpi_handle to distinguish between acpi_processor_add() and earlier registration attempts (which will fail as _STA cannot be checked). Reorder the remove flow to clear this per_cpu() after arch_unregister_cpu() has completed, allowing it to be used in there as well. Note that on x86 for the CPU hotplug case, the pr->id prior to acpi_map_cpu() may be invalid. Thus the per_cpu() structures must be initialized after that call or after checking the ID is valid (not hotplug path). Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-7-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Fix memory leaks in error paths of processor_add()	Jonathan Cameron
	If acpi_processor_get_info() returned an error, pr and the associated pr->throttling.shared_cpu_map were leaked. The unwind code was in the wrong order wrt to setup, relying on some unwind actions having no affect (clearing variables that were never set etc). That makes it harder to reason about so reorder and add appropriate labels to only undo what was actually set up in the first place. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-6-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Return an error if acpi_processor_get_info() fails in ↵	Jonathan Cameron
	processor_add() Rafael observed [1] that returning 0 from processor_add() will result in acpi_default_enumeration() being called which will attempt to create a platform device, but that makes little sense when the processor is known to be not available. So just return the error code from acpi_processor_get_info() instead. Link: https://lore.kernel.org/all/CAJZ5v0iKU8ra9jR+EmgxbuNm=Uwx2m1-8vn_RAZ+aCiUVLe3Pw@mail.gmail.com/ [1] Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-5-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Drop duplicated check on _STA (enabled + present)	Jonathan Cameron
	The ACPI bus scan will only result in acpi_processor_add() being called if _STA has already been checked and the result is that the processor is enabled and present. Hence drop this additional check. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-4-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	cpu: Do not warn on arch_register_cpu() returning -EPROBE_DEFER	Jonathan Cameron
	For arm64 the CPU registration cannot complete until the ACPI interpreter us up and running so in those cases the arch specific arch_register_cpu() will return -EPROBE_DEFER at this stage and the registration will be attempted later. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-3-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	ACPI: processor: Simplify initial onlining to use same path for cold and hotplug	Jonathan Cameron
	Separate code paths, combined with a flag set in acpi_processor.c to indicate a struct acpi_processor was for a hotplugged CPU ensured that per CPU data was only set up the first time that a CPU was initialized. This appears to be unnecessary as the paths can be combined by letting the online logic also handle any CPUs online at the time of driver load. Motivation for this change, beyond simplification, is that ARM64 virtual CPU HP uses the same code paths for hotplug and cold path in acpi_processor.c so had no easy way to set the flag for hotplug only. Removing this necessity will enable ARM64 vCPU HP to reuse the existing code paths. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-2-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28	bcachefs: Delete old faulty bch2_trans_unlock() call	Kent Overstreet
	the unlock is now in read_extent, this fixes an assertion pop in read_from_stale_dirty_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-28	hwmon: add MP2891 driver	Noah Wang
	Add support for MPS VR controller mp2891. This driver exposes telemetry and limit value readings and writtings. Signed-off-by: Noah Wang <noahwang.wang@outlook.com> Link: https://lore.kernel.org/r/SEYPR04MB64828A352836982C0184AA10FAD62@SEYPR04MB6482.apcprd04.prod.outlook.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-06-28	dt-bindings: hwmon: Add MPS mp2891	Noah Wang
	Add support for MPS mp2891 controller Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Noah Wang <noahwang.wang@outlook.com> Link: https://lore.kernel.org/r/SEYPR04MB6482BC95D1242A5675FF9DAEFAD62@SEYPR04MB6482.apcprd04.prod.outlook.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-06-28	cgroup/cpuset: Prevent UAF in proc_cpuset_show()	Chen Ridong
	An UAF can happen when /proc/cpuset is read as reported in [1]. This can be reproduced by the following methods: 1.add an mdelay(1000) before acquiring the cgroup_lock In the cgroup_path_ns function. 2.$cat /proc/<pid>/cpuset repeatly. 3.$mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/ $umount /sys/fs/cgroup/cpuset/ repeatly. The race that cause this bug can be shown as below: (umount) \| (cat /proc/<pid>/cpuset) css_release \| proc_cpuset_show css_release_work_fn \| css = task_get_css(tsk, cpuset_cgrp_id); css_free_rwork_fn \| cgroup_path_ns(css->cgroup, ...); cgroup_destroy_root \| mutex_lock(&cgroup_mutex); rebind_subsystems \| cgroup_free_root \| \| // cgrp was freed, UAF \| cgroup_path_ns_locked(cgrp,..); When the cpuset is initialized, the root node top_cpuset.css.cgrp will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated &cgroup_root.cgrp. When the umount operation is executed, top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp. The problem is that when rebinding to cgrp_dfl_root, there are cases where the cgroup_root allocated by setting up the root for cgroup v1 is cached. This could lead to a Use-After-Free (UAF) if it is subsequently freed. The descendant cgroups of cgroup v1 can only be freed after the css is released. However, the css of the root will never be released, yet the cgroup_root should be freed when it is unmounted. This means that obtaining a reference to the css of the root does not guarantee that css.cgrp->root will not be freed. Fix this problem by using rcu_read_lock in proc_cpuset_show(). As cgroup_root is kfree_rcu after commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe"), css->cgroup won't be freed during the critical section. To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to replace task_get_css with task_css. [1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces") Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-28	Merge tag 'mtk-soc-for-v6.11' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into soc/drivers MediaTek driver updates for v6.11 This adds the previously missed Tone Curve Conversion (TCC) MuteX bit to enable the same in MDP3 on the the MT8188 SoC, disables 9-bit Alpha for display HDR support in MT8195 and adds math operation support in the Global Command Engine for all MediaTek SoCs, which will be used in the near future in the ISP driver. * tag 'mtk-soc-for-v6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux: soc: mtk-cmdq: Add cmdq_pkt_logic_command to support math operation soc: mediatek: Disable 9-bit alpha in ETHDR soc: mediatek: mtk-mutex: Add MDP_TCC0 mod to MT8188 mutex table Link: https://lore.kernel.org/r/20240628093801.126013-3-angelogioacchino.delregno@collabora.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-06-28	rnbd-cnt: don't set QUEUE_FLAG_SAME_FORCE	Christoph Hellwig
	QUEUE_FLAG_SAME_FORCE has been set by rnbd-cnt since the initial merge. There is no good reason for a driver to force exact core delivery, which is tunable for very specific workloads and not a driver setting. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	rnbd: don't set QUEUE_FLAG_SAME_COMP	Christoph Hellwig
	QUEUE_FLAG_SAME_COMP is already set by default. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	mpt3sas_scsih: don't set QUEUE_FLAG_NOMERGES	Christoph Hellwig
	Setting QUEUE_FLAG_NOMERGES was added in commit d1b01d14b7ba ("scsi: mpt3sas: Set NVMe device queue depth as 128") without any explanation. Drivers should second guess the block layer merge decisions, so remove the flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	megaraid_sas: don't set QUEUE_FLAG_NOMERGES	Christoph Hellwig
	Setting QUEUE_FLAG_NOMERGES was added in commit 15dd03811d99dcf ("scsi: megaraid_sas: NVME Interface detection and prop settings") without any explanation. Drivers should second guess the block layer merge decisions, so remove the flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	loop: don't set QUEUE_FLAG_NOMERGES	Christoph Hellwig
	QUEUE_FLAG_NOMERGES isn't really a driver interface, but a user tunable. There also isn't any good reason to set it in the loop driver. The original commit adding it (5b5e20f421c0b6d "block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop") claims that "It doesn't make sense to enable merge because the I/O submitted to backing file is handled page by page." which of course isn't true for multi-page bvec now, and it never has been for direct I/O, for which commit 40326d8a33d ("block/loop: allow request merge for directio mode") alredy disabled the nomerges flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	Merge tag 'nfsd-6.10-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Due to a late review, revert and re-fix a recent crasher fix * tag 'nfsd-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "nfsd: fix oops when reading pool_stats before server is started" nfsd: initialise nfsd_info.mutex early.
2024-06-28	block: check bio alignment in blk_mq_submit_bio	Ming Lei
	IO logical block size is one fundamental queue limit, and every IO has to be aligned with logical block size because our bio split can't deal with unaligned bio. The check has to be done with queue usage counter grabbed because device reconfiguration may change logical block size, and we can prevent the reconfiguration from happening by holding queue usage counter. logical_block_size stays in the 1st cache line of queue_limits, and this cache line is always fetched in fast path via bio_may_exceed_limits(), so IO perf won't be affected by this check. Cc: Yi Zhang <yi.zhang@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240620030631.3114026-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	block: Add ioprio to block_rq tracepoint	Dongliang Cui
	Sometimes we need to track the processing order of requests with ioprio set. So the ioprio of request can be useful information. Example： block_rq_insert: 8,0 RA 16384 () 6500840 + 32 be,0,6 [binder:815_3] block_rq_issue: 8,0 RA 16384 () 6500840 + 32 be,0,6 [binder:815_3] block_rq_complete: 8,0 RA () 6500840 + 32 be,0,6 [0] Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240614074936.113659-1-dongliang.cui@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	block: remove bio_integrity_process	Christoph Hellwig
	Move the bvec interation into the generate/verify helpers to avoid a bit of argument passing churn. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	block: switch on bio operation in bio_integrity_prep	Christoph Hellwig
	Use a single switch to perform read and write specific checks and exit early for other operations instead of having two checks using different predicates. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	block: remove allocation failure warnings in bio_integrity_prep	Christoph Hellwig
	Allocation failures already leave a stack trace, so don't spew another warning. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	block: simplify adding the payload in bio_integrity_prep	Christoph Hellwig
	bio_integrity_add_page can add physically contiguous regions of any size, so don't bother chunking up the kmalloced buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	block: only zero non-PI metadata tuples in bio_integrity_prep	Christoph Hellwig
	The PI generation helpers already zero the app tag, so relax the zeroing to non-PI metadata. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	Merge tag 'bcachefs-2024-06-28' of https://evilpiepirate.org/git/bcachefs	Linus Torvalds
	Pull bcachefs fixes from Kent Overstreet: "Simple stuff: - NULL ptr/err ptr deref fixes - fix for getting wedged on shutdown after journal error - fix missing recalc_capacity() call, capacity now changes correctly after a device goes read only however: our capacity calculation still doesn't take into account when we have mixed ro/rw devices and the ro devices have data on them, that's going to be a more involved fix to separate accounting for "capacity used on ro devices" and "capacity used on rw devices" - boring syzbot stuff Slightly more involved: - discard, invalidate workers are now per device this has the effect of simplifying how we take device refs in these paths, and the device ref cleanup fixes a longstanding race between the device removal path and the discard path - fixes for how the debugfs code takes refs on btree_trans objects we have debugfs code that prints in use btree_trans objects. It uses closure_get() on trans->ref, which is mainly for the cycle detector, but the debugfs code was using it on a closure that may have hit 0, which is not allowed; for performance reasons we cannot avoid having not-in-use transactions on the global list. Introduce some new primitives to fix this and make the synchronization here a whole lot saner" * tag 'bcachefs-2024-06-28' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix kmalloc bug in __snapshot_t_mut bcachefs: Discard, invalidate workers are now per device bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gc bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpu bcachefs: Add missing bch2_journal_do_writes() call bcachefs: Fix null ptr deref in journal_pins_to_text() bcachefs: Add missing recalc_capacity() call bcachefs: Fix btree_trans list ordering bcachefs: Fix race between trans_put() and btree_transactions_read() closures: closure_get_not_zero(), closure_return_sync() bcachefs: Make btree_deadlock_to_text() clearer bcachefs: fix seqmutex_relock() bcachefs: Fix freeing of error pointers
2024-06-28	bcache: work around a __bitwise to bool conversion sparse warning	Christoph Hellwig
	Sparse is a bit dumb about bitwise operation on __bitwise types used in boolean contexts. Add a !! to explicitly propagate to boolean without a warning. Fixes: fcf865e357f8 ("block: convert features and flags to __bitwise types") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240628131657.667797-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28	Merge tag 'block-6.10-20240628' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fixes from Jens Axboe: "NVMe fixes via Keith: - Fabrics fixes (Hannes) - Missing module description (Jeff) - Clang warning fix (Nathan)" * tag 'block-6.10-20240628' of git://git.kernel.dk/linux: nvmet-fc: Remove __counted_by from nvmet_fc_tgt_queue.fod[] nvmet: make 'tsas' attribute idempotent for RDMA nvme: fixup comment for nvme RDMA Provider Type nvme-apple: add missing MODULE_DESCRIPTION() nvmet: do not return 'reserved' for empty TSAS values nvme: fix NVME_NS_DEAC may incorrectly identifying the disk as EXT_LBA.
2024-06-28	Merge tag 'iommu-fixes-v6.10-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Two cache flushing fixes for Intel and AMD drivers - AMD guest translation enabling fix - Update IOMMU tree location in MAINTAINERS file * tag 'iommu-fixes-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: MAINTAINERS: Update IOMMU tree location iommu/amd: Fix GT feature enablement again iommu/vt-d: Fix missed device TLB cache tag iommu/amd: Invalidate cache before removing device from domain list
2024-06-28	Merge tag 'gpio-fixes-for-v6.10-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "An assortment of driver fixes and two commits addressing a bad behavior of the GPIO uAPI when reconfiguring requested lines. - fix a race condition in i2c transfers by adding a missing i2c lock section in gpio-pca953x - validate the number of obtained interrupts in gpio-davinci - add missing raw_spinlock_init() in gpio-graniterapids - fix bad character device behavior: disallow GPIO line reconfiguration without set direction both in v1 and v2 uAPI" * tag 'gpio-fixes-for-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: cdev: Ignore reconfiguration without direction gpiolib: cdev: Disallow reconfiguration without direction (uAPI v1) gpio: graniterapids: Add missing raw_spinlock_init() gpio: davinci: Validate the obtained number of IRQs gpio: pca953x: fix pca953x_irq_bus_sync_unlock race
2024-06-28	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A pair of small arm64 fixes for -rc6. One is a fix for the recently merged uffd-wp support (which was triggering a spurious warning) and the other is a fix to the clearing of the initial idmap pgd in some configurations Summary: - Fix spurious page-table warning when clearing PTE_UFFD_WP in a live pte - Fix clearing of the idmap pgd when using large addressing modes" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Clear the initial ID map correctly before remapping arm64: mm: Permit PTE SW bits to change in live mappings
2024-06-28	Merge tag 'v6.10-rc-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat fixes from Len Brown: "Fix three recent minor turbostat regressions" * tag 'v6.10-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: Add local build_bug.h header for snapshot target tools/power turbostat: Fix unc freq columns not showing with '-q' or '-l' tools/power turbostat: option '-n' is ambiguous
2024-06-28	tty: mxser: Remove __counted_by from mxser_board.ports[]	Nathan Chancellor
	Work for __counted_by on generic pointers in structures (not just flexible array members) has started landing in Clang 19 (current tip of tree). During the development of this feature, a restriction was added to __counted_by to prevent the flexible array member's element type from including a flexible array member itself such as: struct foo { int count; char buf[]; }; struct bar { int count; struct foo data[] __counted_by(count); }; because the size of data cannot be calculated with the standard array size formula: sizeof(struct foo) * count This restriction was downgraded to a warning but due to CONFIG_WERROR, it can still break the build. The application of __counted_by on the ports member of 'struct mxser_board' triggers this restriction, resulting in: drivers/tty/mxser.c:291:2: error: 'counted_by' should not be applied to an array with element of unknown size because 'struct mxser_port' is a struct type with a flexible array member. This will be an error in a future compiler version [-Werror,-Wbounds-safety-counted-by-elt-type-unknown-size] 291 \| struct mxser_port ports[] __counted_by(nports); \| ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Remove this use of __counted_by to fix the warning/error. However, rather than remove it altogether, leave it commented, as it may be possible to support this in future compiler releases. Cc: <stable@vger.kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2026 Fixes: f34907ecca71 ("mxser: Annotate struct mxser_board with __counted_by") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240529-drop-counted-by-ports-mxser-board-v1-1-0ab217f4da6d@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-28	randomize_kstack: Remove non-functional per-arch entropy filtering	Kees Cook
	An unintended consequence of commit 9c573cd31343 ("randomize_kstack: Improve entropy diffusion") was that the per-architecture entropy size filtering reduced how many bits were being added to the mix, rather than how many bits were being used during the offsetting. All architectures fell back to the existing default of 0x3FF (10 bits), which will consume at most 1KiB of stack space. It seems that this is working just fine, so let's avoid the confusion and update everything to use the default. The prior intent of the per-architecture limits were: arm64: capped at 0x1FF (9 bits), 5 bits effective powerpc: uncapped (10 bits), 6 or 7 bits effective riscv: uncapped (10 bits), 6 bits effective x86: capped at 0xFF (8 bits), 5 (x86_64) or 6 (ia32) bits effective s390: capped at 0xFF (8 bits), undocumented effective entropy Current discussion has led to just dropping the original per-architecture filters. The additional entropy appears to be safe for arm64, x86, and s390. Quoting Arnd, "There is no point pretending that 15.75KB is somehow safe to use while 15.00KB is not." Co-developed-by: Yuntao Liu <liuyuntao12@huawei.com> Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com> Fixes: 9c573cd31343 ("randomize_kstack: Improve entropy diffusion") Link: https://lore.kernel.org/r/20240617133721.377540-1-liuyuntao12@huawei.com Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20240619214711.work.953-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-28	string: kunit: add missing MODULE_DESCRIPTION() macros	Jeff Johnson
	make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/string_kunit.o WARNING: modpost: missing MODULE_DESCRIPTION() in lib/string_helpers_kunit.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240531-md-lib-string-v1-1-2738cf057d94@quicinc.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-28	bcachefs: Switch online_reserved shutdown assert to WARN()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-28	ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models	Niklas Cassel
	We got another report that CT1000BX500SSD1 does not work with LPM. If you look in libata-core.c, we have six different Crucial devices that are marked with ATA_HORKAGE_NOLPM. This model would have been the seventh. (This quirk is used on Crucial models starting with both CT* and Crucial_CT*) It is obvious that this vendor does not have a great history of supporting LPM properly, therefore, add the ATA_HORKAGE_NOLPM quirk for all Crucial BX SSD1 models. Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Alessandro Maggio <alex.tkd.alex@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832 Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240627105551.4159447-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-06-28	selftests/harness: Fix tests timeout and race condition	Mickaël Salaün
	We cannot use CLONE_VFORK because we also need to wait for the timeout signal. Restore tests timeout by using the original fork() call in __run_test() but also in __TEST_F_IMPL(). Also fix a race condition when waiting for the test child process. Because test metadata are shared between test processes, only the parent process must set the test PID (child). Otherwise, t->pid may be set to zero, leading to inconsistent error cases: # RUN layout1.rule_on_mountpoint ... # rule_on_mountpoint: Test ended in some other way [127] # OK layout1.rule_on_mountpoint ok 20 layout1.rule_on_mountpoint As safeguards, initialize the "status" variable with a valid exit code, and handle unknown test exits as errors. The use of fork() introduces a new race condition in landlock/fs_test.c which seems to be specific to hostfs bind mounts, but I haven't found the root cause and it's difficult to trigger. I'll try to fix it with another patch. Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Closes: https://lore.kernel.org/r/9341d4db-5e21-418c-bf9e-9ae2da7877e1@sirena.org.uk Fixes: a86f18903db9 ("selftests/harness: Fix interleaved scheduling leading to race conditions") Fixes: 24cf65a62266 ("selftests/harness: Share _metadata between forked processes") Link: https://lore.kernel.org/r/20240621180605.834676-1-mic@digikod.net Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-06-28	rcu/exp: Remove redundant full memory barrier at the end of GP	Frederic Weisbecker
	A full memory barrier is necessary at the end of the expedited grace period to order: 1) The grace period completion (pictured by the GP sequence number) with all preceding accesses. This pairs with rcu_seq_end() performed by the concurrent kworker. 2) The grace period completion and subsequent post-GP update side accesses. Pairs again against rcu_seq_end(). This full barrier is already provided by the final sync_exp_work_done() test, making the subsequent explicit one redundant. Remove it and improve comments. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-06-28	rcu: Remove full memory barrier on RCU stall printout	Frederic Weisbecker
	RCU stall printout fetches the EQS state of a CPU with a preceding full memory barrier. However there is nothing to order this read against at this debugging stage. It is inherently racy when performed remotely. Do a plain read instead. This was the last user of rcu_dynticks_snap(). Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-06-28	rcu: Remove full memory barrier on boot time eqs sanity check	Frederic Weisbecker
	When the boot CPU initializes the per-CPU data on behalf of all possible CPUs, a sanity check is performed on each of them to make sure none is initialized in an extended quiescent state. This check involves a full memory barrier which is useless at this early boot stage. Do a plain access instead. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-06-28	rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot	Frederic Weisbecker
	When the grace period kthread checks the extended quiescent state counter of a CPU, full ordering is necessary to ensure that either: * If the GP kthread observes the remote target in an extended quiescent state, then that target must observe all accesses prior to the current grace period, including the current grace period sequence number, once it exits that extended quiescent state. or: * If the GP kthread observes the remote target NOT in an extended quiescent state, then the target further entering in an extended quiescent state must observe all accesses prior to the current grace period, including the current grace period sequence number, once it enters that extended quiescent state. This ordering is enforced through a full memory barrier placed right before taking the first EQS snapshot. However this is superfluous because the snapshot is taken while holding the target's rnp lock which provides the necessary ordering through its chain of smp_mb__after_unlock_lock(). Remove the needless explicit barrier before the snapshot and put a comment about the implicit barrier newly relied upon here. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2024-06-28	rcu: Remove superfluous full memory barrier upon first EQS snapshot	Frederic Weisbecker
	When the grace period kthread checks the extended quiescent state counter of a CPU, full ordering is necessary to ensure that either: * If the GP kthread observes the remote target in an extended quiescent state, then that target must observe all accesses prior to the current grace period, including the current grace period sequence number, once it exits that extended quiescent state. or: * If the GP kthread observes the remote target NOT in an extended quiescent state, then the target further entering in an extended quiescent state must observe all accesses prior to the current grace period, including the current grace period sequence number, once it enters that extended quiescent state. This ordering is enforced through a full memory barrier placed right before taking the first EQS snapshot. However this is superfluous because the snapshot is taken while holding the target's rnp lock which provides the necessary ordering through its chain of smp_mb__after_unlock_lock(). Remove the needless explicit barrier before the snapshot and put a comment about the implicit barrier newly relied upon here. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>