summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-16drivers: base: handle module_kobject creationShyam Saini
module_add_driver() relies on module_kset list for /sys/module/<built-in-module>/drivers directory creation. Since, commit 96a1a2412acba ("kernel/params.c: defer most of param_sysfs_init() to late_initcall time") drivers which are initialized from subsys_initcall() or any other higher precedence initcall couldn't find the related kobject entry in the module_kset list because module_kset is not fully populated by the time module_add_driver() refers it. As a consequence, module_add_driver() returns early without calling make_driver_name(). Therefore, /sys/module/<built-in-module>/drivers is never created. Fix this issue by letting module_add_driver() handle module_kobject creation itself. Fixes: 96a1a2412acb ("kernel/params.c: defer most of param_sysfs_init() to late_initcall time") Cc: stable@vger.kernel.org # requires all other patches from the series Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20250227184930.34163-5-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16ASoC: Intel: sof_sdw: Add NULL check in asoc_sdw_rt_dmic_rtd_init()Chenyuan Yang
mic_name returned by devm_kasprintf() could be NULL. Add a check for it. Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Fixes: bee2fe44679f ("ASoC: Intel: sof_sdw: use generic rtd_init function for Realtek SDW DMICs") Link: https://patch.msgid.link/20250415194134.292830-1-chenyuan0y@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-16kernel: globalize lookup_or_create_module_kobject()Shyam Saini
lookup_or_create_module_kobject() is marked as static and __init, to make it global drop static keyword. Since this function can be called from non-init code, use __modinit instead of __init, __modinit marker will make it __init if CONFIG_MODULES is not defined. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Link: https://lore.kernel.org/r/20250227184930.34163-4-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16kernel: refactor lookup_or_create_module_kobject()Shyam Saini
In the unlikely event of the allocation failing, it is better to let the machine boot with a not fully populated sysfs than to kill it with this BUG_ON(). All callers are already prepared for lookup_or_create_module_kobject() returning NULL. This is also preparation for calling this function from non __init code, where using BUG_ON for allocation failure handling is not acceptable. Since we are here, also start using IS_ENABLED instead of #ifdef construct. Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Link: https://lore.kernel.org/r/20250227184930.34163-3-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabledPeter Robinson
The BCM2712 MIP driver is required for Raspberry PI5, but it's not automatically enabled when ARCH_BCM2835 is enabled and depends on ARCH_BRCMSTB. ARCH_BCM2835 shares drivers with ARCH_BRCMSTB platforms, but Raspberry PI5 does not require the BRCMSTB specific drivers, which are selected via ARCH_BRCMSTB. Enable the interrupt controller for both ARCH_BRCMSTB and ARCH_BCM2835. [ tglx: Massage changelog ] Fixes: 32c6c054661a ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller") Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250416082523.179507-1-pbrobinson@gmail.com
2025-04-16kernel: param: rename locate_module_kobjectShyam Saini
The locate_module_kobject() function looks up an existing module_kobject for a given module name. If it cannot find the corresponding module_kobject, it creates one for the given name. This commit renames locate_module_kobject() to lookup_or_create_module_kobject() to better describe its operations. This doesn't change anything functionality wise. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Link: https://lore.kernel.org/r/20250227184930.34163-2-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16irqchip/renesas-rzv2h: Prevent TINT spurious interruptBiju Das
A spurious TINT interrupt is seen during boot on RZ/G3E SMARC EVK. A glitch in the edge detection circuit can cause a spurious interrupt. Clear the status flag after setting the ICU_TSSRk registers, which is recommended in the hardware manual as a countermeasure. Fixes: 0d7605e75ac2 ("irqchip: Add RZ/V2H(P) Interrupt Control Unit (ICU) driver") Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2025-04-16powerpc/boot: Check for ld-option supportMadhavan Srinivasan
Commit 579aee9fc594 ("powerpc: suppress some linker warnings in recent linker versions") enabled support to add linker option "--no-warn-rwx-segments", if the version is greater than 2.39. Similar build warning were reported recently from linker version 2.35.2. ld: warning: arch/powerpc/boot/zImage.epapr has a LOAD segment with RWX permissions ld: warning: arch/powerpc/boot/zImage.pseries has a LOAD segment with RWX permissions Fix the warning by checking for "--no-warn-rwx-segments" option support in linker to enable it, instead of checking for the version range. Fixes: 579aee9fc594 ("powerpc: suppress some linker warnings in recent linker versions") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/linuxppc-dev/61cf556c-4947-4bd6-af63-892fc0966dad@linux.ibm.com/ Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250401004218.24869-1-maddy@linux.ibm.com
2025-04-16riscv: Avoid fortify warning in syscall_get_arguments()Nathan Chancellor
When building with CONFIG_FORTIFY_SOURCE=y and W=1, there is a warning because of the memcpy() in syscall_get_arguments(): In file included from include/linux/string.h:392, from include/linux/bitmap.h:13, from include/linux/cpumask.h:12, from arch/riscv/include/asm/processor.h:55, from include/linux/sched.h:13, from kernel/ptrace.c:13: In function 'fortify_memcpy_chk', inlined from 'syscall_get_arguments.isra' at arch/riscv/include/asm/syscall.h:66:2: include/linux/fortify-string.h:580:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 580 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors The fortified memcpy() routine enforces that the source is not overread and the destination is not overwritten if the size of either field and the size of the copy are known at compile time. The memcpy() in syscall_get_arguments() intentionally overreads from a1 to a5 in 'struct pt_regs' but this is bigger than the size of a1. Normally, this could be solved by wrapping a1 through a5 with struct_group() but there was already a struct_group() applied to these members in commit bba547810c66 ("riscv: tracing: Fix __write_overflow_field in ftrace_partial_regs()"). Just avoid memcpy() altogether and write the copying of args from regs manually, which clears up the warning at the expense of three extra lines of code. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Dmitry V. Levin <ldv@strace.io> Link: https://lore.kernel.org/r/20250409-riscv-avoid-fortify-warning-syscall_get_arguments-v1-1-7853436d4755@kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-04-16xfs: fix fsmap for internal zoned devicesDarrick J. Wong
Filesystems with an internal zoned rt section use xfs_rtblock_t values that are relative to the start of the data device. When fsmap reports on internal rt sections, it reports the space used by the data section as "OWN_FS". Unfortunately, the logic for resuming a query isn't quite right, so xfs/273 fails because it stress-tests GETFSMAP with a single-record buffer. If we enter the "report fake space as OWN_FS" block with a nonzero key[0].fmr_length, we should add that to key[0].fmr_physical and recheck if we still need to emit the fake record. We should /not/ just return 0 from the whole function because that prevents all rmap record iteration. If we don't enter that block, the resumption is still wrong. keys[*].fmr_physical is a reflection of what we copied out to userspace on a previous query, which means that it already accounts for rgstart. It is not correct to add rtstart_daddr when computing start_rtb or end_rtb, so stop that. While we're at it, add a xfs_has_zoned to make it clear that this is a zoned filesystem thing. Fixes: e50ec7fac81aa2 ("xfs: enable fsmap reporting for internal RT devices") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-16md/raid1: Add check for missing source disk in process_checks()Meir Elisha
During recovery/check operations, the process_checks function loops through available disks to find a 'primary' source with successfully read data. If no suitable source disk is found after checking all possibilities, the 'primary' index will reach conf->raid_disks * 2. Add an explicit check for this condition after the loop. If no source disk was found, print an error message and return early to prevent further processing without a valid primary source. Link: https://lore.kernel.org/linux-raid/20250408143808.1026534-1-meir.elisha@volumez.com Signed-off-by: Meir Elisha <meir.elisha@volumez.com> Suggested-and-reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
2025-04-16bonding: Fix multiple long standing offload racesCosmin Ratiu
Refactor the bonding ipsec offload operations to fix a number of long-standing control plane races between state migration and user deletion and a few other issues. xfrm state deletion can happen concurrently with bond_change_active_slave() operation. This manifests itself as a bond_ipsec_del_sa() call with x->lock held, followed by a bond_ipsec_free_sa() a bit later from a wq. The alternate path of these calls coming from xfrm_dev_state_flush() can't happen, as that needs the RTNL lock and bond_change_active_slave() already holds it. 1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second time on an xfrm state that was concurrently killed. This is bad. 2. bond_ipsec_add_sa_all() can add a state on the new device, but pending bond_ipsec_free_sa() calls from the old device will then hit the WARN_ON() and then, worse, call xdo_dev_state_free() on the new device without a corresponding xdo_dev_state_delete(). 3. Resolve a sleeping in atomic context introduced by the mentioned "Fixes" commit. bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock and check for x->km.state to help with problems 1 and 2. And since xso.real_dev is now a private pointer managed by the bonding driver in xfrm state, make better use of it to fully fix problems 1 and 2. In bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor bond_ipsec_free_sa() could run concurrently. Fix problem 3 by moving the list cleanup (which requires the mutex) from bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa() Finally, simplify bond_ipsec_del_sa() and bond_ipsec_free_sa() by using xso->real_dev directly, since it's now protected by locks and can be trusted to always reflect the offload device. Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16bonding: Mark active offloaded xfrm_statesCosmin Ratiu
When the active link is changed for a bond device, the existing xfrm states need to be migrated over to the new link. This is done with: - bond_ipsec_del_sa_all() goes through the offloaded states list and removes all of them from hw. - bond_ipsec_add_sa_all() re-offloads all states to the new device. But because the offload status of xfrm states isn't marked in any way, there can be bugs. When all bond links are down, bond_ipsec_del_sa_all() unoffloads everything from the previous active link. If the same link then comes back up, nothing gets reoffloaded by bond_ipsec_add_sa_all(). This results in a stack trace like this a bit later when user space removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to remove a rule that's no longer offloaded: [] Call Trace: [] <TASK> [] ? __warn+0x7d/0x110 [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] [] ? report_bug+0x16d/0x180 [] ? handle_bug+0x4f/0x90 [] ? exc_invalid_op+0x14/0x70 [] ? asm_exc_invalid_op+0x16/0x20 [] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core] [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] [] bond_ipsec_del_sa+0x1ab/0x200 [bonding] [] xfrm_dev_state_delete+0x1f/0x60 [] __xfrm_state_delete+0x196/0x200 [] xfrm_state_delete+0x21/0x40 [] xfrm_del_sa+0x69/0x110 [] xfrm_user_rcv_msg+0x11d/0x300 [] ? release_pages+0xca/0x140 [] ? copy_to_user_tmpl.part.0+0x110/0x110 [] netlink_rcv_skb+0x54/0x100 [] xfrm_netlink_rcv+0x31/0x40 [] netlink_unicast+0x1fc/0x2d0 [] netlink_sendmsg+0x1e4/0x410 [] __sock_sendmsg+0x38/0x60 [] sock_write_iter+0x94/0xf0 [] vfs_write+0x338/0x3f0 [] ksys_write+0xba/0xd0 [] do_syscall_64+0x4c/0x100 [] entry_SYSCALL_64_after_hwframe+0x4b/0x53 There's also another theoretical bug: Calling bond_ipsec_del_sa_all() multiple times can result in corruption in the driver implementation if the double-free isn't tolerated. This isn't nice. Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm state was unoffloaded from a device, but a race with netdevsim's .xdo_dev_offload_ok() accessing real_dev was considered a sufficient reason to not set real_dev to NULL anymore. This unfortunately introduced the new bugs. Since .xdo_dev_offload_ok() was significantly refactored by [1] and there are no more users in the stack of xso.real_dev, that race is now gone and xs->xso.real_dev can now once again be used to represent which device (if any) currently holds the offloaded rule. Go one step further and set real_dev after add/before delete calls, to catch any future driver misuses of real_dev. [1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/ Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}Cosmin Ratiu
Previously, device driver IPSec offload implementations would fall into two categories: 1. Those that used xso.dev to determine the offload device. 2. Those that used xso.real_dev to determine the offload device. The first category didn't work with bonding while the second did. In a non-bonding setup the two pointers are the same. This commit adds explicit pointers for the offload netdevice to .xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free() which eliminates the confusion and allows drivers from the first category to work with bonding. xso.real_dev now becomes a private pointer managed by the bonding driver. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16xfrm: Remove unneeded device check from validate_xmit_xfrmCosmin Ratiu
validate_xmit_xfrm checks whether a packet already passed through it on the master device (xso.dev) and skips processing the skb again on the slave device (xso.real_dev). This check was added in commit [1] to avoid tx packets on a bond device pass through xfrm twice and get two sets of headers, but the check was soon obsoleted by commit [2], which was added around the same time to fix a similar but unrelated problem. Commit [3] set XFRM_XMIT only when packets are hw offloaded. xso.dev is usually equal to xso.real_dev, unless bonding is used, in which case the bonding driver uses xso.real_dev to manage offloaded xfrm states. Since commit [3], the check added in commit [1] is unused on all cases, since packets going through validate_xmit_xfrm twice bail out on the check added in commit [2]. Here's a breakdown of relevant scenarios: 1. ESP offload off: validate_xmit_xfrm returns early on !xo. 2. ESP offload on, no bond: skb->dev == xso.real_dev == xso.dev. 3. ESP offload on, bond, xs on bond dev: 1st pass adds XFRM_XMIT, 2nd pass returns early on XFRM_XMIT. 3. ESP offload on, bond, xs on slave dev: 1st pass returns early on !xo, 2nd pass adds XFRM_XMIT. 4. ESP offload on, bond, xs on both bond AND slave dev: only 1 offload possible in secpath. Either 1st pass adds XFRM_XMIT and 2nd pass returns early on XFRM_XMIT, or 1st pass is sw and returns early on !xo. 6. ESP offload on, crypto fallback triggered in esp_xmit/esp6_xmit: 1st pass does sw crypto & secpath_reset, 2nd pass returns on !xo. This commit removes the unnecessary check, so xso.real_dev becomes what it is in practice: a private field managed by bonding driver. The check immediately below that can be simplified as well. [1] commit 272c2330adc9 ("xfrm: bail early on slave pass over skb") [2] commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.") [3] commit c7dbf4c08868 ("xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16xfrm: Use xdo.dev instead of xdo.real_devCosmin Ratiu
The policy offload struct was reused from the state offload and real_dev was copied from dev, but it was never set to anything else. Simplify the code by always using xdo.dev for policies. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16net/mlx5: Avoid using xso.real_dev unnecessarilyCosmin Ratiu
xso.real_dev is the active device of an offloaded xfrm state and is managed by bonding. As such, it's subject to change when states are migrated to a new device. Using it in places other than offloading/unoffloading the states is risky. This commit saves the device into the driver-specific struct mlx5e_ipsec_sa_entry and switches mlx5e_ipsec_init_macs() and mlx5e_ipsec_netevent_event() to make use of it. Additionally, mlx5e_xfrm_update_stats() used xso.real_dev to validate that correct net locks are held. But in a bonding config, the net of the master device is the same as the underlying devices, and the net is already a local var, so use that instead. The only remaining references to xso.real_dev are now in the .xdo_dev_state_add() / .xdo_dev_state_delete() path. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16xfs: Fix spelling mistake "drity" -> "dirty"Zhang Xianwei
There is a spelling mistake in fs/xfs/xfs_log.c. Fix it. Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-16ata: libata-sata: Save all fields from sense data descriptorNiklas Cassel
When filling the taskfile result for a successful NCQ command, we use the SDB FIS from the FIS Receive Area, see e.g. ahci_qc_ncq_fill_rtf(). However, the SDB FIS only has fields STATUS and ERROR. For a successful NCQ command that has sense data, we will have a successful sense data descriptor, in the Sense Data for Successful NCQ Commands log. Since we have access to additional taskfile result fields, fill in these additional fields in qc->result_tf. This matches how for failing/aborted NCQ commands, we will use e.g. ahci_qc_fill_rtf() to fill in some fields, but then for the command that actually caused the NCQ error, we will use ata_eh_read_log_10h(), which provides additional fields, saving additional fields/overriding the qc->result_tf that was fetched using ahci_qc_fill_rtf(). Fixes: 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD") Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2025-04-16platform/x86: msi-wmi-platform: Workaround a ACPI firmware bugArmin Wolf
The ACPI byte code inside the ACPI control method responsible for handling the WMI method calls uses a global buffer for constructing the return value, yet the ACPI control method itself is not marked as "Serialized". This means that calling WMI methods on this WMI device is not thread-safe, as concurrent WMI method calls will corrupt the global buffer. Fix this by serializing the WMI method calls using a mutex. Cc: stable@vger.kernel.org # 6.x.x: 912d614ac99e: platform/x86: msi-wmi-platform: Rename "data" variable Fixes: 9c0beb6b29e7 ("platform/x86: wmi: Add MSI WMI Platform driver") Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20250414140453.7691-2-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-04-16cpufreq: cppc: Fix invalid return value in .get() callbackMarc Zyngier
Returning a negative error code in a function with an unsigned return type is a pretty bad idea. It is probably worse when the justification for the change is "our static analisys tool found it". Fixes: cf7de25878a1 ("cppc_cpufreq: Fix possible null pointer dereference") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-04-16fs: ensure that *path_locked*() helpers leave passed path pristineChristian Brauner
The functions currently leaving dangling pointers in the passed-in path leading to hard to debug bugs in the long run. Ensure that the path is left in pristine state just like we do in e.g., path_parentat() and other helpers. Link: https://lore.kernel.org/20250414-rennt-wimmeln-f186c3a780f1@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-16x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove coresPi Xiange
Bartlett Lake has a P-core only product with Raptor Cove. [ mingo: Switch around the define as pointed out by Christian Ludloff: Ratpr Cove is the core, Bartlett Lake is the product. Signed-off-by: Pi Xiange <xiange.pi@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Ludloff <ludloff@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: "Ahmed S. Darwish" <darwi@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250414032839.5368-1-xiange.pi@intel.com
2025-04-16USB: serial: simple: add OWON HDS200 series oscilloscope supportCraig Hesling
Add serial support for OWON HDS200 series oscilloscopes and likely many other pieces of OWON test equipment. OWON HDS200 series devices host two USB endpoints, designed to facilitate bidirectional SCPI. SCPI is a predominately ASCII text protocol for test/measurement equipment. Having a serial/tty interface for these devices lowers the barrier to entry for anyone trying to write programs to communicate with them. The following shows the USB descriptor for the OWON HDS272S running firmware V5.7.1: Bus 001 Device 068: ID 5345:1234 Owon PDS6062T Oscilloscope Negotiated speed: Full Speed (12Mbps) Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 0 [unknown] bDeviceSubClass 0 [unknown] bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x5345 Owon idProduct 0x1234 PDS6062T Oscilloscope bcdDevice 1.00 iManufacturer 1 oscilloscope iProduct 2 oscilloscope iSerial 3 oscilloscope bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x0029 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 5 Physical Interface Device bInterfaceSubClass 0 [unknown] bInterfaceProtocol 0 iInterface 0 ** UNRECOGNIZED: 09 21 11 01 00 01 22 5f 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 32 Device Status: 0x0000 (Bus Powered) OWON appears to be using the same USB Vendor and Product ID for many of their oscilloscopes. Looking at the discussion about the USB vendor/product ID, in the link bellow, suggests that this VID/PID is shared with VDS, SDS, PDS, and now the HDS series oscilloscopes. Available documentation for these devices seems to indicate that all use a similar SCPI protocol, some with RS232 options. It is likely that this same simple serial setup would work correctly for them all. Link: https://usb-ids.gowdy.us/read/UD/5345/1234 Signed-off-by: Craig Hesling <craig@hesling.com> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2025-04-16USB: serial: ftdi_sio: add support for Abacus Electrics Optical ProbeMichael Ehrenreich
Abacus Electrics makes optical probes for interacting with smart meters over an optical interface. At least one version uses an FT232B chip (as detected by ftdi_sio) with a custom USB PID, which needs to be added to the list to make the device work in a plug-and-play fashion. Signed-off-by: Michael Ehrenreich <michideep@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2025-04-16USB: serial: option: add Sierra Wireless EM9291Adam Xue
Add Sierra Wireless EM9291. Interface 0: MBIM control 1: MBIM data 3: AT port 4: Diagnostic port T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1199 ProdID=90e3 Rev=00.06 S: Manufacturer=Sierra Wireless, Incorporated S: Product=Sierra Wireless EM9291 S: SerialNumber=xxxxxxxxxxxxxxxx C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=(none) E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=(none) E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Adam Xue <zxue@semtech.com> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2025-04-16nvmet: pci-epf: cleanup link state managementDamien Le Moal
Since the link_up boolean field of struct nvmet_pci_epf_ctrl is always set to true when nvmet_pci_epf_start_ctrl() is called, assign true to this field in nvmet_pci_epf_start_ctrl(). Conversely, since this field is set to false when nvmet_pci_epf_stop_ctrl() is called, set this field to false directly inside that function. While at it, also add information messages to notify the user of the PCI link state changes to help troubleshoot any link stability issues without needing to enable debug messages. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvmet: pci-epf: clear CC and CSTS when disabling the controllerDamien Le Moal
When a host shuts down the controller when shutting down but does so without first disabling the controller, the enable bit remains set in the controller configuration register. When the host restarts and attempts to enable the controller again, the nvmet_pci_epf_poll_cc_work() function is unable to detect the change from 0 to 1 of the enable bit, and thus the controller is not enabled again, which result in a device scan timeout on the host. This problem also occurs if the host shuts down uncleanly or if the PCIe link goes down: as the CC.EN value is not reset, the controller is not enabled again when the host restarts. Fix this by introducing the function nvmet_pci_epf_clear_ctrl_config() to clear the CC and CSTS registers of the controller when the PCIe link is lost (nvmet_pci_epf_stop_ctrl() function), or when starting the controller fails (nvmet_pci_epf_enable_ctrl() fails). Also use this function in nvmet_pci_epf_init_bar() to simplify the initialization of the CC and CSTS registers. Furthermore, modify the function nvmet_pci_epf_disable_ctrl() to clear the CC.EN bit and write this updated value to the BAR register when the controller is shutdown by the host, to ensure that upon restart, we can detect the host setting CC.EN. Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvmet: pci-epf: always fully initialize completion entriesDamien Le Moal
For a command that is normally processed through the command request execute() function, the completion entry for the command is initialized by __nvmet_req_complete() and nvmet_pci_epf_cq_work() only needs to set the status field and the phase of the completion entry before posting the entry to the completion queue. However, for commands that are failed due to an internal error (e.g. the command data buffer allocation fails), the command request execute() function is not called and __nvmet_req_complete() is never executed for the command, leaving the command completion entry uninitialized. For such command failed before calling req->execute(), the host ends up seeing completion entries with an invalid submission queue ID and command ID. Avoid such issue by always fully initilizing a command completion entry in nvmet_pci_epf_cq_work(), setting the entry submission queue head, ID and command ID. Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvmet: auth: use NULL to clear a pointer in nvmet_auth_sq_free()Damien Le Moal
When compiling with C=1, the following sparse warning is generated: auth.c:243:23: warning: Using plain integer as NULL pointer Avoid this warning by using NULL to instead of 0 to set the sq tls_key pointer. Fixes: fa2e0f8bbc68 ("nvmet-tcp: support secure channel concatenation") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvme-multipath: sysfs links may not be created for devicesHannes Reinecke
When rapidly rescanning for new namespaces nvme_mpath_add_sysfs_link() may be called for a block device not added to sysfs. But NVME_NS_SYSFS_ATTR_LINK had already been set, so when checking this device a second time we will fail to create the link. Fix this by exchanging the order of the block device check and the NVME_NS_SYSFS_ATTR_LINK bit check. Fixes: 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy") Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>** Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvme: fixup scan failure for non-ANA multipath controllersHannes Reinecke
Commit 62baf70c3274 caused the ANA log page to be re-read, even on controllers that do not support ANA. While this should generally harmless, some controllers hang on the unsupported log page and never finish probing. Fixes: 62baf70c3274 ("nvme: re-read ANA log page after ns scan completes") Signed-off-by: Hannes Reinecke <hare@kernel.org> Tested-by: Srikanth Aithal <sraithal@amd.com> [hch: more detailed commit message] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2025-04-15Merge branch 'net-ptp-driver-opt-in-for-supported-ptp-ioctl-flags'Jakub Kicinski
Jacob Keller says: ==================== net: ptp: driver opt-in for supported PTP ioctl flags Both the PTP_EXTTS_REQUEST(2) and PTP_PEROUT_REQUEST(2) ioctls take flags from userspace to modify their behavior. Drivers are supposed to check these flags, rejecting requests for flags they do not support. Many drivers today do not check these flags, despite many attempts to squash individual drivers as these mistakes are discovered. Additionally, any new flags added can require updating every driver if their validation checks are poorly implemented. It is clear that driver authors will not reliably check for unsupported flags. The root of the issue is that drivers must essentially opt out of every flag, rather than opt in to the ones they support. Instead, lets introduce .supported_perout_flags and .supported_extts_flags to the ptp_clock_info structure. This is a pattern taken from several ethtool ioctls which enabled validation to move out of the drivers and into the shared ioctl handlers. This pattern has worked quite well and makes it much more difficult for drivers to accidentally accept flags they do not support. With this approach, drivers which do not set the supported fields will have the core automatically reject any request which has flags. Drivers must opt in to each flag they support by adding it to the list, with the sole exception being the PTP_ENABLE_FEATURE flag of the PTP_EXTTS_REQUEST ioctl since it is entirely handled by the ptp_chardev.c file. This change will ensure that all current and future drivers are safe for extension when we need to extend these ioctls. I opted to keep all the driver changes into one patch per ioctl type. The changes are relatively small and straight forward. Splitting it per-driver would make the series large, and also break flags between the introduction of the supported field and setting it in each driver. The non-Intel drivers are compile-tested only, and I would appreciate confirmation and testing from their respective maintainers. (It is also likely that I missed some of the driver authors especially for drivers which didn't make any checks at all and do not set either of the supported flags yet) v1: https://lore.kernel.org/20250408-jk-supported-perout-flags-v1-0-d2f8e3df64f3@intel.com ==================== Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-0-f6b17d15475c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: ptp: introduce .supported_perout_flags to ptp_clock_infoJacob Keller
The PTP_PEROUT_REQUEST2 ioctl has gained support for flags specifying specific output behavior including PTP_PEROUT_ONE_SHOT, PTP_PEROUT_DUTY_CYCLE, PTP_PEROUT_PHASE. Driver authors are notorious for not checking the flags of the request. This results in misinterpreting the request, generating an output signal that does not match the requested value. It is anticipated that even more flags will be added in the future, resulting in even more broken requests. Expecting these issues to be caught during review or playing whack-a-mole after the fact is not a great solution. Instead, introduce the supported_perout_flags field in the ptp_clock_info structure. Update the core character device logic to explicitly reject any request which has a flag not on this list. This ensures that drivers must 'opt in' to the flags they support. Drivers which don't set the .supported_perout_flags field will not need to check that unsupported flags aren't passed, as the core takes care of this. Update the drivers which do support flags to set this new field. Note the following driver files set n_per_out to a non-zero value but did not check the flags at all: • drivers/ptp/ptp_clockmatrix.c • drivers/ptp/ptp_idt82p33.c • drivers/ptp/ptp_fc3.c • drivers/net/ethernet/ti/am65-cpts.c • drivers/net/ethernet/aquantia/atlantic/aq_ptp.c • drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c • drivers/net/dsa/sja1105/sja1105_ptp.c • drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c • drivers/net/ethernet/mscc/ocelot_vsc7514.c • drivers/net/ethernet/intel/i40e/i40e_ptp.c Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-2-f6b17d15475c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: ptp: introduce .supported_extts_flags to ptp_clock_infoJacob Keller
The PTP_EXTTS_REQUEST(2) ioctl has a flags field which specifies how the external timestamp request should behave. This includes which edge of the signal to timestamp, as well as a specialized "offset" mode. It is expected that more flags will be added in the future. Driver authors routinely do not check the flags, often accepting requests with flags which they do not support. Even drivers which do check flags may not be future-proofed to reject flags not yet defined. Thus, any future flag additions often require manually updating drivers to reject these flags. This approach of hoping we catch flag checks during review, or playing whack-a-mole after the fact is the wrong approach. Introduce the "supported_extts_flags" field to the ptp_clock_info structure. This field defines the set of flags the device actually supports. Update the core character device logic to check this field and reject unsupported requests. Getting this right is somewhat tricky. First, to avoid unnecessary repetition and make basic functionality work when .supported_extts_flags is 0, the core always accepts the PTP_ENABLE_FEATURE flag. This flag is used to set the 'on' parameter to the .enable function and is thus always 'supported' by all drivers. For backwards compatibility, the PTP_RISING_EDGE and PTP_FALLING_EDGE flags are merely "hints" when using the old PTP_EXTTS_REQUEST ioctl, and are not expected to be enforced. If the user issues PTP_EXTTS_REQUEST2, the PTP_STRICT_FLAGS flag is added which is supposed to inform the driver to strictly validate the flags and reject unsupported requests. To handle this, first check if the driver reports PTP_STRICT_FLAGS support. If it does not, then always allow the PTP_RISING_EDGE and PTP_FALLING_EDGE flags. This keeps backwards compatibility with the original PTP_EXTTS_REQUEST ioctl where these flags are not guaranteed to be honored. This way, drivers which do not set the supported_extts_flags will continue to accept requests for the original PTP_EXTTS_REQUEST ioctl. The core will automatically reject requests with new flags, and correctly reject requests with PTP_STRICT_FLAGS, where the driver is supposed to strictly validate the flags. Update the various drivers, refactoring their validation logic into the .supported_extts_flags field. For consistency and readability, PTP_ENABLE_FEATURE is not set in the supported flags list, and PTP_EXTTS_EDGES is expanded to PTP_RISING_EDGE | PTP_FALLING_EDGE in all cases. Note the following driver files set n_ext_ts to a non-zero value but did not check flags at all: • drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c • drivers/net/ethernet/freescale/enetc/enetc_ptp.c • drivers/net/ethernet/intel/i40e/i40e_ptp.c • drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c • drivers/net/ethernet/renesas/ravb_ptp.c • drivers/net/ethernet/renesas/rtsn.c • drivers/net/ethernet/renesas/rtsn.h • drivers/net/ethernet/ti/am65-cpts.c • drivers/net/ethernet/ti/cpts.h • drivers/net/ethernet/ti/icssg/icss_iep.c • drivers/net/ethernet/xscale/ptp_ixp46x.c • drivers/net/phy/bcm-phy-ptp.c • drivers/ptp/ptp_ocp.c • drivers/ptp/ptp_pch.c • drivers/ptp/ptp_qoriq.c These drivers behavior does change slightly: they will now reject the PTP_EXTTS_REQUEST2 ioctl, because they do not strictly validate their flags. This also makes them no longer incorrectly accept PTP_EXT_OFFSET. Also note that the renesas ravb driver does not support PTP_STRICT_FLAGS. We could leave the .supported_extts_flags as 0, but I added the PTP_RISING_EDGE | PTP_FALLING_EDGE since the driver previously manually validated these flags. This is equivalent to 0 because the core will allow these flags regardless unless PTP_STRICT_FLAGS is also set. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-1-f6b17d15475c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Merge tag 'linux-can-fixes-for-6.15-20250415' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-04-15 The first patch is by Davide Caratti and fixes the missing derement in the protocol inuse counter for the J1939 CAN protocol. The last patch is by Weizhao Ouyang and fixes a broken quirks check in the rockchip CAN-FD driver. * tag 'linux-can-fixes-for-6.15-20250415' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: rockchip_canfd: fix broken quirks checks can: fix missing decrement of j1939_proto.inuse_idx ==================== Link: https://patch.msgid.link/20250415103401.445981-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15ublk: don't suggest CONFIG_BLK_DEV_UBLK=YCaleb Sander Mateos
The CONFIG_BLK_DEV_UBLK help text suggests setting the config option to Y so task_work_add() can be used to dispatch I/O, improving performance. However, this mechanism was removed in commit 29dc5d06613f2 ("ublk: kill queuing request by task_work_add"). So remove this paragraph from the config help text. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250416004111.3242817-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15loop: stop using vfs_iter_{read,write} for buffered I/OChristoph Hellwig
vfs_iter_{read,write} always perform direct I/O when the file has the O_DIRECT flag set, which breaks disabling direct I/O using the LOOP_SET_STATUS / LOOP_SET_STATUS64 ioctls. This was recenly reported as a regression, but as far as I can tell was only uncovered by better checking for block sizes and has been around since the direct I/O support was added. Fix this by using the existing aio code that calls the raw read/write iter methods instead. Note that despite the comments there is no need for block drivers to ever call flush_dcache_page themselves, and the call is a left-over from prehistoric times. Fixes: ab1cb278bc70 ("block: loop: introduce ioctl command of LOOP_SET_DIRECT_IO") Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20250409130940.3685677-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15batman-adv: Fix double-hold of meshif when getting enabledSven Eckelmann
It was originally meant to replace the dev_hold with netdev_hold. But this was missed in batadv_hardif_enable_interface(). As result, there was an imbalance and a hang when trying to remove the mesh-interface with (previously) active hard-interfaces: unregister_netdevice: waiting for batadv0 to become free. Usage count = 3 Fixes: 00b35530811f ("batman-adv: adopt netdev_hold() / netdev_put()") Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+ff3aa851d46ab82953a3@syzkaller.appspotmail.com Reported-by: syzbot+4036165fc595a74b09b2@syzkaller.appspotmail.com Reported-by: syzbot+c35d73ce910d86c0026e@syzkaller.appspotmail.com Reported-by: syzbot+48c14f61594bdfadb086@syzkaller.appspotmail.com Reported-by: syzbot+f37372d86207b3bb2941@syzkaller.appspotmail.com Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250414-double_hold_fix-v5-1-10e056324cde@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Merge branch 'fib_rules-fix-iif-oif-matching-on-l3-master-device'Jakub Kicinski
Ido Schimmel says: ==================== fib_rules: Fix iif / oif matching on L3 master device Patch #1 fixes a recently reported regression regarding FIB rules that match on iif / oif being a VRF device. Patch #2 adds test cases to the FIB rules selftest. ==================== Link: https://patch.msgid.link/20250414172022.242991-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15selftests: fib_rule_tests: Add VRF match testsIdo Schimmel
Add tests for FIB rules that match on iif / oif being a VRF device. Test both good and bad flows. With previous patch ("net: fib_rules: Fix iif / oif matching on L3 master device"): # ./fib_rule_tests.sh [...] Tests passed: 328 Tests failed: 0 Without it: # ./fib_rule_tests.sh [...] Tests passed: 324 Tests failed: 4 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250414172022.242991-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: fib_rules: Fix iif / oif matching on L3 master deviceIdo Schimmel
Before commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") it was possible to use FIB rules to match on a L3 domain. This was done by having a FIB rule match on iif / oif being a L3 master device. It worked because prior to the FIB rule lookup the iif / oif fields in the flow structure were reset to the index of the L3 master device to which the input / output device was enslaved to. The above scheme made it impossible to match on the original input / output device. Therefore, cited commit stopped overwriting the iif / oif fields in the flow structure and instead stored the index of the enslaving L3 master device in a new field ('flowi_l3mdev') in the flow structure. While the change enabled new use cases, it broke the original use case of matching on a L3 domain. Fix this by interpreting the iif / oif matching on a L3 master device as a match against the L3 domain. In other words, if the iif / oif in the FIB rule points to a L3 master device, compare the provided index against 'flowi_l3mdev' rather than 'flowi_{i,o}if'. Before cited commit, a FIB rule that matched on 'iif vrf1' would only match incoming traffic from devices enslaved to 'vrf1'. With the proposed change (i.e., comparing against 'flowi_l3mdev'), the rule would also match traffic originating from a socket bound to 'vrf1'. Avoid that by adding a new flow flag ('FLOWI_FLAG_L3MDEV_OIF') that indicates if the L3 domain was derived from the output interface or the input interface (when not set) and take this flag into account when evaluating the FIB rule against the flow structure. Avoid unnecessary checks in the data path by detecting that a rule matches on a L3 master device when the rule is installed and marking it as such. Tested using the following script [1]. Output before 40867d74c374 (v5.4.291): default dev dummy1 table 100 scope link default dev dummy1 table 200 scope link Output after 40867d74c374: default dev dummy1 table 300 scope link default dev dummy1 table 300 scope link Output with this patch: default dev dummy1 table 100 scope link default dev dummy1 table 200 scope link [1] #!/bin/bash ip link add name vrf1 up type vrf table 10 ip link add name dummy1 up master vrf1 type dummy sysctl -wq net.ipv4.conf.all.forwarding=1 sysctl -wq net.ipv4.conf.all.rp_filter=0 ip route add table 100 default dev dummy1 ip route add table 200 default dev dummy1 ip route add table 300 default dev dummy1 ip rule add prio 0 oif vrf1 table 100 ip rule add prio 1 iif vrf1 table 200 ip rule add prio 2 table 300 ip route get 192.0.2.1 oif dummy1 fibmatch ip route get 192.0.2.1 iif dummy1 from 198.51.100.1 fibmatch Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: hanhuihui <hanhuihui5@huawei.com> Closes: https://lore.kernel.org/netdev/ec671c4f821a4d63904d0da15d604b75@huawei.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250414172022.242991-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15eth: bnxt: fix missing ring index trim on error pathJakub Kicinski
Commit under Fixes converted tx_prod to be free running but missed masking it on the Tx error path. This crashes on error conditions, for example when DMA mapping fails. Fixes: 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250414143210.458625-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: bridge: locally receive all multicast packets if IFF_ALLMULTI is setShengyu Qu
If multicast snooping is enabled, multicast packets may not always end up on the local bridge interface, if the host is not a member of the multicast group. Similar to how IFF_PROMISC allows all packets to be received locally, let IFF_ALLMULTI allow all multicast packets to be received. OpenWrt uses a user space daemon for DHCPv6/RA/NDP handling, and in relay mode it sets the ALLMULTI flag in order to receive all relevant queries on the network. This works for normal network interfaces and non-snooping bridges, but not snooping bridges (unless multicast routing is enabled). Reported-by: Felix Fietkau <nbd@nbd.name> Closes: https://github.com/openwrt/openwrt/issues/15857#issuecomment-2662851243 Signed-off-by: Shengyu Qu <wiagn233@outlook.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/OSZPR01MB8434308370ACAFA90A22980798B32@OSZPR01MB8434.jpnprd01.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: ethernet: ti: am65-cpsw: fix port_np reference countingMichael Walle
A reference to the device tree node is stored in a private struct, thus the reference count has to be incremented. Also, decrement the count on device removal and in the error path. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250414083942.4015060-1-mwalle@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15net: phy: remove redundant dependency on NETDEVICES for PHYLINK and PHYLIBHeiner Kallweit
drivers/net/phy/Kconfig is included from drivers/net/Kconfig in an "if NETDEVICES" section. Therefore we don't have to duplicate the dependency here. And if e.g. PHYLINK is selected somewhere, then the dependency is ignored anyway (see note in Kconfig help). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/085892cd-aa11-4c22-bf8a-574a5c6dcd7c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15octeontx2-pf: handle otx2_mbox_get_rsp errorsChenyuan Yang
Adding error pointer check after calling otx2_mbox_get_rsp(). This is similar to the commit bd3110bc102a ("octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_flows.c"). Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Fixes: 6c40ca957fe5 ("octeontx2-pf: Adds TC offload support") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250412183327.3550970-1-chenyuan0y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15tc: Return an error if filters try to attach too many actionsToke Høiland-Jørgensen
While developing the fix for the buffer sizing issue in [0], I noticed that the kernel will happily accept a long list of actions for a filter, and then just silently truncate that list down to a maximum of 32 actions. That seems less than ideal, so this patch changes the action parsing to return an error message and refuse to create the filter in this case. This results in an error like: # ip link add type veth # tc qdisc replace dev veth0 root handle 1: fq_codel # tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 33); do echo action pedit munge ip dport set 22; done) Error: Only 32 actions supported per filter. We have an error talking to the kernel Instead of just creating a filter with 32 actions and dropping the last one. This is obviously a change in UAPI. But seeing as creating more than 32 filters has never actually *worked*, it seems that returning an explicit error is better, and any use cases that get broken by this were already broken just in more subtle ways. [0] https://lore.kernel.org/r/20250407105542.16601-1-toke@redhat.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250409145523.164506-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15Revert "PCI: Avoid reset when disabled via sysfs"Alex Williamson
This reverts commit 479380efe1625e251008d24b2810283db60d6fcd. The reset_method attribute on a PCI device is only intended to manage the availability of function scoped resets for a device. It was never intended to restrict resets targeting the bus or slot. In introducing a restriction that each device must support function level reset by testing pci_reset_supported(), we essentially create a catch-22, that a device must have a function scope reset in order to support bus/slot reset, when we use bus/slot reset to effect a reset of a device that does not support a function scoped reset, especially multi-function devices. This breaks the majority of uses cases where vfio-pci uses bus/slot resets to manage multifunction devices that do not support function scoped resets. Fixes: 479380efe162 ("PCI: Avoid reset when disabled via sysfs") Reported-by: Cal Peake <cp@absolutedigital.net> Closes: https://lore.kernel.org/all/808e1111-27b7-f35b-6d5c-5b275e73677b@absolutedigital.net Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220010 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250414211828.3530741-1-alex.williamson@redhat.com
2025-04-15bcachefs: snapshot_node_missing is now autofixKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>