summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-20net: vlan: don't propagate flags on openStanislav Fomichev
With the device instance lock, there is now a possibility of a deadlock: [ 1.211455] ============================================ [ 1.211571] WARNING: possible recursive locking detected [ 1.211687] 6.14.0-rc5-01215-g032756b4ca7a-dirty #5 Not tainted [ 1.211823] -------------------------------------------- [ 1.211936] ip/184 is trying to acquire lock: [ 1.212032] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_set_allmulti+0x4e/0xb0 [ 1.212207] [ 1.212207] but task is already holding lock: [ 1.212332] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0 [ 1.212487] [ 1.212487] other info that might help us debug this: [ 1.212626] Possible unsafe locking scenario: [ 1.212626] [ 1.212751] CPU0 [ 1.212815] ---- [ 1.212871] lock(&dev->lock); [ 1.212944] lock(&dev->lock); [ 1.213016] [ 1.213016] *** DEADLOCK *** [ 1.213016] [ 1.213143] May be due to missing lock nesting notation [ 1.213143] [ 1.213294] 3 locks held by ip/184: [ 1.213371] #0: ffffffff838b53e0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x1b/0xa0 [ 1.213543] #1: ffffffff84e5fc70 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x37/0xa0 [ 1.213727] #2: ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0 [ 1.213895] [ 1.213895] stack backtrace: [ 1.213991] CPU: 0 UID: 0 PID: 184 Comm: ip Not tainted 6.14.0-rc5-01215-g032756b4ca7a-dirty #5 [ 1.213993] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 1.213994] Call Trace: [ 1.213995] <TASK> [ 1.213996] dump_stack_lvl+0x8e/0xd0 [ 1.214000] print_deadlock_bug+0x28b/0x2a0 [ 1.214020] lock_acquire+0xea/0x2a0 [ 1.214027] __mutex_lock+0xbf/0xd40 [ 1.214038] dev_set_allmulti+0x4e/0xb0 # real_dev->flags & IFF_ALLMULTI [ 1.214040] vlan_dev_open+0xa5/0x170 # ndo_open on vlandev [ 1.214042] __dev_open+0x145/0x270 [ 1.214046] __dev_change_flags+0xb0/0x1e0 [ 1.214051] netif_change_flags+0x22/0x60 # IFF_UP vlandev [ 1.214053] dev_change_flags+0x61/0xb0 # for each device in group from dev->vlan_info [ 1.214055] vlan_device_event+0x766/0x7c0 # on netdevsim0 [ 1.214058] notifier_call_chain+0x78/0x120 [ 1.214062] netif_open+0x6d/0x90 [ 1.214064] dev_open+0x5b/0xb0 # locks netdevsim0 [ 1.214066] bond_enslave+0x64c/0x1230 [ 1.214075] do_set_master+0x175/0x1e0 # on netdevsim0 [ 1.214077] do_setlink+0x516/0x13b0 [ 1.214094] rtnl_newlink+0xaba/0xb80 [ 1.214132] rtnetlink_rcv_msg+0x440/0x490 [ 1.214144] netlink_rcv_skb+0xeb/0x120 [ 1.214150] netlink_unicast+0x1f9/0x320 [ 1.214153] netlink_sendmsg+0x346/0x3f0 [ 1.214157] __sock_sendmsg+0x86/0xb0 [ 1.214160] ____sys_sendmsg+0x1c8/0x220 [ 1.214164] ___sys_sendmsg+0x28f/0x2d0 [ 1.214179] __x64_sys_sendmsg+0xef/0x140 [ 1.214184] do_syscall_64+0xec/0x1d0 [ 1.214190] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 1.214191] RIP: 0033:0x7f2d1b4a7e56 Device setup: netdevsim0 (down) ^ ^ bond netdevsim1.100@netdevsim1 allmulticast=on (down) When we enslave the lower device (netdevsim0) which has a vlan, we propagate vlan's allmuti/promisc flags during ndo_open. This causes (re)locking on of the real_dev. Propagate allmulti/promisc on flags change, not on the open. There is a slight semantics change that vlans that are down now propagate the flags, but this seems unlikely to result in the real issues. Reproducer: echo 0 1 > /sys/bus/netdevsim/new_device dev_path=$(ls -d /sys/bus/netdevsim/devices/netdevsim0/net/*) dev=$(echo $dev_path | rev | cut -d/ -f1 | rev) ip link set dev $dev name netdevsim0 ip link set dev netdevsim0 up ip link add link netdevsim0 name netdevsim0.100 type vlan id 100 ip link set dev netdevsim0.100 allmulticast on down ip link add name bond1 type bond mode 802.3ad ip link set dev netdevsim0 down ip link set dev netdevsim0 master bond1 ip link set dev bond1 up ip link show Reported-by: syzbot+b0c03d76056ef6cd12a6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/Z9CfXjLMKn6VLG5d@mini-arch/T/#m15ba130f53227c883e79fb969687d69d670337a0 Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250313100657.2287455-1-sdf@fomichev.me Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20fs: reduce work in fdget_pos()Mateusz Guzik
1. predict the file was found 2. explicitly compare the ref to "one", ignoring the dead zone The latter arguably improves the behavior to begin with. Suppose the count turned bad -- the previously used ref routine is going to check for it and return 0, indicating the count does not necessitate taking ->f_pos_lock. But there very well may be several users. i.e. not paying for special-casing the dead zone improves semantics. While here spell out each condition in a dedicated if statement. This has no effect on generated code. Sizes are as follows (in bytes; gcc 13, x86-64): stock: 321 likely(): 298 likely()+ref: 280 Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250319215801.1870660-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20Merge branches 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', ↵Joerg Roedel
'rockchip', 's390', 'core', 'intel/vt-d' and 'amd/amd-vi' into next
2025-03-20iommu/vt-d: Fix possible circular locking dependencyLu Baolu
We have recently seen report of lockdep circular lock dependency warnings on platforms like Skylake and Kabylake: ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70 but task is already holding lock: ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xe75/0x11f0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (&device->physical_node_lock){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 intel_iommu_init+0xe75/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #5 (dmar_global_lock){++++}-{3:3}: down_read+0x43/0x1d0 enable_drhd_fault_handling+0x21/0x110 cpuhp_invoke_callback+0x4c6/0x870 cpuhp_issue_call+0xbf/0x1f0 __cpuhp_setup_state_cpuslocked+0x111/0x320 __cpuhp_setup_state+0xb0/0x220 irq_remap_enable_fault_handling+0x3f/0xa0 apic_intr_mode_init+0x5c/0x110 x86_late_time_init+0x24/0x40 start_kernel+0x895/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #4 (cpuhp_state_mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __cpuhp_setup_state_cpuslocked+0x67/0x320 __cpuhp_setup_state+0xb0/0x220 page_alloc_init_cpuhp+0x2d/0x60 mm_core_init+0x18/0x2c0 start_kernel+0x576/0xbd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xbf/0x110 common_startup_64+0x13e/0x141 -> #3 (cpu_hotplug_lock){++++}-{0:0}: __cpuhp_state_add_instance+0x4f/0x220 iova_domain_init_rcaches+0x214/0x280 iommu_setup_dma_ops+0x1a4/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_setup_dma_ops+0x16b/0x710 iommu_device_register+0x17d/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #1 (&group->mutex){+.+.}-{3:3}: __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 __iommu_probe_device+0x24c/0x4e0 probe_iommu_group+0x2b/0x50 bus_for_each_dev+0x7d/0xe0 iommu_device_register+0xe1/0x260 intel_iommu_init+0xda4/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 (iommu_probe_device_lock){+.+.}-{3:3}: __lock_acquire+0x1637/0x2810 lock_acquire+0xc9/0x300 __mutex_lock+0xb4/0xe40 mutex_lock_nested+0x1b/0x30 iommu_probe_device+0x1d/0x70 intel_iommu_init+0xe90/0x11f0 pci_iommu_init+0x13/0x70 do_one_initcall+0x62/0x3f0 kernel_init_freeable+0x3da/0x6a0 kernel_init+0x1b/0x200 ret_from_fork+0x44/0x70 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: iommu_probe_device_lock --> dmar_global_lock --> &device->physical_node_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&device->physical_node_lock); lock(dmar_global_lock); lock(&device->physical_node_lock); lock(iommu_probe_device_lock); *** DEADLOCK *** This driver uses a global lock to protect the list of enumerated DMA remapping units. It is necessary due to the driver's support for dynamic addition and removal of remapping units at runtime. Two distinct code paths require iteration over this remapping unit list: - Device registration and probing: the driver iterates the list to register each remapping unit with the upper layer IOMMU framework and subsequently probe the devices managed by that unit. - Global configuration: Upper layer components may also iterate the list to apply configuration changes. The lock acquisition order between these two code paths was reversed. This caused lockdep warnings, indicating a risk of deadlock. Fix this warning by releasing the global lock before invoking upper layer interfaces for device registration. Fixes: b150654f74bf ("iommu/vt-d: Fix suspicious RCU usage") Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/ Tested-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250317035714.1041549-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changesSean Christopherson
Don't overwrite an IRTE that is posting IRQs to a vCPU with a posted MSI entry if the host IRQ affinity happens to change. If/when the IRTE is reverted back to "host mode", it will be reconfigured as a posted MSI or remapped entry as appropriate. Drop the "mode" field, which doesn't differentiate between posted MSIs and posted vCPUs, in favor of a dedicated posted_vcpu flag. Note! The two posted_{msi,vcpu} flags are intentionally not mutually exclusive; an IRTE can transition between posted MSI and posted vCPU. Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs") Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315025135.2365846-3-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabledSean Christopherson
Add a helper to take care of reconfiguring an IRTE to deliver IRQs to the host, i.e. not to a vCPU, and use the helper when an IRTE's vCPU affinity is nullified, i.e. when KVM puts an IRTE back into "host" mode. Because posted MSIs use an ephemeral IRTE, using modify_irte() puts the IRTE into full remapped mode, i.e. unintentionally disables posted MSIs on the IRQ. Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs") Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250315025135.2365846-2-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20iommu: apple-dart: fix potential null pointer derefQasim Ijaz
If kzalloc() fails, accessing cfg->supports_bypass causes a null pointer dereference. Fix by checking for NULL immediately after allocation and returning -ENOMEM. Fixes: 3bc0102835f6 ("iommu: apple-dart: Allow mismatched bypass support") Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Link: https://lore.kernel.org/r/20250314230102.11008-1-qasdev00@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20iommu/rockchip: Retire global dma_dev workaroundRobin Murphy
The global dma_dev trick was mostly because the old domain_alloc op provided no context, so no way to know which IOMMU was to own the pagetable, or if a suitable one even existed at all. In the new multi-instance world with domain_alloc_paging this is no longer a concern - now we know that the given device must be associated with a valid IOMMU instance which provided the op to call in the first place, and therefore that instance can and should be the pagetable owner. To avoid worrying about the lifetime and stability of the rk_domain->iommus list, and keep the lookups simple and efficient, we'll still stash a dma_dev pointer, but now it's accurately per-domain. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/25dc948a7d35c8142c5719ac22bc523f8524d006.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20iommu/rockchip: Register in a sensible orderRobin Murphy
Currently Rockchip calls iommu_device_register() before it's finished setting up the hardware and driver state, and as such it now gets unhappy in various ways when registration starts working the way it was always intended to, and probing client devices straight away. Reorder the operations to ensure that what we're registering is a prepared and functional IOMMU instance. Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/e69532f00bf49d98322b96788edb7e2e305e4006.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20iommu/rockchip: Allocate per-device data sensiblyRobin Murphy
Now that DT-based probing is finally happening in the right order again, it reveals an issue in Rockchip's of_xlate, which can now be called during registration, but is using the global dma_dev which is only assigned later. However, this makes little sense when we're already looking up the correct IOMMU device, who should logically be the owner of the devm allocation anyway. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Quentin Schulz <quentin.schulz@cherry.de> Tested-by: Dang Huynh <danct12@riseup.net> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/771e91cf16b3048e93f657153b76905665878fa2.1741886382.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20Merge branch 'net-ptp-fix-egregious-supported-flag-checks'Paolo Abeni
Jacob Keller says: ==================== net: ptp: fix egregious supported flag checks In preparation for adding .supported_extts_flags and .supported_perout_flags to the ptp_clock_info structure, fix a couple of places where drivers get existing flag gets grossly incorrect. The igb driver claims 82580 supports strictly validating PTP_RISING_EDGE and PTP_FALLING_EDGE, but doesn't actually check the flags. Fix the driver to require that the request match both edges, as this is implied by the datasheet description. The renesas driver also claims to support strict flag checking, but does not actually check the flags either. I do not have the data sheet for this device, so I do not know what edge it timestamps. For simplicity, just reject all requests with PTP_STRICT_FLAGS. This essentially prevents the PTP_EXTTS_REQUEST2 ioctl from working. Updating to correctly validate the flags will require someone who has the hardware to confirm the behavior. The lan743x driver supports (and strictly validates) that the request is either PTP_RISING_EDGE or PTP_FALLING_EDGE but not both. However, it does not check the flags are one of the known valid flags. Thus, requests for PTP_EXT_OFF (and any future flag) will be accepted and misinterpreted. Add the appropriate check to reject unsupported PTP_EXT_OFF requests and future proof against new flags. The broadcom PHY driver checks that PTP_PEROUT_PHASE is not set. This appears to be an attempt at rejecting unsupported flags. It is not robust against flag additions such as the PTP_PEROUT_ONE_SHOT, or anything added in the future. Fix this by instead checking against the negation of the supported PTP_PEROUT_DUTY_CYCLE instead. The ptp_ocp driver supports PTP_PEROUT_PHASE and PTP_PEROUT_DUTY_CYCLE, but does not check unsupported flags. Add the appropriate check to ensure PTP_PEROUT_ONE_SHOT and any future flags are rejected as unsupported. These are changes compile-tested, but I do not have hardware to validate the behavior. There are a number of other drivers which enable periodic output or external timestamp requests, but which do not check flags at all. We could go through each of these drivers one-by-one and meticulously add a flag check. Instead, these drivers will be covered only by the upcoming .supported_extts_flags and .supported_perout_flags checks in a net-next series. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> ==================== Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-0-ea930ba82459@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20ptp: ocp: reject unsupported periodic output flagsJacob Keller
The ptp_ocp_signal_from_perout() function supports PTP_PEROUT_DUTY_CYCLE and PTP_PEROUT_PHASE. It does not support PTP_PEROUT_ONE_SHOT, but does not reject a request with such an unsupported flag. Add the appropriate check to ensure that unsupported requests are rejected both for PTP_PEROUT_ONE_SHOT as well as any future flags. Fixes: 1aa66a3a135a ("ptp: ocp: Program the signal generators via PTP_CLK_REQ_PEROUT") Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-5-ea930ba82459@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20broadcom: fix supported flag check in periodic output functionJacob Keller
In bcm_ptp_perout_locked, the driver rejects requests which have PTP_PEROUT_PHASE set. This appears to be an attempt to reject any unsupported flags. Unfortunately, this only checks one flag, but does not protect against PTP_PEROUT_ONE_SHOT, or any future flags which may be added. Fix the check to ensure that no flag other than the supported PTP_PEROUT_DUTY_CYCLE is set. Fixes: 7bfe91efd525 ("net: phy: Add support for 1PPS out and external timestamps") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-4-ea930ba82459@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: lan743x: reject unsupported external timestamp requestsJacob Keller
The lan743x_ptp_io_event_cap_en() function checks that the given request sets only one of PTP_RISING_EDGE or PTP_FALLING_EDGE, but not both. However, this driver does not check whether other flags (such as PTP_EXT_OFF) are set, nor whether any future unrecognized flags are set. Fix this by adding the appropriate check to the lan743x_ptp_io_extts() function. Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-3-ea930ba82459@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20renesas: reject PTP_STRICT_FLAGS as unsupportedJacob Keller
The ravb_ptp_extts() function checks the flags coming from the PTP_EXTTS_REQUEST ioctl, to ensure that future flags are not accepted on accident. This was updated to 'honor' the PTP_STRICT_FLAGS in commit 6138e687c7b6 ("ptp: Introduce strict checking of external time stamp options."). However, the driver does not *actually* validate the flags. I originally fixed this driver to reject future flags in commit 592025a03b34 ("renesas: reject unsupported external timestamp flags"). It is still unclear whether this hardware timestamps the rising, falling, or both edges of the input signal. Accepting requests with PTP_STRICT_FLAGS is a bug, as this could lead to users mistakenly assuming a request with PTP_RISING_EDGE actually timestamps the rising edge only. Reject requests with PTP_STRICT_FLAGS (and hence all PTP_EXTTS_REQUEST2 requests) until someone with access to the datasheet or hardware knowledge can confirm the timestamping behavior and update this driver. Fixes: 6138e687c7b6 ("ptp: Introduce strict checking of external time stamp options.") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-2-ea930ba82459@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20igb: reject invalid external timestamp requests for 82580-based HWJacob Keller
The igb_ptp_feature_enable_82580 function correctly checks that unknown flags are not passed to the function. However, it does not actually check PTP_RISING_EDGE or PTP_FALLING_EDGE when configuring the external timestamp function. The data sheet for the 82580 product says: Upon a change in the input level of one of the SDP pins that was configured to detect Time stamp events using the TSSDP register, a time stamp of the system time is captured into one of the two auxiliary time stamp registers (AUXSTMPL/H0 or AUXSTMPL/H1). For example to define timestamping of events in the AUXSTMPL0 and AUXSTMPH0 registers, Software should: 1. Set the TSSDP.AUX0_SDP_SEL field to select the SDP pin that detects the level change and set the TSSDP.AUX0_TS_SDP_EN bit to 1. 2. Set the TSAUXC.EN_TS0 bit to 1 to enable timestamping The same paragraph is in the i350 and i354 data sheets. The wording implies that the time stamps are captured at any level change. There does not appear to be any way to only timestamp one edge of the signal. Reject requests which do not set both PTP_RISING_EDGE and PTP_FALLING_EDGE when operating under PTP_STRICT_FLAGS mode via PTP_EXTTS_REQUEST2. Fixes: 38970eac41db ("igb: support EXTTS on 82580/i354/i350") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-1-ea930ba82459@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge branch 'support-loopback-mode-speed-selection'Paolo Abeni
Gerhard Engleder says: ==================== Support loopback mode speed selection Previously to commit 6ff3cddc365b ("net: phylib: do not disable autoneg for fixed speeds >= 1G") it was possible to select the speed of the loopback mode by configuring a fixed speed before enabling the loopback mode. Now autoneg is always enabled for >= 1G and a fixed speed of >= 1G requires successful autoneg. Thus, the speed of the loopback mode depends on the link partner for >= 1G. There is no technical reason to depend on the link partner for loopback mode. With this behavior the loopback mode is less useful for testing. Allow PHYs to support optional speed selection for the loopback mode. This support is implemented for the generic loopback support and for PHY drivers, which obviously support speed selection for loopback mode. Additionally, loopback support according to the data sheet is added to the KSZ9031 PHY. Extend phy_loopback() to signal link up and down if speed changes, because a new link speed requires link up signalling. Use this loopback speed selection in the tsnep driver to select the loopback mode speed depending the previously active speed. User space tests with 100 Mbps and 1 Gbps loopback are possible again. ==================== Link: https://patch.msgid.link/20250312203010.47429-1-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20tsnep: Select speed for loopbackGerhard Engleder
Use 100 Mbps only if the PHY is configured to this speed. Otherwise use always the maximum speed of 1000 Mbps. Also remove explicit setting of carrier on and link mode after loopback. This is not needed anymore, because phy_loopback() with selected speed signals the link and the speed to the MAC. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250312203010.47429-6-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: phy: marvell: Align set_loopback() implementationGerhard Engleder
Use genphy_loopback() to disable loopback like ksz9031_set_loopback(). This way disable loopback is implemented only once within genphy_loopback() and the set_loopback() implementations look similar. Also fix comment about msleep() in the out-of loopback case which is not executed in the out-of loopback case. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250312203010.47429-5-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: phy: micrel: Add loopback supportGerhard Engleder
The KSZ9031 PHYs requires full duplex for loopback mode. Add PHY specific set_loopback() to ensure this. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250312203010.47429-4-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: phy: Support speed selection for PHY loopbackGerhard Engleder
phy_loopback() leaves it to the PHY driver to select the speed of the loopback mode. Thus, the speed of the loopback mode depends on the PHY driver in use. Add support for speed selection to phy_loopback() to enable loopback with defined speeds. Ensure that link up is signaled if speed changes as speed is not allowed to change during link up. Link down and up is necessary for a new speed. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20250312203010.47429-3-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20net: phy: Allow loopback speed selection for PHY driversGerhard Engleder
PHY drivers support loopback mode, but it is not possible to select the speed of the loopback mode. The speed is chosen by the set_loopback() operation of the PHY driver. Same is valid for genphy_loopback(). There are PHYs that support loopback with different speeds. Extend set_loopback() to make loopback speed selection possible. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250312203010.47429-2-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowedChao Du
The comments for EXT_SVADE are a bit confusing. Clarify it. Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250221025929.31678-1-duchao@eswincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-19xsk: fix an integer overflow in xp_create_and_assign_umem()Gavrilov Ilia
Since the i and pool->chunk_size variables are of type 'u32', their product can wrap around and then be cast to 'u64'. This can lead to two different XDP buffers pointing to the same memory area. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 94033cd8e73b ("xsk: Optimize for aligned case") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Link: https://patch.msgid.link/20250313085007.3116044-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19Merge branch 'kvm-arm64/pmu-fixes' into kvmarm/nextOliver Upton
* kvm-arm64/pmu-fixes: : vPMU fixes for 6.15 courtesy of Akihiko Odaki : : Various fixes to KVM's vPMU implementation, notably ensuring : userspace-directed changes to the PMCs are reflected in the backing perf : events. KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pkvm-6.15' into kvmarm/nextOliver Upton
* kvm-arm64/pkvm-6.15: : pKVM updates for 6.15 : : - SecPageTable stats for stage-2 table pages allocated by the protected : hypervisor (Vincent Donnefort) : : - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba) KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats KVM: arm64: Distinct pKVM teardown memcache for stage-2 KVM: arm64: Add flags to kvm_hyp_memcache Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/writable-midr' into kvmarm/nextOliver Upton
* kvm-arm64/writable-midr: : Writable implementation ID registers, courtesy of Sebastian Ott : : Introduce a new capability that allows userspace to set the : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1, : and AIDR_EL1. Also plug a hole in KVM's trap configuration where : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not : support SME. KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM: arm64: Copy guest CTR_EL0 into hyp VM KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR KVM: arm64: Allow userspace to change the implementation ID registers KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value KVM: arm64: Maintain per-VM copy of implementation ID regs KVM: arm64: Set HCR_EL2.TID1 unconditionally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pmuv3-asahi' into kvmarm/nextOliver Upton
* kvm-arm64/pmuv3-asahi: : Support PMUv3 for KVM guests on Apple silicon : : Take advantage of some IMPLEMENTATION DEFINED traps available on Apple : parts to trap-and-emulate the PMUv3 registers on behalf of a KVM guest. : Constrain the vPMU to a cycle counter and single event counter, as the : Apple PMU has events that cannot be counted on every counter. : : There is a small new interface between the ARM PMU driver and KVM, where : the PMU driver owns the PMUv3 -> hardware event mappings. arm64: Enable IMP DEF PMUv3 traps on Apple M* KVM: arm64: Provide 1 event counter on IMPDEF hardware drivers/perf: apple_m1: Provide helper for mapping PMUv3 events KVM: arm64: Remap PMUv3 events onto hardware KVM: arm64: Advertise PMUv3 if IMPDEF traps are present KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps KVM: arm64: Move PMUVer filtering into KVM code KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock KVM: arm64: Drop kvm_arm_pmu_available static key KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3 KVM: arm64: Always support SW_INCR PMU event KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps drivers/perf: apple_m1: Support host/guest event filtering drivers/perf: apple_m1: Refactor event select/filter configuration Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/nextOliver Upton
* kvm-arm64/pv-cpuid: : Paravirtualized implementation ID, courtesy of Shameer Kolothum : : Big-little has historically been a pain in the ass to virtualize. The : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim : of vCPU scheduling. This can be particularly annoying when the guest : needs to know the underlying implementation to mitigate errata. : : "Hyperscalers" face a similar scheduling problem, where VMs may freely : migrate between hosts in a pool of heterogenous hardware. And yes, our : server-class friends are equally riddled with errata too. : : In absence of an architected solution to this wart on the ecosystem, : introduce support for paravirtualizing the implementation exposed : to a VM, allowing the VMM to describe the pool of implementations that a : VM may be exposed to due to scheduling/migration. : : Userspace is expected to intercept and handle these hypercalls using the : SMCCC filter UAPI, should it choose to do so. smccc: kvm_guest: Fix kernel builds for 32 bit arm KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2 smccc/kvm_guest: Enable errata based on implementation CPUs arm64: Make  _midr_in_range_list() an exported function KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2 KVM: arm64: Specify hypercall ABI for retrieving target implementations arm64: Modify _midr_range() functions to read MIDR/REVIDR internally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/nv-idregs' into kvmarm/nextOliver Upton
* kvm-arm64/nv-idregs: : Changes to exposure of NV features, courtesy of Marc Zyngier : : Apply NV-specific feature restrictions at reset rather than at the point : of KVM_RUN. This makes the true feature set visible to userspace, a : necessary step towards save/restore support or NV VMs. : : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV, : such that the VHE-ness of the VM can be applied to the feature set. KVM: arm64: selftests: Test that TGRAN*_2 fields are writable KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2 KVM: arm64: Advertise FEAT_ECV when possible KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable KVM: arm64: Allow userspace to limit NV support to nVHE KVM: arm64: Move NV-specific capping to idreg sanitisation KVM: arm64: Enforce NV limits on a per-idregs basis KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available KVM: arm64: Consolidate idreg callbacks KVM: arm64: Advertise NV2 in the boot messages KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0 KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace arm64: cpufeature: Handle NV_frac as a synonym of NV2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/nv-vgic' into kvmarm/nextOliver Upton
* kvm-arm64/nv-vgic: : NV VGICv3 support, courtesy of Marc Zyngier : : Support for emulating the GIC hypervisor controls and managing shadow : VGICv3 state for the L1 hypervisor. As part of it, bring in support for : taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt. KVM: arm64: nv: Fail KVM init if asking for NV without GICv3 KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2 KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting KVM: arm64: nv: Add Maintenance Interrupt emulation KVM: arm64: nv: Handle L2->L1 transition on interrupt injection KVM: arm64: nv: Nested GICv3 emulation KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg KVM: arm64: nv: Load timer before the GIC arm64: sysreg: Add layout for ICH_MISR_EL2 arm64: sysreg: Add layout for ICH_VTR_EL2 arm64: sysreg: Add layout for ICH_HCR_EL2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/misc' into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous fixes/cleanups for KVM/arm64 : : - Avoid GICv4 vLPI configuration when confronted with user error : : - Only attempt vLPI configuration when the target routing is an MSI : : - Document ordering requirements to avoid aforementioned user error KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: Document ordering requirements for irqbypass KVM: arm64: vgic-v4: Fall back to software irqbypass if LPI not found KVM: arm64: vgic-v4: Only WARN for HW IRQ mismatch when unmapping vLPI KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19sched/debug: Remove CONFIG_SCHED_DEBUGIngo Molnar
For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled in all the major Linux distributions: /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y The reason is that while originally CONFIG_SCHED_DEBUG started out as a debugging feature, over the years (decades ...) it has grown various bits of statistics, instrumentation and control knobs that are useful for sysadmin and general software development purposes as well. But within the kernel we still pretend that there's a choice, and sometimes code that is seemingly 'debug only' creates overhead that should be optimized in reality. So make it all official and make CONFIG_SCHED_DEBUG unconditional. Now that all uses of CONFIG_SCHED_DEBUG are removed from the code by previous patches, remove the Kconfig option as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-6-mingo@kernel.org
2025-03-19sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config filesIngo Molnar
We leave most of the defconfigs alone (there's over 70 of them), but let's remove CONFIG_SCHED_DEBUG from the scheduler self-test Kconfig files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/Z9szt3MpQmQ56TRd@gmail.com
2025-03-19sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from ↵Ingo Molnar
documentation Since it's enabled unconditionally now, remove all references to it. (Left out languages I cannot read.) Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-5-mingo@kernel.org
2025-03-19sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditionalIngo Molnar
All the big Linux distros enable CONFIG_SCHED_DEBUG, because the various features it provides help not just with kernel development, but with system administration and user-space software development as well. Reflect this reality and enable this functionality unconditionally. Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-4-mingo@kernel.org
2025-03-19sched/debug: Make 'const_debug' tunables unconditional __read_mostlyIngo Molnar
With CONFIG_SCHED_DEBUG becoming unconditional, remove the extra 'const_debug' indirection towards __read_mostly. Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-3-mingo@kernel.org
2025-03-19sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()Ingo Molnar
The scheduler has this special SCHED_WARN() facility that depends on CONFIG_SCHED_DEBUG. Since CONFIG_SCHED_DEBUG is getting removed, convert SCHED_WARN() to WARN_ON_ONCE(). Note that the warning output isn't 100% equivalent: #define SCHED_WARN_ON(x) WARN_ONCE(x, #x) Because SCHED_WARN_ON() would output the 'x' condition as well, while WARN_ONCE() will only show a backtrace. Hopefully these are rare enough to not really matter. If it does, we should probably introduce a new WARN_ON() variant that outputs the condition in stringified form, or improve WARN_ON() itself. Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-2-mingo@kernel.org
2025-03-19x86/mm: Only do broadcast flush from reclaim if pages were unmappedRik van Riel
Track whether pages were unmapped from any MM (even ones with a currently empty mm_cpumask) by the reclaim code, to figure out whether or not broadcast TLB flush should be done when reclaim finishes. The reason any MM must be tracked, and not only ones contributing to the tlbbatch cpumask, is that broadcast ASIDs are expected to be kept up to date even on CPUs where the MM is not currently active. This change allows reclaim to avoid doing TLB flushes when only clean page cache pages and/or slab memory were reclaimed, which is fairly common. ( This is a simpler alternative to the code that was in my INVLPGB series before, and it seems to capture most of the benefit due to how common it is to reclaim only page cache. ) Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250319132520.6b10ad90@fangorn
2025-03-19block/blk-iocost: ensure 'ret' is set on errorJens Axboe
In case blkg_conf_open_bdev_frozen() fails, ioc_qos_write() jumps to the error path without assigning a value to 'ret'. Ensure that it inherits the error from the passed back error value. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503200454.QWpwKeJu-lkp@intel.com/ Fixes: 9730763f4756 ("block: correct locking order for protecting blk-wbt parameters") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-19perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM onesSohil Mehta
Introduce a name for an old Pentium 4 model and replace the x86_model checks with VFM ones. This gets rid of one of the last remaining Intel-specific x86_model checks. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250318223828.2945651-3-sohil.mehta@intel.com
2025-03-19perf/x86/intel, x86/cpu: Simplify Intel PMU initializationSohil Mehta
Architectural Perfmon was introduced on the Family 6 "Core" processors starting with Yonah. Processors before Yonah need their own customized PMU initialization. p6_pmu_init() is expected to provide that initialization for early Family 6 processors. But, currently, it could get called for any Family 6 processor if the architectural perfmon feature is disabled on that processor. To simplify, restrict the P6 PMU initialization to early Family 6 processors that do not have architectural perfmon support and truly need the special handling. As a result, the "unsupported" console print becomes practically unreachable because all the released P6 processors are covered by the switch cases. Move the console print to a common location where it can cover all modern processors (including Family >15) that may not have architectural perfmon support enumerated. Also, use this opportunity to get rid of the unnecessary switch cases in P6 initialization. Only the Pentium Pro processor needs a quirk, and the rest of the processors do not need any special handling. The gaps in the case numbers are only due to no processor with those model numbers being released. Use decimal numbers to represent Intel Family numbers. Also, convert one of the last few Intel x86_model comparisons to a VFM-based one. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250318223828.2945651-2-sohil.mehta@intel.com
2025-03-19rseq/selftests: Fix namespace collision with rseq UAPI headerMichael Jeanson
When the rseq UAPI header is included, 'union rseq' clashes with 'struct rseq'. It's not the case in the rseq selftests but it does break the KVM selftests that also include this file. Rename 'union rseq' to 'union rseq_tls' to fix this. Fixes: e6644c967d3c ("rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes") Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20250319202144.1141542-1-mjeanson@efficios.com
2025-03-19MAINTAINERS: Add a secondary maintainer for bluefield_edacDavid Thompson
Add David as a secondary maintainer. [ bp: Rewrite commit message. ] Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250319181630.2673-1-davthompson@nvidia.com
2025-03-19x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512Eric Biggers
Intel made a late change to the AVX10 specification that removes support for a 256-bit maximum vector length and enumeration of the maximum vector length. AVX10 will imply a maximum vector length of 512 bits. I.e. there won't be any such thing as AVX10/256 or AVX10/512; there will just be AVX10, and it will essentially just consolidate AVX512 features. As a result of this new development, my strategy of providing both *_avx10_256 and *_avx10_512 functions didn't turn out to be that useful. The only remaining motivation for the 256-bit AVX512 / AVX10 functions is to avoid downclocking on older Intel CPUs. But I already wrote *_avx2 code too (primarily to support CPUs without AVX512), which performs almost as well as *_avx10_256. So we should just use that. Therefore, remove the *_avx10_256 CRC functions, and rename the *_avx10_512 CRC functions to *_avx512. Make Ice Lake and Tiger Lake use the *_avx2 functions instead of *_avx10_256 which they previously used. Link: https://lore.kernel.org/r/20250319181316.91271-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-19Merge tag 'chinese-doc-6.15-rc1' of ↵Jonathan Corbet
gitolite.kernel.org:pub/scm/linux/kernel/git/alexs/linux into docs-mw Chinese translation docs for 6.15-rc1 This is the Chinese translation subtree for 6.15-rc1. It just includes few changes: - Chinese disclaimer change - add a new translation doc: snp-tdx-threat-model - fix a typo Above patches are tested by 'make htmldocs/pdfdocs' Signed-off-by: Alex Shi <alexs@kernel.org>
2025-03-19wifi: mt76: mt7996: fix locking in mt7996_mac_sta_rc_work()Johannes Berg
The 'continue' statements need to be under spinlock, since the spinlock needs to be held as a loop invariant. Fixes: 0762bdd30279 ("wifi: mt76: mt7996: rework mt7996_mac_sta_rc_work to support MLO") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-19Merge tag 'for-net-2025-03-14' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_event: Fix connection regression between LE and non-LE adapters - Fix error code in chan_alloc_skb_cb() * tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_event: Fix connection regression between LE and non-LE adapters Bluetooth: Fix error code in chan_alloc_skb_cb() ==================== Link: https://patch.msgid.link/20250314163847.110069-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19Merge tag 'mt76-next-2025-03-19' of https://github.com/nbd168/wirelessJohannes Berg
Felix Fietkau says: ==================== mt76 patches for 6.15 - preparation for mt7996 mlo support - fixes ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-19net: macb: Add __nonstring annotations for unterminated stringsKees Cook
When a character array without a terminating NUL character has a static initializer, GCC 15's -Wunterminated-string-initialization will only warn if the array lacks the "nonstring" attribute[1]. Mark the arrays with __nonstring to correctly identify the char array as "not a C string" and thereby eliminate the warning: In file included from ../drivers/net/ethernet/cadence/macb_main.c:42: ../drivers/net/ethernet/cadence/macb.h:1070:35: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization] 1070 | GEM_STAT_TITLE(TX1519CNT, "tx_greater_than_1518_byte_frames"), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/ethernet/cadence/macb.h:1050:24: note: in definition of macro 'GEM_STAT_TITLE_BITS' 1050 | .stat_string = title, \ | ^~~~~ ../drivers/net/ethernet/cadence/macb.h:1070:9: note: in expansion of macro 'GEM_STAT_TITLE' 1070 | GEM_STAT_TITLE(TX1519CNT, "tx_greater_than_1518_byte_frames"), | ^~~~~~~~~~~~~~ ../drivers/net/ethernet/cadence/macb.h:1097:35: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (33 chars into 32 available) [-Wunterminated-string-initialization] 1097 | GEM_STAT_TITLE(RX1519CNT, "rx_greater_than_1518_byte_frames"), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/ethernet/cadence/macb.h:1050:24: note: in definition of macro 'GEM_STAT_TITLE_BITS' 1050 | .stat_string = title, \ | ^~~~~ ../drivers/net/ethernet/cadence/macb.h:1097:9: note: in expansion of macro 'GEM_STAT_TITLE' 1097 | GEM_STAT_TITLE(RX1519CNT, "rx_greater_than_1518_byte_frames"), | ^~~~~~~~~~~~~~ Since these strings are copied with memcpy() they do not need to be NUL terminated, and can use __nonstring: memcpy(p, gem_statistics[i].stat_string, ETH_GSTRING_LEN); Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250312200700.make.521-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>