Age  Commit message  Author
2024-10-31  KVM: arm64: Add helpers to determine if PMC counts at a given EL  Oliver Upton
Checking the exception level filters for a PMC is a minor annoyance to open code. Add helpers to check if an event counts at EL0 and EL1, which will prove useful in a subsequent change. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-15-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
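A minimal sketch of what such helpers might look like, written as a Python model rather than kernel C. It assumes the Non-secure view of the PMEVTYPER<n>_EL0 filter bits (P, U, NSK, NSU at bits 31 to 28) and uses hypothetical function names; it is not the code from the patch.

```python
# Filter bits of PMEVTYPER<n>_EL0, as laid out in the Arm architecture.
EXCLUDE_EL1 = 1 << 31     # P
EXCLUDE_EL0 = 1 << 30     # U
EXCLUDE_NS_EL1 = 1 << 29  # NSK
EXCLUDE_NS_EL0 = 1 << 28  # NSU


def counts_at_el0(evtyper: int) -> bool:
    """Assumed rule: Non-secure EL0 is counted when U and NSU are equal."""
    u = bool(evtyper & EXCLUDE_EL0)
    nsu = bool(evtyper & EXCLUDE_NS_EL0)
    return u == nsu


def counts_at_el1(evtyper: int) -> bool:
    """Assumed rule: Non-secure EL1 is counted when P and NSK are equal."""
    p = bool(evtyper & EXCLUDE_EL1)
    nsk = bool(evtyper & EXCLUDE_NS_EL1)
    return p == nsk
```

Keeping the comparison in one place means callers never have to remember that the Non-secure bits invert the meaning of P and U rather than acting as independent excludes.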
2024-10-31  KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN  Oliver Upton
The value of MDCR_EL2.HPMN controls the number of event counters made visible to EL0 and EL1. This means it is possible for the guest hypervisor to allow direct access to event counters to the L2. Rework KVM's PMU register emulation to take the effects of HPMN into account when handling a trap. For bitmask-style registers, writes only affect accessible registers. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-14-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
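The masking described above can be sketched as a small Python model. The function names and the treatment of the cycle counter (bit 31, assumed always accessible) are illustrative assumptions, not the kernel's implementation.

```python
CYCLE_COUNTER = 1 << 31  # cycle counter bit in bitmask-style PMU registers


def accessible_counter_mask(nr_implemented: int, hpmn: int, at_el2: bool) -> int:
    """Counters visible to the current context (hypothetical helper)."""
    implemented = (1 << nr_implemented) - 1
    if at_el2:
        visible = implemented                      # EL2 sees every counter
    else:
        visible = implemented & ((1 << hpmn) - 1)  # EL0/EL1 see [0, HPMN)
    return visible | CYCLE_COUNTER


def set_bits(reg: int, written: int, mask: int) -> int:
    """Emulate a write to a 'SET'-style register: only accessible bits stick."""
    return reg | (written & mask)


def clear_bits(reg: int, written: int, mask: int) -> int:
    """Emulate a write to a 'CLR'-style register: only accessible bits clear."""
    return reg & ~(written & mask)
```

For example, with six implemented counters and HPMN = 4, a write of 0xff from EL1 to a set-style register only sets bits 0 to 3.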
2024-10-31  KVM: arm64: Rename kvm_pmu_valid_counter_mask()  Oliver Upton
Nested PMU support requires dynamically changing the visible range of PMU counters based on the exception level and value of MDCR_EL2.HPMN. At the same time, the PMU emulation code needs to know the absolute number of implemented counters, regardless of context. Rename the existing helper to make it obvious that it returns the number of implemented counters and not anything else. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-13-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: nv: Advertise support for FEAT_HPMN0  Oliver Upton
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not only that, the emulation is capable of doing FEAT_HPMN0. Advertise support for the feature in the VM's ID registers. It is possible to emulate FEAT_HPMN0 on hardware that doesn't support it since KVM currently traps all PMU registers. Having said that, let's only advertise the feature on supporting hardware in case KVM ever provides 'direct' PMU support to VMs w/o involving host perf. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN  Oliver Upton
MDCR_EL2.HPMN splits the PMU event counters into two ranges: the first range is accessible from all ELs, and the second range is accessible only to EL2/3. Supposing the guest hypervisor allows direct access to the PMU counters from the L2, KVM needs to locally handle those accesses. Add a new complex trap configuration for HPMN that checks if the counter index is accessible to the current context. As written, the architecture suggests HPMN only causes PMEVCNTR<n>_EL0 to trap, though intuition (and the pseudocode) suggest that the trap applies to PMEVTYPER<n>_EL0 as well. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-11-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0  Oliver Upton
TPM and TPMCR trap bits also affect Host EL0. How fun. Mark these two trap bits as such and take advantage of the new infrastructure for dealing w/ EL0 traps. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: nv: Reinject traps that take effect in Host EL0  Oliver Upton
Wire up the other end of traps that affect host EL0 by actually injecting them into the guest hypervisor. Skip over FGT entirely, as a cursory glance suggests no FGT is effective in host EL0. Note that kvm_inject_nested() is already equipped for handling exceptions while the VM is already in a host context. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY  Oliver Upton
BEHAVE_FORWARD_ANY is slightly ambiguous, especially since we're about to cram some more information into the enum. Rephrase it. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps  Oliver Upton
KVM uses a sanity-check to avoid infinite recursion in trap combinations that could potentially depend on itself. Narrow the scope of this sanity check to the exact CGT IDs that correspond w/ trap combos, opening the door to using 'complex' traps as part of a combination. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2  Oliver Upton
Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits according to the feature set exposed to the VM. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1  Oliver Upton
Align the field definitions w/ DDI0601 2024-09 and opportunistically declare MTPMU as a signed field. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64: sysreg: Migrate MDCR_EL2 definition to table  Oliver Upton
Migrate MDCR_EL2 over to the sysreg table and align definitions with DDI0601 2024-09. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64: sysreg: Describe ID_AA64DFR2_EL1 fields  Oliver Upton
Describe the new ID register in line with DDI0601 2024-09. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: Initialize trap register values in hyp in pKVM  Fuad Tabba
Handle the initialization of trap registers at the hypervisor in pKVM, even for non-protected guests. The host is not trusted with the values of the trap registers, regardless of the VM type. Therefore, when switching between the host and the guests, only flush the HCR_EL2 TWI and TWE bits. The host is allowed to configure these for opportunistic scheduling, as neither affects the protection of VMs or the hypervisor. Reported-by: Will Deacon <will@kernel.org> Fixes: 814ad8f96e92 ("KVM: arm64: Drop trapping of PAuth instructions/keys") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
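As a rough illustration of "only flush the HCR_EL2 TWI and TWE bits", the Python model below merges a host-provided value into the hypervisor's own value. The bit positions (TWI = bit 13, TWE = bit 14) follow the architecture; the function name is hypothetical and the real code operates on vCPU state rather than plain integers.

```python
HCR_TWI = 1 << 13  # trap WFI
HCR_TWE = 1 << 14  # trap WFE
HOST_CONTROLLED = HCR_TWI | HCR_TWE


def flush_hcr(hyp_hcr: int, host_hcr: int) -> int:
    """Take only TWI/TWE from the untrusted host value; keep the rest."""
    return (hyp_hcr & ~HOST_CONTROLLED) | (host_hcr & HOST_CONTROLLED)
```

Any other bit the host sets, for instance one that would relax a trap, is simply dropped by the mask.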
2024-10-31  KVM: arm64: Initialize the hypervisor's VM state at EL2  Fuad Tabba
Do not trust the state of the VM as provided by the host when initializing the hypervisor's view of the VM state. Initialize it instead at EL2 to a known good and safe state, as pKVM already does with hypervisor VCPU states. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-4-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use  Fuad Tabba
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by hypervisor code in protected mode. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()  Fuad Tabba
Move pkvm_vcpu_init_traps() to the initialization of the hypervisor's vcpu state in init_pkvm_hyp_vcpu(), and remove the associated hypercall. In protected mode, traps need to be initialized whenever a VCPU is initialized anyway, and not only for protected VMs. This also saves an unnecessary hypercall. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-2-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  Merge tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound  Linus Torvalds
Pull sound fixes from Takashi Iwai:
"Here we see slightly more commits than wished, but basically all are small and mostly trivial fixes. The only core change is the workaround for __counted_by() usage in ASoC DAPM code, while the rest are device-specific fixes for Intel Baytrail devices, Cirrus and wcd937x codecs, and HD-audio / USB-audio devices"

* tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1
  ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3
  ALSA: usb-audio: Add quirks for Dell WD19 dock
  ASoC: codecs: wcd937x: relax the AUX PDM watchdog
  ASoC: codecs: wcd937x: add missing LO Switch control
  ASoC: dt-bindings: rockchip,rk3308-codec: add port property
  ALSA: hda/realtek: Add subwoofer quirk for Infinix ZERO BOOK 13
  ASoC: dapm: fix bounds checker error in dapm_widget_list_create
  ASoC: Intel: sst: Fix used of uninitialized ctx to log an error
  ASoC: cs42l51: Fix some error handling paths in cs42l51_probe()
  ASoC: Intel: sst: Support LPE0F28 ACPI HID
  ALSA: hda/realtek: Limit internal Mic boost on Dell platform
  ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet
  ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec
  ASoC: codecs: rt5640: Always disable IRQs from rt5640_cancel_work()
2024-10-31  gpiolib: fix debugfs dangling chip separator  Johan Hovold
Add the missing newline after entries for recently removed gpio chips so that the chip sections are separated by a newline as intended. Fixes: e348544f7994 ("gpio: protect the list of GPIO devices with SRCU") Cc: stable@vger.kernel.org # 6.9 Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-3-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-10-31  gpiolib: fix debugfs newline separators  Johan Hovold
The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable@vger.kernel.org # 3.7 Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
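The behaviour described above can be illustrated with a toy Python model. It assumes the bug amounts to the "suppress the separator" state being reset at the start of every read session; the function and its parameters are hypothetical and do not mirror the gpiolib code.

```python
def render(chips, session_sizes, fixed):
    """Emit chip sections over several read sessions.

    session_sizes lists how many chips each session outputs. In the
    buggy model the separator state is reset per session; in the fixed
    model it is only clear at the very beginning of the file.
    """
    out = ""
    need_newline = False
    pos = 0
    for size in session_sizes:
        if not fixed:
            need_newline = False  # bug: every session looks like the file start
        for chip in chips[pos:pos + size]:
            if need_newline:
                out += "\n"       # blank line between chip sections
            out += chip + ":\n"
            need_newline = True
        pos += size
    return out
```

With three chips read in two sessions of sizes 2 and 1, the buggy model loses the separator in front of the third chip while the fixed model keeps both separators.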
2024-10-31  KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored  James Morse
The ID_AA64PFR0.MPAM bit was previously accidentally exposed to guests, and is ignored by KVM. KVM will always present the guest with 0 here, and trap the MPAM system registers to inject an undef. But, this value is still needed to prevent migration when the value is incompatible with the target hardware. Add a kvm unit test to try and write multiple values to ID_AA64PFR0.MPAM. Only the hardware value previously exposed should be ignored, all other values should be rejected. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-8-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: Disable MPAM visibility by default and ignore VMM writes  James Morse
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. A previous patch supplied the missing trap handling. Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to be migratable, but there is little point enabling the MPAM CPU interface on new VMs until there is something a guest can do with it. Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and on hardware that supports MPAM, politely ignore the VMM's attempts to set this bit. Guests exposed to this bug have the sanitised value of the MPAM field, so only the correct value needs to be ignored. This means the field can continue to be used to block migration to incompatible hardware (between MPAM=1 and MPAM=5), and the VMM can't rely on the field being ignored. Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
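The "ignore only the hardware value, reject everything else" rule might be modelled as follows. This is a simplified Python sketch: the function name is hypothetical, only the MPAM field (bits 43:40 of ID_AA64PFR0_EL1) is validated, and the ValueError stands in for whatever error the real interface returns.

```python
MPAM_SHIFT = 40
MPAM_MASK = 0xF << MPAM_SHIFT


def set_id_aa64pfr0(user_val: int, hw_val: int) -> int:
    """Return the value stored for the guest, or raise ValueError."""
    # A VMM restoring the previously exposed (hardware) MPAM value is
    # tolerated: the field is silently cleared.
    if (user_val & MPAM_MASK) == (hw_val & MPAM_MASK):
        user_val &= ~MPAM_MASK
    # The guest view has MPAM == 0, so anything left over is rejected.
    if user_val & MPAM_MASK:
        raise ValueError("MPAM value incompatible with this host")
    return user_val
```

Because only the matching value is cleared, a VM saved on MPAM=1 hardware still fails to restore on MPAM=5 hardware, which is the migration-blocking behaviour the message describes.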
2024-10-31  KVM: arm64: Add a macro for creating filtered sys_reg_descs entries  James Morse
The sys_reg_descs array holds function pointers and reset value for managing the user-space and guest view of system registers. These are mostly created by a set of macros as only some combinations of behaviour are needed. If a register needs special treatment, its sys_reg_descs entry is open-coded. This is true of some id registers where the value provided by user-space is validated by some helpers. Before adding another one of these, add a helper that covers the existing special cases. 'ID_FILTERED' expects helpers to set the user-space value, and retrieve the modified reset value. Like ID_WRITABLE() this uses id_visibility(), which should have no functional change for the registers converted to use ID_FILTERED(). read_sanitised_id_aa64dfr0_el1() and read_sanitised_id_aa64pfr0_el1() have been refactored to be called from kvm_read_sanitised_id_reg(), to try to be consistent with ID_WRITABLE(). Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-6-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  KVM: arm64: Fix missing traps of guest accesses to the MPAM registers  James Morse
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling.

If you are unlucky, this results in an MPAM aware guest being delivered an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
| test_has_mpam+0x18/0x30
| update_cpu_capabilities+0x7c/0x11c
| setup_cpu_features+0x14/0xd8
| smp_cpus_done+0x24/0xb8
| smp_init+0x7c/0x8c
| kernel_init_freeable+0xf8/0x280
| kernel_init+0x24/0x1e0
| ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c00 54000080 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest accessible registers by injecting an UNDEF. This stops KVM from spamming the host log, but doesn't yet hide the feature from the id registers.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of these if either is supported. If neither is supported, the guest can discover that the CPU has MPAM support, and how many PARTID etc the host has ... but it can't influence anything, so it's harmless.

Fixes: 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20200925160102.118858-1-james.morse@arm.com/
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-5-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64: cpufeature: discover CPU support for MPAM  James Morse
ARMv8.4 adds support for 'Memory Partitioning And Monitoring' (MPAM) which describes an interface to cache and bandwidth controls wherever they appear in the system. Add support to detect MPAM. Like SVE, MPAM has an extra id register that describes some more properties, including the virtualisation support, which is optional. Detect this separately so we can detect mismatched/insane systems, but still use MPAM on the host even if the virtualisation support is missing. MPAM needs enabling at the highest implemented exception level, otherwise the register accesses trap. The 'enabled' flag is accessible to lower exception levels, but it's in a register that traps when MPAM isn't enabled. The cpufeature 'matches' hook is extended to test this on one of the CPUs, so that firmware can emulate MPAM as disabled if it is reserved for use by secure world. Secondary CPUs that appear late could trip cpufeature's 'lower safe' behaviour after the MPAM properties have been advertised to user-space. Add a verify call to ensure late secondaries match the existing CPUs. (If you have a boot failure that bisects here, it's likely your CPUs advertise MPAM in the id registers, but firmware failed to either enable MPAM, or emulate the trap as if it were disabled.) Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-4-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64: head.S: Initialise MPAM EL2 registers and disable traps  James Morse
Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps. This register resets to an unknown value; setting it to the default partitions/pmg before we enable the MMU is the best thing to do. Kexec/kdump will depend on this if the previous kernel left the CPU configured with a restrictive configuration. If Linux is booted at the highest implemented exception level, el2_setup will clear the enable bit, disabling MPAM. This code can't be enabled until a subsequent patch adds the Kconfig and cpufeature boilerplate. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-3-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries  James Morse
Move the existing MPAM system register defines from sysreg.h to tools/sysreg and add the remaining system registers. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-2-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31  arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate  David Woodhouse
The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function which is analogous to ACPI S4 state. This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and handle that state appropriately on subsequent launches. Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff") the EFI shutdown method is deliberately preferred over PSCI or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which *only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as a last resort via the legacy pm_power_off function pointer. The hibernation code already exports a system_entering_hibernation() function which is used by the higher-priority handler to check for hibernation. That existing function just returns the value of a static boolean variable from hibernate.c, which was previously only set in the hibernation_platform_enter() code path. Set the same flag in the simpler code path around the call to kernel_power_off() too. An alternative way to hook SYSTEM_OFF2 into the hibernation code would be to register a platform_hibernation_ops structure with an ->enter() method which makes the new SYSTEM_OFF2 call. But that would have the unwanted side-effect of making hibernation take a completely different code path in hibernation_platform_enter(), invoking a lot of special dpm callbacks. Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to indicate whether the power off is for hibernation. But this version works and is relatively simple. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241019172459.2241939-7-dwmw2@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
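The handler ordering described above can be sketched as a priority chain. This Python model is only an illustration: the handler names and return strings are hypothetical, and a real power-off call would not return at all.

```python
def hibernate_handler(entering_hibernation):
    """Highest priority: only acts when the system is hibernating."""
    return "PSCI SYSTEM_OFF2" if entering_hibernation else None


def efi_handler(entering_hibernation):
    """Preferred general-purpose method (assumed always available here)."""
    return "EFI poweroff"


def legacy_psci_handler(entering_hibernation):
    """Last resort, reached only if nothing above handled the request."""
    return "PSCI SYSTEM_OFF"


def power_off(entering_hibernation, handlers=None):
    """Walk handlers in priority order; the first one that acts wins."""
    if handlers is None:
        handlers = [hibernate_handler, efi_handler, legacy_psci_handler]
    for handler in handlers:
        action = handler(entering_hibernation)
        if action is not None:
            return action
    return None
```

A handler that declines by returning None leaves the normal power-off path untouched, which is why hooking in at this level avoids the separate hibernation_platform_enter() code path mentioned in the message.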
2024-10-31  btrfs: fix defrag not merging contiguous extents due to merged extent maps  Filipe Manana
When running defrag (manual defrag) against a file that has extents that are contiguous and we already have the respective extent maps loaded and merged, we end up not defragging the range covered by those contiguous extents. This happens when we have an extent map that was the result of merging multiple extent maps for contiguous extents and the length of the merged extent map is greater than or equal to the defrag threshold length.

The script below reproduces this scenario:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdi
   MNT=/mnt/sdi

   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   # Create a 256K file with 4 extents of 64K each.
   xfs_io -f -c "falloc 0 64K" \
             -c "pwrite 0 64K" \
             -c "falloc 64K 64K" \
             -c "pwrite 64K 64K" \
             -c "falloc 128K 64K" \
             -c "pwrite 128K 64K" \
             -c "falloc 192K 64K" \
             -c "pwrite 192K 64K" \
             $MNT/foo

   umount $MNT
   echo -n "Initial number of file extent items: "
   btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l

   mount $DEV $MNT
   # Read the whole file in order to load and merge extent maps.
   cat $MNT/foo > /dev/null

   btrfs filesystem defragment -t 128K $MNT/foo
   umount $MNT
   echo -n "Number of file extent items after defrag with 128K threshold: "
   btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l

   mount $DEV $MNT
   # Read the whole file in order to load and merge extent maps.
   cat $MNT/foo > /dev/null

   btrfs filesystem defragment -t 256K $MNT/foo
   umount $MNT
   echo -n "Number of file extent items after defrag with 256K threshold: "
   btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l

Running it:

   $ ./test.sh
   Initial number of file extent items: 4
   Number of file extent items after defrag with 128K threshold: 4
   Number of file extent items after defrag with 256K threshold: 4

The 4 extents don't get merged because we have an extent map with a size of 256K that is the result of merging the individual extent maps for each of the four 64K extents and at defrag_lookup_extent() we have a value of zero for the generation threshold ('newer_than' argument) since this is a manual defrag. As a consequence we don't call defrag_get_extent() to get an extent map representing a single file extent item in the inode's subvolume tree, so we end up using the merged extent map at defrag_collect_targets() and decide not to defrag.

Fix this by updating defrag_lookup_extent() to always discard extent maps that were merged and call defrag_get_extent() regardless of the minimum generation threshold ('newer_than' argument).

A test case for fstests will be sent along soon.

CC: stable@vger.kernel.org # 6.1+
Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
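The lookup decision at the heart of this fix can be modelled in a few lines of Python. The names below are hypothetical stand-ins for defrag_lookup_extent() and the threshold check, and the "old" condition is simplified to what the message states: a merged map was only discarded when 'newer_than' was non-zero.

```python
from dataclasses import dataclass

K = 1024


@dataclass
class ExtentMap:
    start: int
    length: int
    merged: bool = False


def lookup_extent(cached, file_items, offset, newer_than, fixed):
    """Return the extent map that defrag will base its decision on."""
    em = cached
    # Old behaviour: keep a merged map during manual defrag (newer_than == 0).
    # Fixed behaviour: always discard merged maps.
    if em is not None and em.merged and (fixed or newer_than != 0):
        em = None
    if em is None:
        # Stand-in for reading the single file extent item from the tree.
        em = next(i for i in file_items
                  if i.start <= offset < i.start + i.length)
    return em


def is_defrag_target(em, threshold):
    """Extents at or above the threshold are considered large enough."""
    return em.length < threshold
```

With four 64K items and a cached 256K merged map, the old lookup returns the 256K map, so a 128K threshold finds nothing to do; the fixed lookup returns a 64K map and the range becomes a target.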
2024-10-31  btrfs: fix extent map merging not happening for adjacent extents  Filipe Manana
If we have 3 or more adjacent extents in a file, that is, consecutive file extent items pointing to adjacent extents, within a contiguous file range and compatible flags, we end up not merging all the extents into a single extent map.

For example:

   $ mkfs.btrfs -f /dev/sdc
   $ mount /dev/sdc /mnt/sdc

   $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \
                  -c "pwrite -b 64K 64K 64K" \
                  -c "pwrite -b 64K 128K 64K" \
                  -c "pwrite -b 64K 192K 64K" \
                  /mnt/sdc/foo

After all the ordered extents complete we unpin the extent maps and try to merge them, but instead of getting a single extent map we get two because:

1) When the first ordered extent completes (file range [0, 64K)) we unpin its extent map and attempt to merge it with the extent map for the range [64K, 128K), but we can't because that extent map is still pinned;

2) When the second ordered extent completes (file range [64K, 128K)), we unpin its extent map and merge it with the previous extent map, for file range [0, 64K), but we can't merge with the next extent map, for the file range [128K, 192K), because this one is still pinned.

   The merged extent map for the file range [0, 128K) gets the flag EXTENT_MAP_MERGED set;

3) When the third ordered extent completes (file range [128K, 192K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [0, 128K), but we can't because that extent map has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false due to different flags) while the extent map for the range [128K, 192K) doesn't have that flag set.

   We also can't merge it with the next extent map, for file range [192K, 256K), because that one is still pinned.

   At this moment we have 3 extent maps:

   One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
   One for file range [128K, 192K).
   One for file range [192K, 256K) which is still pinned;

4) When the fourth and final extent completes (file range [192K, 256K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [128K, 192K), which succeeds since none of these extent maps have the EXTENT_MAP_MERGED flag set.

   So we end up with 2 extent maps:

   One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
   One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set.

   Since after merging extent maps we don't attempt to merge again, that is, merge the resulting extent map with the one that is now preceding it (and the one following it), we end up with those two extent maps, when we could have had a single extent map to represent the whole file.

Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag. While this doesn't present any functional issue, it prevents the merging of extent maps, which would save memory, and it can also make defrag fail to merge extents (that will be addressed in the next patch).

Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
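The four steps above can be replayed with a small Python simulation. It is a toy model under stated assumptions: extent maps are reduced to a range plus 'pinned' and 'merged' flags, and the helper names are hypothetical rather than the btrfs functions themselves.

```python
from dataclasses import dataclass

K = 1024


@dataclass
class ExtentMap:
    start: int
    end: int
    pinned: bool = True
    merged: bool = False


def mergeable(prev, nxt, ignore_merged_flag):
    """Adjacent, unpinned maps merge if their flags are compatible."""
    if prev.pinned or nxt.pinned or prev.end != nxt.start:
        return False
    return ignore_merged_flag or prev.merged == nxt.merged


def unpin_and_merge(maps, index, ignore_merged_flag):
    """Unpin maps[index], then try the previous and the next map once each."""
    em = maps[index]
    em.pinned = False
    if index > 0 and mergeable(maps[index - 1], em, ignore_merged_flag):
        em.start = maps[index - 1].start
        em.merged = True
        del maps[index - 1]
        index -= 1
    if index + 1 < len(maps) and mergeable(em, maps[index + 1], ignore_merged_flag):
        em.end = maps[index + 1].end
        em.merged = True
        del maps[index + 1]


def simulate(ignore_merged_flag):
    """Complete four 64K ordered extents in file order; return the final maps."""
    maps = [ExtentMap(i * 64 * K, (i + 1) * 64 * K) for i in range(4)]
    for offset in (0, 64 * K, 128 * K, 192 * K):
        index = next(i for i, m in enumerate(maps) if m.start <= offset < m.end)
        unpin_and_merge(maps, index, ignore_merged_flag)
    return [(m.start // K, m.end // K, m.merged) for m in maps]
```

Comparing the merged flag reproduces the two-map outcome from step 4, while ignoring it lets the third and fourth completions extend the already merged map into a single one covering the whole file.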
2024-10-31  bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled  Toke Høiland-Jørgensen
The test_run code detects whether a page has been modified and re-initialises the xdp_frame structure if it has, using xdp_update_frame_from_buff(). However, xdp_update_frame_from_buff() doesn't touch frame->mem, so that wasn't correctly re-initialised, which led to the pages from page_pool not being returned correctly. Syzbot noticed this as a memory leak. Fix this by also copying the frame->mem structure when re-initialising the frame, like we do on initialisation of a new page from page_pool. Fixes: e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption") Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/bpf/20241030-test-run-mem-fix-v1-1-41e88e8cae43@redhat.com
2024-10-31  Merge tag 'nvme-6.12-2024-10-31' of git://git.infradead.org/nvme into block-6.12  Jens Axboe
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.12

 - Spec compliant identification fix (Keith)
 - Module parameter to enable backward compatibility on unusual namespace formats (Keith)
 - Target double free fix when using keys (Vitaliy)
 - Passthrough command error handling fix (Keith)"

* tag 'nvme-6.12-2024-10-31' of git://git.infradead.org/nvme:
  nvme: re-fix error-handling for io_uring nvme-passthrough
  nvmet-auth: assign dh_key to NULL after kfree_sensitive
  nvme: module parameter to disable pi with offsets
  nvme: enhance cns version checking
2024-10-31  Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown"  Jarosław Janik
Commit 4f61c8fe3520 ("ALSA: hda/conexant: Mute speakers at suspend / shutdown") mutes speakers on system shutdown or whenever HDA controller is suspended by PM; this however interacts badly with Thinkpad's ACPI firmware behavior which uses beeps to signal various events (enter/leave suspend or hibernation, AC power connect/disconnect, low battery, etc.); now those beeps are either muted altogether (for suspend/hibernate/shutdown related events) or work more or less randomly (e.g. AC plug/unplug is only audible when you are playing music at the moment, because HDA device is likely in suspend mode otherwise). Since the original bug report mentioned in 4f61c8fe3520 complained about Lenovo's Thinkpad laptop - revert this commit altogether. Fixes: 4f61c8fe3520 ("ALSA: hda/conexant: Mute speakers at suspend / shutdown") Signed-off-by: Jarosław Janik <jaroslaw.janik@gmail.com> Link: https://patch.msgid.link/20241030171813.18941-2-jaroslaw.janik@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-31io_uring/rw: fix missing NOWAIT check for O_DIRECT start writeJens Axboe
When io_uring starts a write, it'll call kiocb_start_write() to bump the super block rwsem, preventing any freezes from happening while that write is in-flight. The freeze side will grab that rwsem for writing, excluding any new writers from happening and waiting for existing writes to finish. But io_uring unconditionally uses kiocb_start_write(), which will block if someone is currently attempting to freeze the mount point. This causes a deadlock where freeze is waiting for previous writes to complete, but the previous writes cannot complete, as the task that is supposed to complete them is blocked waiting on starting a new write. This results in the following stuck trace showing that dependency with the write blocked starting a new write: task:fio state:D stack:0 pid:886 tgid:886 ppid:876 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_rwsem_wait+0x1e8/0x3f8 __percpu_down_read+0xe8/0x500 io_write+0xbb8/0xff8 io_issue_sqe+0x10c/0x1020 io_submit_sqes+0x614/0x2110 __arm64_sys_io_uring_enter+0x524/0x1038 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 INFO: task fsfreeze:7364 blocked for more than 15 seconds. Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963 with the attempting freezer stuck trying to grab the rwsem: task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_down_write+0x2b0/0x680 freeze_super+0x248/0x8a8 do_vfs_ioctl+0x149c/0x1b18 __arm64_sys_ioctl+0xd0/0x1a0 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a blocking grab of the super block rwsem if it isn't set. 
For normal issue where IOCB_NOWAIT would always be set, this returns -EAGAIN which will have io_uring core issue a blocking attempt of the write. That will in turn also get completions run, ensuring forward progress. Since freezing requires CAP_SYS_ADMIN in the first place, this isn't something that can be triggered by a regular user. Cc: stable@vger.kernel.org # 5.10+ Reported-by: Peter Mann <peter.mann@sh.cz> Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
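The NOWAIT handling described above can be illustrated with a small userspace model. Everything here is an assumption made for illustration: struct sb_model, sb_model_trylock() and start_write() are hypothetical names, and the blocking case is only reported as a return code because a model cannot sleep on a real rwsem.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a superblock's freeze protection. */
struct sb_model {
	bool freeze_pending;	/* a freezer holds or is waiting for the lock */
	int active_writers;
};

enum start_result {
	START_OK = 0,		/* write protection taken */
	START_WOULD_BLOCK = 1,	/* a real implementation would sleep here */
};

/* Take write protection only if that is possible without sleeping. */
static bool sb_model_trylock(struct sb_model *sb)
{
	if (sb->freeze_pending)
		return false;
	sb->active_writers++;
	return true;
}

/*
 * Returns START_OK, START_WOULD_BLOCK, or -EAGAIN when nowait is set and
 * the protection cannot be taken without sleeping.
 */
static int start_write(struct sb_model *sb, bool nowait)
{
	if (sb_model_trylock(sb))
		return START_OK;
	if (nowait)
		return -EAGAIN;	/* caller retries from a context that may block */
	return START_WOULD_BLOCK;
}
```

The point of the sketch is the ordering: the non-blocking attempt comes first, and the nowait flag decides whether a failed attempt is reported back or turned into a blocking wait.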
2024-10-31drm/xe: Don't short circuit TDR on jobs not startedMatthew Brost
Short circuiting TDR on jobs not started is an optimization which is not required. On LNL we are facing an issue where jobs do not get scheduled by the GuC if it misses a GGTT page update. When this occurs, let the TDR fire, toggle the scheduling, which may get the job unstuck, and print a warning message. If the TDR fires twice on a job that hasn't started, time out the job. v2: - Add warning message (Paulo) - Add fixes tag (Paulo) - Timeout job which hasn't started after TDR firing twice v3: - Include local change v4: - Short circuit check_timeout on job not started - use warn level rather than notice (Paulo) Fixes: 7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out") Cc: stable@vger.kernel.org Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 35d25a4a0012e690ef0cc4c5440231176db595cc) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
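The "fire twice, then time out" rule above can be modelled as a tiny state machine. This is only a sketch: struct job_model, enum tdr_action and tdr_fire() are hypothetical names, and the handling of jobs that did start is reduced to a single return code.

```c
#include <stdbool.h>

enum tdr_action {
	TDR_CHECK_ELAPSED,	/* job started: decide from its run time */
	TDR_KICK_SCHEDULING,	/* first fire on an unstarted job: toggle scheduling, warn */
	TDR_TIMEOUT_JOB,	/* second fire on an unstarted job: give up */
};

/* Hypothetical per-job bookkeeping. */
struct job_model {
	bool started;
	int fires_while_unstarted;
};

/* Called each time the timeout handler fires for this job. */
static enum tdr_action tdr_fire(struct job_model *job)
{
	if (job->started)
		return TDR_CHECK_ELAPSED;

	if (++job->fires_while_unstarted >= 2)
		return TDR_TIMEOUT_JOB;

	return TDR_KICK_SCHEDULING;
}
```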
2024-10-31drm/xe: Add mmio read before GGTT invalidateMatthew Brost
On LNL without a mmio read before a GGTT invalidate the GuC can incorrectly read the GGTT scratch page upon next access leading to jobs not getting scheduled. A mmio read before a GGTT invalidate seems to fix this. Since a GGTT invalidate is not a hot code path, blindly do a mmio read before each GGTT invalidate. Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: stable@vger.kernel.org Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3164 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023221200.1797832-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 5a710196883e0ac019ac6df2a6d79c16ad3c32fa) [ Fix conflict with mmio vs gt argument ] Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-31USB: serial: io_edgeport: fix use after free in debug printkDan Carpenter
The "dev_dbg(&urb->dev->dev, ..." which happens after usb_free_urb(urb) is a use after free of the "urb" pointer. Store the "dev" pointer at the start of the function to avoid this issue. Fixes: 984f68683298 ("USB: serial: io_edgeport.c: remove dbg() usage") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Johan Hovold <johan@kernel.org>
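The pattern of the fix, saving the pointer that is still needed before the containing object is freed, looks roughly like this in a userspace model. struct device_model, struct urb_model and complete_and_free() are hypothetical stand-ins, with free() playing the role of usb_free_urb().

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the device and the URB that points at it. */
struct device_model {
	const char *name;
};

struct urb_model {
	struct device_model *dev;
};

/*
 * Frees the URB and returns the device name for a later debug message.
 * The device pointer is read before the free, so nothing dereferences
 * the freed URB afterwards.
 */
static const char *complete_and_free(struct urb_model *urb)
{
	struct device_model *dev = urb->dev;	/* saved at the start */

	free(urb);
	return dev->name;	/* uses the saved pointer, not urb->dev */
}
```

The device itself outlives the URB in this model, which is what makes keeping a copy of the pointer safe.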
2024-10-31gpio: sloppy-logic-analyzer: Check for error code from devm_mutex_init() callAndy Shevchenko
Even if it's not critical, not checking the error code from the devm_mutex_init() call today diminishes the point of using the devm variant of it. Tomorrow it may even leak something. Add the missed check. Fixes: 7828b7bbbf20 ("gpio: add sloppy logic analyzer using polling") Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20241030174132.2113286-3-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-10-31kconfig: show sub-menu entries even if the prompt is hiddenMasahiro Yamada
Since commit f79dc03fe68c ("kconfig: refactor choice value calculation"), when EXPERT is disabled, nothing within the "if INPUT" ... "endif" block in drivers/input/Kconfig is displayed. This issue affects all command-line interfaces and GUI frontends. The prompt for INPUT is hidden when EXPERT is disabled. Previously, menu_is_visible() returned true in this case; however, it now returns false, resulting in all sub-menu entries being skipped. Here is a simplified test case illustrating the issue: config A bool "A" if X default y config B bool "B" depends on A When X is disabled, A becomes unconfigurable and is forced to y. B should be displayed, as its dependency is met. This commit restores the necessary code, so menu_is_visible() functions as it did previously. Fixes: f79dc03fe68c ("kconfig: refactor choice value calculation") Reported-by: Edmund Raile <edmund.raile@proton.me> Closes: https://lore.kernel.org/all/5fd0dfc7ff171aa74352e638c276069a5f2e888d.camel@proton.me/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-10-31kbuild: deb-pkg: add pkg.linux-upstream.nokerneldbg build profileMasahiro Yamada
The Debian kernel supports the pkg.linux.nokerneldbg build profile. The debug package tends to become huge, and you may not want to build it even when CONFIG_DEBUG_INFO is enabled. This commit introduces a similar profile for the upstream kernel. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-10-31kbuild: deb-pkg: add pkg.linux-upstream.nokernelheaders build profileMasahiro Yamada
Since commit f1d87664b82a ("kbuild: cross-compile linux-headers package when possible"), 'make bindeb-pkg' may attempt to cross-compile the linux-headers package, but it fails under certain circumstances. For example, when CONFIG_MODULE_SIG_FORMAT is enabled on Debian, the following command fails: $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bindeb-pkg [ snip ] Rebuilding host programs with aarch64-linux-gnu-gcc... HOSTCC debian/linux-headers-6.12.0-rc4/usr/src/linux-headers-6.12.0-rc4/scripts/kallsyms HOSTCC debian/linux-headers-6.12.0-rc4/usr/src/linux-headers-6.12.0-rc4/scripts/sorttable HOSTCC debian/linux-headers-6.12.0-rc4/usr/src/linux-headers-6.12.0-rc4/scripts/asn1_compiler HOSTCC debian/linux-headers-6.12.0-rc4/usr/src/linux-headers-6.12.0-rc4/scripts/sign-file In file included from /usr/include/openssl/opensslv.h:109, from debian/linux-headers-6.12.0-rc4/usr/src/linux-headers-6.12.0-rc4/scripts/sign-file.c:25: /usr/include/openssl/macros.h:14:10: fatal error: openssl/opensslconf.h: No such file or directory 14 | #include <openssl/opensslconf.h> | ^~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. This commit adds a new profile, pkg.linux-upstream.nokernelheaders, to guard the linux-headers package. There are two options to fix the above issue. Option 1: Set the pkg.linux-upstream.nokernelheaders build profile $ DEB_BUILD_PROFILES=pkg.linux-upstream.nokernelheaders \ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bindeb-pkg This skips the building of the linux-headers package. Option 2: Install the necessary build dependencies If you want to cross-compile the linux-headers package, you need to install additional packages. 
For example, on Debian, the packages necessary for cross-compiling it to arm64 can be installed with the following commands: # dpkg --add-architecture arm64 # apt update # apt install gcc-aarch64-linux-gnu libssl-dev:arm64 Fixes: f1d87664b82a ("kbuild: cross-compile linux-headers package when possible") Reported-by: Ron Economos <re@w6rz.net> Closes: https://lore.kernel.org/all/b3d4f49e-7ddb-29ba-0967-689232329b53@w6rz.net/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Ron Economos <re@w6rz.net> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-10-31kbuild: rpm-pkg: disable kernel-devel package when cross-compilingMasahiro Yamada
Since commit f1d87664b82a ("kbuild: cross-compile linux-headers package when possible"), 'make binrpm-pkg' may attempt to cross-compile the kernel-devel package, but it fails under certain circumstances. For example, when CONFIG_MODULE_SIG_FORMAT is enabled on openSUSE Tumbleweed, the following command fails: $ make ARCH=arm64 CROSS_COMPILE=aarch64-suse-linux- binrpm-pkg [ snip ] Rebuilding host programs with aarch64-suse-linux-gcc... HOSTCC /home/masahiro/ref/linux/rpmbuild/BUILDROOT/kernel-6.12.0_rc4-1.aarch64/usr/src/kernels/6.12.0-rc4/scripts/kallsyms HOSTCC /home/masahiro/ref/linux/rpmbuild/BUILDROOT/kernel-6.12.0_rc4-1.aarch64/usr/src/kernels/6.12.0-rc4/scripts/sorttable HOSTCC /home/masahiro/ref/linux/rpmbuild/BUILDROOT/kernel-6.12.0_rc4-1.aarch64/usr/src/kernels/6.12.0-rc4/scripts/asn1_compiler HOSTCC /home/masahiro/ref/linux/rpmbuild/BUILDROOT/kernel-6.12.0_rc4-1.aarch64/usr/src/kernels/6.12.0-rc4/scripts/sign-file /home/masahiro/ref/linux/rpmbuild/BUILDROOT/kernel-6.12.0_rc4-1.aarch64/usr/src/kernels/6.12.0-rc4/scripts/sign-file.c:25:10: fatal error: openssl/opensslv.h: No such file or directory 25 | #include <openssl/opensslv.h> | ^~~~~~~~~~~~~~~~~~~~ compilation terminated. I believe this issue is less common on Fedora because the distro's cross-compiler cannot link user-space programs. Hence, CONFIG_CC_CAN_LINK is unset. On Fedora 40, the package information explains this limitation clearly: $ dnf info gcc-aarch64-linux-gnu [ snip ] Description : Cross-build GNU C compiler. : : Only building kernels is currently supported. Support for cross-building : user space programs is not currently provided as that would massively multiply : the number of packages. Anyway, cross-compiling RPM packages is somewhat challenging. This commit disables the kernel-devel package when cross-compiling because I did not come up with a better solution. 
Fixes: f1d87664b82a ("kbuild: cross-compile linux-headers package when possible") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2024-10-31gpio: fix uninit-value in swnode_find_gpioSuraj Sonawane
Fix an issue detected by the Smatch tool: drivers/gpio/gpiolib-swnode.c:78 swnode_find_gpio() error: uninitialized symbol 'ret'. The issue occurs because the 'ret' variable may be used without initialization if the for_each_gpio_property_name loop does not run. This could lead to returning an undefined value, causing unpredictable behavior. Initialize 'ret' to 0 before the loop to ensure the function returns an error code if no properties are parsed, maintaining proper error handling. Fixes: 9e4c6c1ad ("Merge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux") Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Link: https://lore.kernel.org/r/20241026090642.28633-1-surajsonawane0215@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
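The class of warning described above, a result variable that is only assigned inside a loop that may run zero times, can be shown with a small standalone function. find_first() and its arguments are hypothetical; the starting value of 0 mirrors what the commit message says, and whether 0 is the right default depends on what the caller does with it.

```c
#include <stddef.h>

/*
 * Hypothetical lookup: walk the per-candidate results and stop at the
 * first one that succeeded (0). Returns the last result seen.
 */
static int find_first(const int *results, size_t n)
{
	int ret = 0;	/* defined even when n == 0 and the loop body never runs */
	size_t i;

	for (i = 0; i < n; i++) {
		ret = results[i];
		if (ret == 0)
			break;
	}

	return ret;
}
```

Without the initializer, the n == 0 case would return an indeterminate value, which is what a static checker reports as an uninitialized symbol.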
2024-10-31Merge tag 'nf-24-10-31' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Remove unused parameters in conntrack_dump_flush.c used by selftests, from Liu Jing. 2) Fix possible UaF when removing xtables module via getsockopt() interface, from Dong Chenchen. 3) Fix potential crash in nf_send_reset6() reported by syzkaller. From Eric Dumazet 4) Validate offset and length before calling skb_checksum() in nft_payload, otherwise hitting BUG() is possible. netfilter pull request 24-10-31 * tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_payload: sanitize offset and length before calling skb_checksum() netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6() netfilter: Fix use-after-free in get_info() selftests: netfilter: remove unused parameter ==================== Link: https://patch.msgid.link/ Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31Merge tag 'for-net-2024-10-30' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci: fix null-ptr-deref in hci_read_supported_codecs * tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs ==================== Link: https://patch.msgid.link/20241030192205.38298-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'Paolo Abeni
Jijie Shao says: ==================== There are some bugfixes for the HNS3 ethernet driver ChangeLog: v2 -> v3: - Rewrite the commit log of 'net: hns3: add sync command to sync io-pgtable' to add a more verbose explanation, suggested by Paolo. - Add fixes tag for hardware issue, suggested by Paolo and Simon Horman. v2: https://lore.kernel.org/all/20241018101059.1718375-1-shaojijie@huawei.com/ v1 -> v2: - Pass IRQF_NO_AUTOEN to request_irq(), suggested by Jakub. - Rewrite the commit logs of 'net: hns3: default enable tx bounce buffer when smmu enabled' and 'net: hns3: add sync command to sync io-pgtable'. v1: https://lore.kernel.org/all/20241011094521.3008298-1-shaojijie@huawei.com/ ==================== Link: https://patch.msgid.link/20241025092938.2912958-1-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31net: hns3: fix kernel crash when 1588 is sent on HIP08 devicesJie Wang
Currently, HIP08 devices do not register the ptp devices, so hdev->ptp is NULL. But the tx process still tries to set the hardware time stamp info when the SKBTX_HW_TSTAMP flag is set, which causes a kernel crash. [ 128.087798] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018 ... [ 128.280251] pc : hclge_ptp_set_tx_info+0x2c/0x140 [hclge] [ 128.286600] lr : hclge_ptp_set_tx_info+0x20/0x140 [hclge] [ 128.292938] sp : ffff800059b93140 [ 128.297200] x29: ffff800059b93140 x28: 0000000000003280 [ 128.303455] x27: ffff800020d48280 x26: ffff0cb9dc814080 [ 128.309715] x25: ffff0cb9cde93fa0 x24: 0000000000000001 [ 128.315969] x23: 0000000000000000 x22: 0000000000000194 [ 128.322219] x21: ffff0cd94f986000 x20: 0000000000000000 [ 128.328462] x19: ffff0cb9d2a166c0 x18: 0000000000000000 [ 128.334698] x17: 0000000000000000 x16: ffffcf1fc523ed24 [ 128.340934] x15: 0000ffffd530a518 x14: 0000000000000000 [ 128.347162] x13: ffff0cd6bdb31310 x12: 0000000000000368 [ 128.353388] x11: ffff0cb9cfbc7070 x10: ffff2cf55dd11e02 [ 128.359606] x9 : ffffcf1f85a212b4 x8 : ffff0cd7cf27dab0 [ 128.365831] x7 : 0000000000000a20 x6 : ffff0cd7cf27d000 [ 128.372040] x5 : 0000000000000000 x4 : 000000000000ffff [ 128.378243] x3 : 0000000000000400 x2 : ffffcf1f85a21294 [ 128.384437] x1 : ffff0cb9db520080 x0 : ffff0cb9db500080 [ 128.390626] Call trace: [ 128.393964] hclge_ptp_set_tx_info+0x2c/0x140 [hclge] [ 128.399893] hns3_nic_net_xmit+0x39c/0x4c4 [hns3] [ 128.405468] xmit_one.constprop.0+0xc4/0x200 [ 128.410600] dev_hard_start_xmit+0x54/0xf0 [ 128.415556] sch_direct_xmit+0xe8/0x634 [ 128.420246] __dev_queue_xmit+0x224/0xc70 [ 128.425101] dev_queue_xmit+0x1c/0x40 [ 128.429608] ovs_vport_send+0xac/0x1a0 [openvswitch] [ 128.435409] do_output+0x60/0x17c [openvswitch] [ 128.440770] do_execute_actions+0x898/0x8c4 [openvswitch] [ 128.446993] ovs_execute_actions+0x64/0xf0 [openvswitch] [ 128.453129] ovs_dp_process_packet+0xa0/0x224 [openvswitch] [ 128.459530] 
ovs_vport_receive+0x7c/0xfc [openvswitch] [ 128.465497] internal_dev_xmit+0x34/0xb0 [openvswitch] [ 128.471460] xmit_one.constprop.0+0xc4/0x200 [ 128.476561] dev_hard_start_xmit+0x54/0xf0 [ 128.481489] __dev_queue_xmit+0x968/0xc70 [ 128.486330] dev_queue_xmit+0x1c/0x40 [ 128.490856] ip_finish_output2+0x250/0x570 [ 128.495810] __ip_finish_output+0x170/0x1e0 [ 128.500832] ip_finish_output+0x3c/0xf0 [ 128.505504] ip_output+0xbc/0x160 [ 128.509654] ip_send_skb+0x58/0xd4 [ 128.513892] udp_send_skb+0x12c/0x354 [ 128.518387] udp_sendmsg+0x7a8/0x9c0 [ 128.522793] inet_sendmsg+0x4c/0x8c [ 128.527116] __sock_sendmsg+0x48/0x80 [ 128.531609] __sys_sendto+0x124/0x164 [ 128.536099] __arm64_sys_sendto+0x30/0x5c [ 128.540935] invoke_syscall+0x50/0x130 [ 128.545508] el0_svc_common.constprop.0+0x10c/0x124 [ 128.551205] do_el0_svc+0x34/0xdc [ 128.555347] el0_svc+0x20/0x30 [ 128.559227] el0_sync_handler+0xb8/0xc0 [ 128.563883] el0_sync+0x160/0x180 Fixes: 0bf5eb788512 ("net: hns3: add support for PTP") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
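The crash above comes from dereferencing a pointer that was never set up on this hardware. A guard of the following shape avoids that in a userspace model; struct ptp_model, struct hdev_model and set_tx_info() are hypothetical, and the early return on a NULL pointer is an assumption about the shape of the fix, which the commit message does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PTP state and the device that may own it. */
struct ptp_model {
	int tx_cnt;
};

struct hdev_model {
	struct ptp_model *ptp;	/* NULL when no PTP device was registered */
};

/* Returns true if timestamp bookkeeping was done, false if it was skipped. */
static bool set_tx_info(struct hdev_model *hdev)
{
	struct ptp_model *ptp = hdev->ptp;

	if (!ptp)
		return false;	/* no PTP support: nothing to record */

	ptp->tx_cnt++;
	return true;
}
```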
2024-10-31net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issueHao Lan
The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs 1024-1279 are in different BAR space addresses. However, hclge_fetch_pf_reg does not distinguish between the two segments when reading the TQP registers. When the number of TQPs is greater than 1024, an out-of-bounds BAR space access occurs. The two segments are already taken into account when tqp.io_base is initialized. Therefore, use tqp.io_base directly when reading the queue registers in hclge_fetch_pf_reg. The error message: Unable to handle kernel paging request at virtual address ffff800037200000 pc : hclge_fetch_pf_reg+0x138/0x250 [hclge] lr : hclge_get_regs+0x84/0x1d0 [hclge] Call trace: hclge_fetch_pf_reg+0x138/0x250 [hclge] hclge_get_regs+0x84/0x1d0 [hclge] hns3_get_regs+0x2c/0x50 [hns3] ethtool_get_regs+0xf4/0x270 dev_ethtool+0x674/0x8a0 dev_ioctl+0x270/0x36c sock_do_ioctl+0x110/0x2a0 sock_ioctl+0x2ac/0x530 __arm64_sys_ioctl+0xa8/0x100 invoke_syscall+0x4c/0x124 el0_svc_common.constprop.0+0x140/0x15c do_el0_svc+0x30/0xd0 el0_svc+0x1c/0x2c el0_sync_handler+0xb0/0xb4 el0_sync+0x168/0x180 Fixes: 939ccd107ffc ("net: hns3: move dump regs function to a separate file") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
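The difference between a flat offset calculation and a segment-aware one can be worked through with made-up numbers. Only the split at queue 1024 comes from the commit message; SEG0_BASE, SEG1_BASE and QUEUE_STRIDE below are hypothetical values chosen for illustration, not the driver's real register layout.

```c
#include <stdint.h>

#define SEG_SPLIT	1024u		/* first queue of the second segment */
#define QUEUE_STRIDE	0x200u		/* hypothetical bytes per queue */
#define SEG0_BASE	0x80000u	/* hypothetical base of queues 0-1023 */
#define SEG1_BASE	0x200000u	/* hypothetical base of queues 1024 and up */

/* Flat calculation: treats every queue as if it lived in the first segment. */
static uint32_t naive_queue_base(uint32_t q)
{
	return SEG0_BASE + q * QUEUE_STRIDE;
}

/* Segment-aware calculation: picks the base that matches the queue index. */
static uint32_t queue_base(uint32_t q)
{
	if (q < SEG_SPLIT)
		return SEG0_BASE + q * QUEUE_STRIDE;

	return SEG1_BASE + (q - SEG_SPLIT) * QUEUE_STRIDE;
}
```

With these numbers both functions agree up to queue 1023, while for queue 1024 the flat version yields 0x100000 and the segment-aware one yields 0x200000. Caching the segment-aware base per queue at initialization, as the commit message says is done for tqp.io_base, lets later readers reuse it instead of recomputing the offset.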
2024-10-31net: hns3: initialize reset_timer before hclgevf_misc_irq_init()Jian Shen
Currently the misc irq is initialized before the reset_timer is set up, but the irq handler accesses the reset_timer. So initialize the reset_timer earlier. Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31net: hns3: don't auto enable misc vectorJian Shen
Currently, there is a time window between the misc irq being enabled and the service task being initialized. If an interrupt is reported during this window, it causes a warning like the one below: [ 16.324639] Call trace: [ 16.324641] __queue_delayed_work+0xb8/0xe0 [ 16.324643] mod_delayed_work_on+0x78/0xd0 [ 16.324655] hclge_errhand_task_schedule+0x58/0x90 [hclge] [ 16.324662] hclge_misc_irq_handle+0x168/0x240 [hclge] [ 16.324666] __handle_irq_event_percpu+0x64/0x1e0 [ 16.324667] handle_irq_event+0x80/0x170 [ 16.324670] handle_fasteoi_edge_irq+0x110/0x2bc [ 16.324671] __handle_domain_irq+0x84/0xfc [ 16.324673] gic_handle_irq+0x88/0x2c0 [ 16.324674] el1_irq+0xb8/0x140 [ 16.324677] arch_cpu_idle+0x18/0x40 [ 16.324679] default_idle_call+0x5c/0x1bc [ 16.324682] cpuidle_idle_call+0x18c/0x1c4 [ 16.324684] do_idle+0x174/0x17c [ 16.324685] cpu_startup_entry+0x30/0x6c [ 16.324687] secondary_start_kernel+0x1a4/0x280 [ 16.324688] ---[ end trace 6aa0bff672a964aa ]--- So don't auto enable the misc vector when requesting the irq. Fixes: 7be1b9f3e99f ("net: hns3: make hclge_service use delayed workqueue") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
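The ordering problem above can be modelled in userspace: the interrupt is requested in a disabled state, the state its handler depends on is set up, and only then is the interrupt enabled. All names here (struct dev_model, request_irq_model(), deliver_irq() and so on) are hypothetical; in the kernel the "requested but not enabled" state corresponds to passing IRQF_NO_AUTOEN to request_irq(), as the cover letter for this series mentions.

```c
#include <stdbool.h>

/* Hypothetical model of a device with one misc interrupt. */
struct dev_model {
	bool irq_enabled;
	bool service_task_ready;
};

enum irq_outcome {
	IRQ_MASKED,		/* interrupt not enabled yet, handler did not run */
	IRQ_HANDLED,		/* handler ran with its dependencies in place */
	IRQ_HANDLER_TOO_EARLY,	/* handler ran before the service task existed */
};

/* Request the interrupt; auto_enable models the default request_irq() behaviour. */
static void request_irq_model(struct dev_model *dev, bool auto_enable)
{
	dev->irq_enabled = auto_enable;
}

static void init_service_task(struct dev_model *dev)
{
	dev->service_task_ready = true;
}

static void enable_irq_model(struct dev_model *dev)
{
	dev->irq_enabled = true;
}

/* Simulate the hardware raising the interrupt at this moment. */
static enum irq_outcome deliver_irq(const struct dev_model *dev)
{
	if (!dev->irq_enabled)
		return IRQ_MASKED;
	if (!dev->service_task_ready)
		return IRQ_HANDLER_TOO_EARLY;
	return IRQ_HANDLED;
}
```

In this model, requesting with auto_enable set to true and delivering an interrupt before init_service_task() gives IRQ_HANDLER_TOO_EARLY, which stands in for the warning in the trace above. Requesting with it set to false keeps the interrupt masked until enable_irq_model() is called after initialization.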