Age | Commit message | Author
2024-07-14 | net: netconsole: Disable target before netpoll cleanup | Breno Leitao
Currently, netconsole cleans up the netpoll structure before disabling the target. This approach can lead to race conditions, as message senders (write_ext_msg() and write_msg()) check if the target is enabled before using netpoll. The sender can validate that the target is enabled, but the netpoll might already be de-allocated, causing undesired behaviour.

This patch reverses the order of operations:

  1. Disable the target
  2. Clean up the netpoll structure

This change eliminates the potential race condition, ensuring that no messages are sent through a partially cleaned-up netpoll structure.

Fixes: 2382b15bcc39 ("netconsole: take care of NETDEV_UNREGISTER event")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240712143415.1141039-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
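The ordering described above can be illustrated with a minimal, single-threaded user-space sketch. The struct and function names (fake_target, fake_netpoll, send_msg, teardown) are hypothetical stand-ins, not netconsole's code, and the sketch does not model the locking or concurrency the real driver relies on; it only shows why clearing the "enabled" flag before freeing the netpoll state keeps a sender that checks the flag from touching freed memory.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the netpoll state owned by a target. */
struct fake_netpoll {
	int sent;                 /* messages pushed through this netpoll */
};

/* Hypothetical stand-in for a netconsole target. */
struct fake_target {
	bool enabled;             /* senders check this before using np */
	struct fake_netpoll *np;
};

/* Sender side: "check enabled, then use netpoll". */
static bool send_msg(struct fake_target *t)
{
	if (!t->enabled)
		return false;         /* disabled target: never touch np */
	t->np->sent++;
	return true;
}

/* Teardown in the order the patch describes. */
static void teardown(struct fake_target *t)
{
	t->enabled = false;       /* 1. disable the target */
	free(t->np);              /* 2. clean up the netpoll state */
	t->np = NULL;
}
```

With the old order (free first, disable second), a sender running between the two steps would pass the `enabled` check and dereference freed memory.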
2024-07-14 | Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | Jakub Kicinski
Tony Nguyen says:

====================
ice: Switch API optimizations

Marcin Szycik says:

Optimize the process of creating a recipe in the switch block by removing duplicate switch ID words and changing how result indexes are fitted into recipes. In many cases this can decrease the number of recipes required to add a certain set of rules, potentially allowing a more varied set of rules to be created. Total rule count will also increase, since fewer words will be left unused/wasted. There are only 64 rules available in total, so every one counts.

After this modification, many fields and some structs became unused or were simplified, resulting in an overall simpler implementation.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add tracepoint for adding and removing switch rules
  ice: Remove unused members from switch API
  ice: Optimize switch recipe creation
  ice: remove unused recipe bookkeeping data
  ice: Simplify bitmap setting in adding recipe
  ice: Remove reading all recipes before adding a new one
  ice: Remove unused struct ice_prot_lkup_ext members
====================

Link: https://patch.msgid.link/20240711181312.2019606-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | Merge branch 'vrf-fix-source-address-selection-with-route-leak' | Jakub Kicinski
Nicolas Dichtel says:

====================
vrf: fix source address selection with route leak

For patches 1 and 2, I didn't find the exact commit that introduced this bug, but I suspect it has been there since the first version. I arbitrarily chose one.
====================

Link: https://patch.msgid.link/20240710081521.3809742-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | selftests: vrf_route_leaking: add local test | Nicolas Dichtel
The goal is to check that the source address selected by the kernel is routable when a leaking route is used. ICMP, TCP and UDP connections are tested. The symmetric topology is enough for this test.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-5-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | ipv6: take care of scope when choosing the src addr | Nicolas Dichtel
When the source address is selected, the scope must be checked. For example, if a loopback address is assigned to the vrf device, it must not be chosen for packets sent outside.

CC: stable@vger.kernel.org
Fixes: afbac6010aec ("net: ipv6: Address selection needs to consider L3 domains")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | ipv6: fix source address selection with route leak | Nicolas Dichtel
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address.

Let's add a check against the output interface and call the appropriate function to select the source address.

CC: stable@vger.kernel.org
Fixes: 0d240e7811c4 ("net: vrf: Implement get_saddr for IPv6")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20240710081521.3809742-3-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | ipv4: fix source address selection with route leak | Nicolas Dichtel
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address.

Let's add a check against the output interface and call the appropriate function to select the source address.

CC: stable@vger.kernel.org
Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | selftests: forwarding: devlink_lib: Wait for udev events after reloading | Amit Cohen
Recently, additional locking was added by commit c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()"). The locking protects calls to dev_uevent(). This function is used to send messages from the kernel to user space. Uevent messages notify user space about changes in device states, such as when a device is added, removed, or changed. These messages are used by udev (or other similar user-space tools) to apply device-specific rules.

After reloading a devlink instance, udev events should be processed. This locking causes a short delay in udev event handling.

One example of a useful udev rule is renaming ports. 'forwarding.config' can be configured to use names after udev rules are applied. Some tests run devlink_reload() and immediately use the updated names. This worked before the above-mentioned commit was pushed, but now the delay of uevent messages causes devlink_reload() to return before udev events are handled, and the tests fail.

Adjust devlink_reload() to not assume that udev events are already processed when devlink reload is done; instead, wait for udev events to ensure they are processed before returning from the function.

Without this patch:

  TESTS='rif_mac_profile' ./resource_scale.sh
  TEST: 'rif_mac_profile' 4 [ OK ]
  sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory
  sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory
  sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory
  sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory
  Cannot find device "swp1"
  Cannot find device "swp2"
  TEST: setup_wait_dev (: Interface swp1 does not come up.) [FAIL]

With this patch:

  $ TESTS='rif_mac_profile' ./resource_scale.sh
  TEST: 'rif_mac_profile' 4 [ OK ]
  TEST: 'rif_mac_profile' overflow 5 [ OK ]

This is relevant not only for this test.

Fixes: bc7cbb1e9f4c ("selftests: forwarding: Add devlink_lib.sh")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/89367666e04b38a8993027f1526801ca327ab96a.1720709333.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | Merge branch 'net-pse-pd-fix-possible-issues-with-a-pse-supporting-both-c33-and-podl' | Jakub Kicinski
Kory Maincent says:

====================
net: pse-pd: Fix possible issues with a PSE supporting both c33 and PoDL

Although PSE controllers supporting both c33 and PoDL are not on the market yet, we want to prevent potential issues from arising in the future. Two possible issues could occur with a PSE supporting both c33 and PoDL:

- Setting the config for one type of PSE leaves the other type's config null. In this case, the PSE core would return EOPNOTSUPP, which is not the correct behavior.
- Null dereference of Netlink attributes, as only one of the Netlink attributes would be specified at a time.

This patch series contains two patches to fix these issues.
====================

Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-0-edd78fc4fe42@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | net: ethtool: pse-pd: Fix possible null-deref | Kory Maincent
Fix a possible null dereference when a PSE supports both c33 and PoDL, but only one of the netlink attributes is specified. The c33 or PoDL PSE capabilities are already validated in the ethnl_set_pse_validate() call.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240705184116.13d8235a@kernel.org/
Fixes: 4d18e3ddf427 ("net: ethtool: pse-pd: Expand pse commands with the PSE PoE interface")
Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-2-edd78fc4fe42@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | net: pse-pd: Do not return EOPNOSUPP if config is null | Kory Maincent
For a PSE supporting both c33 and PoDL, setting config for one type of PoE leaves the other type's config null. Currently, this case returns EOPNOTSUPP, which is incorrect. Instead, we should do nothing if the configuration is empty.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework")
Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-1-edd78fc4fe42@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
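A minimal sketch of the "do nothing if the configuration is empty" rule, assuming a simplified config where each PoE type's admin setting is an optional pointer. The struct and helpers (fake_pse_config, apply, set_config) are hypothetical and do not mirror the PSE core's real types; the point is that a NULL entry is skipped and reported as success rather than as -EOPNOTSUPP, and is never dereferenced.

```c
#include <stddef.h>

/* Hypothetical request: each PoE type's admin control is optional. */
struct fake_pse_config {
	const int *c33_admin;     /* NULL when only PoDL is being configured */
	const int *podl_admin;    /* NULL when only c33 is being configured */
};

/* Apply one PoE type's setting; an absent setting is a no-op, not an error
 * (the behaviour before the fix would have been to return -EOPNOTSUPP). */
static int apply(const int *admin, int *state)
{
	if (!admin)
		return 0;
	*state = *admin;
	return 0;
}

static int set_config(const struct fake_pse_config *cfg, int *c33, int *podl)
{
	int ret = apply(cfg->c33_admin, c33);

	if (ret)
		return ret;
	return apply(cfg->podl_admin, podl);
}
```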
2024-07-14 | Merge tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec | Jakub Kicinski
Steffen Klassert says:

====================
pull request (net): ipsec 2024-07-11

1) Fix esp_output_tail_tcp() on unsupported ESPINTCP.
   From Hagar Hemdan.

2) Fix two bugs in the recently introduced SA direction separation.
   From Antony Antony.

3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed.

4) Fix netdev reference count imbalance in xfrm_state_find.
   From Jianbo Liu.

5) Call xfrm_dev_policy_delete when killing offloaded policies.
   From Jianbo Liu.

* tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: call xfrm_dev_policy_delete when kill policy
  xfrm: fix netdev reference count imbalance
  xfrm: Export symbol xfrm_dev_state_delete.
  xfrm: Fix unregister netdevice hang on hardware offload.
  xfrm: Log input direction mismatch error in one place
  xfrm: Fix input error path memory access
  net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP
====================

Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 | Merge tag 'md-6.11-20240712' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.11/block | Jens Axboe
Pull MD fixes from Song:

"Changes in this set are:

  1. md-cluster fixes by Heming Zhao;
  2. raid1 fix by Mateusz Jończyk."

* tag 'md-6.11-20240712' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid1: set max_sectors during early return from choose_slow_rdev()
  md-cluster: fix no recovery job when adding/re-adding a disk
  md-cluster: fix hanging issue while a new disk adding
2024-07-14 | RDMA/mana_ib: Set correct device into ib | Konstantin Taranov
Add mana_get_primary_netdev_rcu helper to get a primary netdevice for a given port. When mana is used with netvsc, the VF netdev is controlled by an upper netvsc device. In a baremetal case, the VF netdev is the primary device.

Use the mana_get_primary_netdev_rcu() helper in the mana_ib to get the correct device for querying network states.

Fixes: 8b184e4f1c32 ("RDMA/mana_ib: Enable RoCE on port 1")
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://lore.kernel.org/r/1720705077-322-1-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
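The selection rule described above (use the upper netvsc device when there is one, otherwise the VF itself) can be sketched as follows. This is a hypothetical model, not the body of mana_get_primary_netdev_rcu(): it ignores RCU and real net_device linkage, and the names fake_netdev and primary_netdev are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical model of a netdev with an optional upper (master) device. */
struct fake_netdev {
	const char *name;
	struct fake_netdev *upper;   /* netvsc device above the VF; NULL on bare metal */
};

/* Return the device whose state should be queried: the upper device when
 * the VF is controlled by one, otherwise the VF itself. */
static struct fake_netdev *primary_netdev(struct fake_netdev *vf)
{
	return vf->upper ? vf->upper : vf;
}
```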
2024-07-14 | bnxt_re: Fix imm_data endianness | Jack Wang
When mapping a device between servers with MLX and BCM RoCE NICs, the RTRS server complains about an unknown imm type and can't map the device. After more debugging, it seems bnxt_re handles imm_data incorrectly; this patch fixed the compatibility issue with MLX for us.

In an off-list discussion, Selvin confirmed that the HW works in little-endian format and all data needs to be converted to LE when it is provided.

This patch fixes the endianness of imm_data.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20240710122102.37569-1-jinpu.wang@ionos.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
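To make the byte-order issue concrete: in the kernel verbs API the immediate data field is declared `__be32`, while (per the commit) the hardware expects little-endian data. The portable user-space sketch below decodes a big-endian value and re-encodes it little-endian, which is roughly what a `cpu_to_le32(be32_to_cpu(x))` conversion achieves in kernel code. The helper names load_be32 and store_le32 are hypothetical; this is not the driver's patch.

```c
#include <stdint.h>

/* Decode a 32-bit value stored in big-endian (network) byte order. */
static uint32_t load_be32(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Encode a 32-bit value in little-endian byte order for a LE device. */
static void store_le32(uint8_t out[4], uint32_t v)
{
	out[0] = v & 0xff;
	out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff;
	out[3] = (v >> 24) & 0xff;
}
```

Copying the big-endian bytes straight into a little-endian descriptor would present 0x12345678 to the device as 0x78563412, which is the kind of mismatch a peer would see as an unknown imm value.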
2024-07-14 | RDMA: Fix netdev tracker in ib_device_set_netdev | David Ahern
If a netdev has already been assigned, ib_device_set_netdev needs to release the reference on the old netdev, but the release is mistakenly applied to the new netdev. Fix it, and in the process use netdev_put to be symmetrical with netdev_hold.

Fixes: 09f530f0c6d6 ("RDMA: Add netdevice_tracker to ib_device_set_netdev()")
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240710203310.19317-1-dsahern@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
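The reference-swap pattern the fix restores can be sketched with a plain counter standing in for netdev_hold()/netdev_put(). Everything here (fake_netdev, fake_port, hold, put, set_netdev) is hypothetical and only illustrates the invariant: take a reference on the new device and drop the one held on the old device, never on the new one.

```c
#include <stddef.h>

/* Hypothetical refcounted netdev; hold()/put() stand in for netdev_hold()/netdev_put(). */
struct fake_netdev {
	int refs;
};

static void hold(struct fake_netdev *d) { if (d) d->refs++; }
static void put(struct fake_netdev *d)  { if (d) d->refs--; }

struct fake_port {
	struct fake_netdev *netdev;
};

/* Swap the port's netdev: reference the new one, release the OLD one. */
static void set_netdev(struct fake_port *p, struct fake_netdev *ndev)
{
	struct fake_netdev *old = p->netdev;

	if (old == ndev)
		return;
	hold(ndev);
	p->netdev = ndev;
	put(old);
}
```

Releasing on `ndev` instead of `old` would leave the old device's count elevated forever and drop the new device's count back to zero while it is still in use.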
2024-07-13 | randomize_kstack: Improve stack alignment codegen | Kees Cook
The codegen for adding architecture-specific stack alignment to the effective alloca() usage is somewhat inefficient and allows a bit to get carried beyond the desired entropy range. This isn't really a problem, but it's unexpected and the codegen is kind of bad.

Quoting Mark[1], the disassembly for arm64's invoke_syscall() looks like:

    // offset = raw_cpu_read(kstack_offset)
    mov x4, sp
    adrp x0, kstack_offset
    mrs x5, tpidr_el1
    add x0, x0, #:lo12:kstack_offset
    ldr w0, [x0, x5]
    // offset = KSTACK_OFFSET_MAX(offset)
    and x0, x0, #0x3ff
    // alloca(offset)
    add x0, x0, #0xf
    and x0, x0, #0x7f0
    sub sp, x4, x0

... which in C would be:

    offset = raw_cpu_read(kstack_offset)
    offset &= 0x3ff;    // [0x0, 0x3ff]
    offset += 0xf;      // [0xf, 0x40e]
    offset &= 0x7f0;    // [0x0, ...

... so when *all* bits [3:0] are 0, they'll have no impact, and when *any* of bits [3:0] are 1 they'll trigger a carry into bit 4, which could ripple all the way up and spill into bit 10.

Switch the masking in KSTACK_OFFSET_MAX() to explicitly clear the bottom bits to avoid the rounding by using 0b1111110000 instead of 0b1111111111:

    // offset = raw_cpu_read(kstack_offset)
    mov x4, sp
    adrp x0, 0 <kstack_offset>
    mrs x5, tpidr_el1
    add x0, x0, #:lo12:kstack_offset
    ldr w0, [x0, x5]
    // offset = KSTACK_OFFSET_MAX(offset)
    and x0, x0, #0x3f0
    // alloca(offset)
    sub sp, x4, x0

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/lkml/ZnVfOnIuFl2kNWkT@J2N7QTR9R3/ [1]
Link: https://lore.kernel.org/r/20240702211612.work.576-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
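The carry described above can be checked with a small user-space model of the two code paths, using only the constants from the quoted disassembly (0x3ff or 0x3f0 for KSTACK_OFFSET_MAX, then +0xf and &0x7f0 for alloca()'s 16-byte rounding). The function names old_offset, new_offset and max_over are hypothetical; this is arithmetic, not the kernel macro itself.

```c
#include <stdint.h>

/* Old path: 10-bit mask, then alloca()'s round-up (+0xf, &0x7f0). */
static uint32_t old_offset(uint32_t raw)
{
	uint32_t off = raw & 0x3ff;       /* old KSTACK_OFFSET_MAX mask */
	return (off + 0xf) & 0x7f0;       /* alloca() rounding */
}

/* New path: clear bits [3:0] up front (0b1111110000 == 0x3f0), so the
 * alloca() rounding no longer changes the value and the compiler drops it. */
static uint32_t new_offset(uint32_t raw)
{
	return raw & 0x3f0;
}

/* Largest offset produced for any 16-bit raw value. */
static uint32_t max_over(uint32_t (*f)(uint32_t))
{
	uint32_t max = 0;

	for (uint32_t raw = 0; raw <= 0xffff; raw++)
		if (f(raw) > max)
			max = f(raw);
	return max;
}
```

For raw = 0x3ff the old path yields 0x40e & 0x7f0 = 0x400, i.e. bit 10 is set, while the new path tops out at 0x3f0.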
2024-07-13 | exec: Avoid pathological argc, envc, and bprm->p values | Kees Cook
Make sure nothing goes wrong with the string counters or the bprm's belief about the stack pointer. Add checks and matching self-tests. Take special care for !CONFIG_MMU, since argmin is not exposed there.

For 32-bit validation, 32-bit UML was used:

    $ tools/testing/kunit/kunit.py run \
        --make_options CROSS_COMPILE=i686-linux-gnu- \
        --make_options SUBARCH=i386 \
        exec

For !MMU validation, m68k was used:

    $ tools/testing/kunit/kunit.py run \
        --arch m68k --make_option CROSS_COMPILE=m68k-linux-gnu- \
        exec

Link: https://lore.kernel.org/r/20240520021615.741800-2-keescook@chromium.org
Link: https://lore.kernel.org/r/20240621205046.4001362-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2024-07-13 | execve: Keep bprm->argmin behind CONFIG_MMU | Kees Cook
When argmin was added in commit 655c16a8ce9c ("exec: separate MM_ANONPAGES and RLIMIT_STACK accounting"), it was intended only for validating stack limits on CONFIG_MMU[1]. All checking for reaching the limit (argmin) is wrapped in CONFIG_MMU ifdef checks, though setting argmin was not. That argmin is only supposed to be used under CONFIG_MMU was rediscovered recently[2], and I don't want to trip over this again.

Move argmin's declaration into the existing CONFIG_MMU area, and add helper functions so the MMU tests can be consolidated.

Link: https://lore.kernel.org/all/20181126122307.GA1660@redhat.com [1]
Link: https://lore.kernel.org/all/202406211253.7037F69@keescook/ [2]
Link: https://lore.kernel.org/r/20240621205046.4001362-1-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2024-07-14 | Merge branch kvm-arm64/docs into kvmarm/next | Oliver Upton
* kvm-arm64/docs:
  : KVM Documentation fixes, courtesy of Changyuan Lyu
  :
  : Small set of typo fixes / corrections to the KVM API documentation
  : relating to MSIs and arm64 VGIC UAPI.
  MAINTAINERS: Include documentation in KVM/arm64 entry
  KVM: Documentation: Correct the VGIC V2 CPU interface addr space size
  KVM: Documentation: Enumerate allowed value macros of `irq_type`
  KVM: Documentation: Fix typo `BFD`

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/nv-tcr2 into kvmarm/next | Oliver Upton
* kvm-arm64/nv-tcr2:
  : Fixes to the handling of TCR_EL1, courtesy of Marc Zyngier
  :
  : Series addresses a couple gaps that are present in KVM (from cover
  : letter):
  :
  :  - VM configuration: HCRX_EL2.TCR2En is forced to 1, and we blindly
  :    save/restore stuff.
  :
  :  - trap bit description and routing: none, obviously, since we make a
  :    point in not trapping.
  KVM: arm64: Honor trap routing for TCR2_EL1
  KVM: arm64: Make PIR{,E0}_EL1 save/restore conditional on FEAT_TCRX
  KVM: arm64: Make TCR2_EL1 save/restore dependent on the VM features
  KVM: arm64: Get rid of HCRX_GUEST_FLAGS
  KVM: arm64: Correctly honor the presence of FEAT_TCRX

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/nv-sve into kvmarm/next | Oliver Upton
* kvm-arm64/nv-sve:
  : CPTR_EL2, FPSIMD/SVE support for nested
  :
  : This series brings support for honoring the guest hypervisor's CPTR_EL2
  : trap configuration when running a nested guest, along with support for
  : FPSIMD/SVE usage at L1 and L2.
  KVM: arm64: Allow the use of SVE+NV
  KVM: arm64: nv: Add additional trap setup for CPTR_EL2
  KVM: arm64: nv: Add trap description for CPTR_EL2
  KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper
  KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2
  KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap
  KVM: arm64: nv: Handle CPACR_EL1 traps
  KVM: arm64: Spin off helper for programming CPTR traps
  KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state
  KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest
  KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context
  KVM: arm64: nv: Load guest hyp's ZCR into EL1 state
  KVM: arm64: nv: Handle ZCR_EL2 traps
  KVM: arm64: nv: Forward SVE traps to guest hypervisor
  KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/el2-kcfi into kvmarm/next | Oliver Upton
* kvm-arm64/el2-kcfi:
  : kCFI support in the EL2 hypervisor, courtesy of Pierre-Clément Tosi
  :
  : Enable the usage of CONFIG_CFI_CLANG (kCFI) for hardening indirect
  : branches in the EL2 hypervisor. Unlike kernel support for the feature,
  : CFI failures at EL2 are always fatal.
  KVM: arm64: nVHE: Support CONFIG_CFI_CLANG at EL2
  KVM: arm64: Introduce print_nvhe_hyp_panic helper
  arm64: Introduce esr_brk_comment, esr_is_cfi_brk
  KVM: arm64: VHE: Mark __hyp_call_panic __noreturn
  KVM: arm64: nVHE: gen-hyprel: Skip R_AARCH64_ABS32
  KVM: arm64: nVHE: Simplify invalid_host_el2_vect
  KVM: arm64: Fix __pkvm_init_switch_pgd call ABI
  KVM: arm64: Fix clobbered ELR in sync abort/SError

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/ctr-el0 into kvmarm/next | Oliver Upton
* kvm-arm64/ctr-el0:
  : Support for user changes to CTR_EL0, courtesy of Sebastian Ott
  :
  : Allow userspace to change the guest-visible value of CTR_EL0 for a VM,
  : so long as the requested value represents a subset of features supported
  : by hardware. In other words, prevent the VMM from over-promising the
  : capabilities of hardware.
  :
  : Make this happen by fitting CTR_EL0 into the existing infrastructure for
  : feature ID registers.
  KVM: selftests: Assert that MPIDR_EL1 is unchanged across vCPU reset
  KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 masking
  KVM: selftests: arm64: Test writes to CTR_EL0
  KVM: arm64: rename functions for invariant sys regs
  KVM: arm64: show writable masks for feature registers
  KVM: arm64: Treat CTR_EL0 as a VM feature ID register
  KVM: arm64: unify code to prepare traps
  KVM: arm64: nv: Use accessors for modifying ID registers
  KVM: arm64: Add helper for writing ID regs
  KVM: arm64: Use read-only helper for reading VM ID registers
  KVM: arm64: Make idregs debugfs iterator search sysreg table directly
  KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/shadow-mmu into kvmarm/next | Oliver Upton
* kvm-arm64/shadow-mmu:
  : Shadow stage-2 MMU support for NV, courtesy of Marc Zyngier
  :
  : Initial implementation of shadow stage-2 page tables to support a guest
  : hypervisor. In the author's words:
  :
  :   So here's the 10000m (approximately 30000ft for those of you stuck
  :   with the wrong units) view of what this is doing:
  :
  :     - for each {VMID,VTTBR,VTCR} tuple the guest uses, we use a
  :       separate shadow s2_mmu context. This context has its own "real"
  :       VMID and a set of page tables that are the combination of the
  :       guest's S2 and the host S2, built dynamically one fault at a time.
  :
  :     - these shadow S2 contexts are ephemeral, and behave exactly as
  :       TLBs. For all intent and purposes, they *are* TLBs, and we discard
  :       them pretty often.
  :
  :     - TLB invalidation takes three possible paths:
  :
  :       * either this is an EL2 S1 invalidation, and we directly emulate
  :         it as early as possible
  :
  :       * or this is an EL1 S1 invalidation, and we need to apply it to
  :         the shadow S2s (plural!) that match the VMID set by the L1 guest
  :
  :       * or finally, this is affecting S2, and we need to teardown the
  :         corresponding part of the shadow S2s, which invalidates the TLBs
  KVM: arm64: nv: Truely enable nXS TLBI operations
  KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations
  KVM: arm64: nv: Add handling of range-based TLBI operations
  KVM: arm64: nv: Add handling of outer-shareable TLBI operations
  KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information
  KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level
  KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations
  KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations
  KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations
  KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations
  KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1
  KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation
  KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives
  KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  KVM: arm64: nv: Handle shadow stage 2 page faults
  KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  KVM: arm64: nv: Support multiple nested Stage-2 mmu structures

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/ffa-1p1 into kvmarm/next | Oliver Upton
* kvm-arm64/ffa-1p1:
  : Improvements to the pKVM FF-A Proxy, courtesy of Sebastian Ene
  :
  : Various minor improvements to how host FF-A calls are proxied with the
  : TEE, along with support for v1.1 of the protocol.
  KVM: arm64: Use FF-A 1.1 with pKVM
  KVM: arm64: Update the identification range for the FF-A smcs
  KVM: arm64: Add support for FFA_PARTITION_INFO_GET
  KVM: arm64: Trap FFA_VERSION host call in pKVM

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14 | Merge branch kvm-arm64/misc into kvmarm/next | Oliver Upton
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Provide a command-line parameter to statically control the WFx trap
  :    selection in KVM
  :
  :  - Make sysreg masks allocation accounted
  Revert "KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarity"
  KVM: arm64: nv: Use GFP_KERNEL_ACCOUNT for sysreg_masks allocation
  KVM: arm64: nv: Fix RESx behaviour of disabled FGTs with negative polarity
  KVM: arm64: Add early_param to control WFx trapping

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-13 | Merge tag 'i2c-for-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux | Linus Torvalds
Pull i2c fixes from Wolfram Sang:
 "Fixes for the I2C testunit, the Renesas R-Car driver and some MAINTAINERS corrections"

* tag 'i2c-for-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: testunit: avoid re-issued work after read message
  i2c: rcar: ensure Gen3+ reset does not disturb local targets
  i2c: mark HostNotify target address as used
  i2c: testunit: correct Kconfig description
  MAINTAINERS: VIRTIO I2C loses a maintainer, gains a reviewer
  MAINTAINERS: delete entries for Thor Thayer
  i2c: rcar: clear NO_RXDMA flag after resetting
  i2c: rcar: bring hardware to known state when probing
2024-07-13 | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | Jakub Kicinski
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-07-11 (net/intel)

This series contains updates to most Intel network drivers.

Tony removes MODULE_AUTHOR from drivers containing the entry.

Simon Horman corrects a kdoc entry for i40e.

Pawel adds implementation for devlink param "local_forwarding" on ice.

Michal removes an unneeded call, and code, for eswitch rebuild for ice.

Sasha removes a no longer used field from igc.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Remove the internal 'eee_advert' field
  ice: remove eswitch rebuild
  ice: Add support for devlink local_forwarding param
  i40e: correct i40e_addr_to_hkey() name in kdoc
  net: intel: Remove MODULE_AUTHORs
====================

Link: https://patch.msgid.link/20240711201932.2019925-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | sfc: falcon: Make I2C terminology more inclusive | Easwar Hariharan
I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" with more appropriate terms. Inspired by Wolfram's series to fix drivers/i2c/, fix the terminology for users of the I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists in the specification.

Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://patch.msgid.link/20240711052734.1273652-5-eahariha@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | net: phy: dp83td510: add cable testing support | Oleksij Rempel
This patch implements the TDR test procedure as described in "Application Note DP83TD510E Cable Diagnostics Toolkit revC", section 3.2.

The procedure was tested with "draka 08 signalkabel 2x0.8mm". The reported cable length was 5 meters more for each 20 meters of actual cable length. For instance, a 20-meter cable showed as 25 meters, and a 40-meter cable showed as 50 meters. Since other parts of the diagnostics provided by this PHY (e.g., Active Link Cable Diagnostics) require accurate cable characterization to provide proper results, this tuning can be implemented in a separate patch/interface.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
changes v2:
 - add comments
 - change post silence time to 1000ms
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240712152848.2479912-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
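The two measurements quoted above (20 m reported as 25 m, 40 m as 50 m) imply a reported length of about 5/4 of the real one for that particular cable. A hypothetical correction based only on those two data points is sketched below; the function name corrected_length_m is made up, the driver does not apply any such factor, and other cable types would presumably need their own characterization.

```c
/* Hypothetical correction derived only from the two data points in the
 * commit message (reported ~ 5/4 of actual for that cable). Not driver code. */
static unsigned int corrected_length_m(unsigned int reported_m)
{
	return reported_m * 4 / 5;   /* integer metres, truncating */
}
```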
2024-07-13 | net: dpaa: Fix compilation Warning | Breno Leitao
Remove variables that are defined and incremented but never read. This issue appeared in network tests[1] as:

    drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c:38:6: warning: variable 'i' set but not used [-Wunused-but-set-variable]
       38 |         int i = 0;
          |             ^

Link: https://netdev.bots.linux.dev/static/nipa/870263/13729811/build_clang/stderr [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20240712134817.913756-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | eth: mlx5: expose NETIF_F_NTUPLE when ARFS is compiled out | Jakub Kicinski
ARFS depends on NTUPLE filters, but the inverse is not true. Drivers which don't support ARFS commonly still support NTUPLE filtering. mlx5 has a Kconfig option to disable ARFS (MLX5_EN_ARFS) and does not advertise NTUPLE filters as a feature at all when ARFS is compiled out. That's not correct, ntuple filters indeed still work just fine (as long as MLX5_EN_RXNFC is enabled).

This is needed to make the RSS test not skip all RSS context related testing.

Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240711223722.297676-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | selftests: mptcp: lib: fix shellcheck errors | Matthieu Baerts (NGI0)
It looks like we missed these two errors recently:

 - SC2068: Double quote array expansions to avoid re-splitting elements.
 - SC2145: Argument mixes string and array. Use * or separate argument.

Two simple fixes, it is not supposed to change the behaviour as the variable names should not have any spaces in their names. Still, better to fix them to easily spot new issues.

Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240712-upstream-net-next-20240712-selftests-mptcp-fix-shellcheck-v1-1-1cb7180db40a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | Merge branch 'mlx5-misc-2023-07-08-sf-max-eq' | Jakub Kicinski
Saeed Mahameed says:

====================
mlx5 misc 2023-07-08 (sf max eq)

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20240708080025.1593555-2-tariqt@nvidia.com/
====================

Link: https://patch.msgid.link/20240712003310.355106-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | net/mlx5: Use set number of max EQs | Daniel Jurgens
If a maximum number of EQs has been set for an SF, use that amount.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | net/mlx5: Set default max eqs for SFs | Daniel Jurgens
If the user hasn't configured max_io_eqs, set a low default. The SF driver shouldn't try to create more than this, but FW will enforce this limit.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 | net/mlx5: Set sf_eq_usage for SF max EQs | Daniel Jurgens
When setting max_io_eqs for an SF function, also set the sf_eq_usage cap. This indicates to the SF driver from the PF that the user has set the max io eqs via devlink, so the SF driver can later query the proper max eq value from the new cap.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 net/mlx5: IFC updates for SF max IO EQs (Daniel Jurgens)
Expose a new cap, sf_eq_usage. The vhca_resource_manager can write this cap to indicate that the SF driver should use max_num_eqs_24b to determine how many EQs to use. It will be used in the next patch so that the PF can indicate to the SF driver that the user has set the max IO EQs via devlink, for example:

    devlink port function set pci/0000:08:00.0/32768 max_io_eqs 32

The SF driver can then query the proper max EQ value from the new cap.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
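As a rough illustration of how the SF's EQ limit is chosen across this series, here is a minimal userspace C sketch. The names demo_sf_caps, demo_sf_max_eqs and DEMO_SF_DEFAULT_MAX_EQS are hypothetical, the default of 8 is an assumption rather than the driver's actual value, and the sketch folds into one function a decision that the real series splits between the PF and the SF driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical low default used when the user has not set max_io_eqs.
 * The real driver's value may differ. */
#define DEMO_SF_DEFAULT_MAX_EQS 8u

/* Hypothetical stand-in for the caps the SF driver can query. */
struct demo_sf_caps {
	bool sf_eq_usage;         /* set by the PF when max_io_eqs was configured via devlink */
	uint32_t max_num_eqs_24b; /* EQ limit to honour when sf_eq_usage is set */
};

/* Decide how many EQs the SF driver should try to create. */
static uint32_t demo_sf_max_eqs(const struct demo_sf_caps *caps)
{
	if (caps->sf_eq_usage)
		return caps->max_num_eqs_24b;

	/* No user configuration: stay at a low default.
	 * Firmware enforces the real limit either way. */
	return DEMO_SF_DEFAULT_MAX_EQS;
}
```

With the devlink command above (max_io_eqs 32), the sketch would return 32; without it, it falls back to the assumed default.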
2024-07-13 net: mvpp2: Improve data types and use min() (Thorsten Blum)
Change the data type of the variable freq in mvpp2_rx_time_coal_set() and mvpp2_tx_time_coal_set() to u32, because port->priv->tclk is also u32. Change the data type of the function parameter clk_hz in mvpp2_usec_to_cycles() and mvpp2_cycles_to_usec() to u32 accordingly, which removes the following Coccinelle/coccicheck warning reported by do_div.cocci:

    WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead

Use min() to simplify the code and improve its readability. Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240711154741.174745-1-thorsten.blum@toblux.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
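The effect of the type change and the min() cleanup can be illustrated with a userspace C sketch of the two conversion helpers. It is modelled on the description above rather than copied from the driver: uint32_t/uint64_t stand in for u32/u64, plain division stands in for do_div(), demo_min_u64() stands in for the kernel's min(), and all names are hypothetical.

```c
#include <stdint.h>

#define DEMO_USEC_PER_SEC 1000000u

/* Userspace stand-in for the kernel's min() on two 64-bit values. */
static inline uint64_t demo_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Microseconds -> clock cycles, saturating at UINT32_MAX.
 * A 32x32-bit product always fits in 64 bits. */
static uint32_t demo_usec_to_cycles(uint32_t usec, uint32_t clk_hz)
{
	uint64_t tmp = (uint64_t)clk_hz * usec;

	tmp /= DEMO_USEC_PER_SEC;
	return (uint32_t)demo_min_u64(tmp, UINT32_MAX);
}

/* Clock cycles -> microseconds, saturating at UINT32_MAX.
 * Because clk_hz is 32-bit, this is a genuine 64-by-32 division.
 * clk_hz must be non-zero. */
static uint32_t demo_cycles_to_usec(uint32_t cycles, uint32_t clk_hz)
{
	uint64_t tmp = (uint64_t)cycles * DEMO_USEC_PER_SEC;

	tmp /= clk_hz;
	return (uint32_t)demo_min_u64(tmp, UINT32_MAX);
}
```

For example, with a hypothetical 250 MHz clock, 100 usec corresponds to 25000 cycles, and converting 25000 cycles back yields 100 usec.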
2024-07-13 net: ethtool: Monotonically increase the message sequence number (Danielle Ratson)
Currently, during the module firmware flashing process, the kernel sends all unicast notifications with the same sequence number, making it impossible for user space to detect missed notifications. Monotonically increase the message sequence number so that user space can track the order of notifications. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240711080934.2071869-1-danieller@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
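To show why a monotonically increasing sequence number matters, here is a hypothetical user-space consumer in C that counts gaps. The tracker struct and function are illustrative only and are not part of the ethtool netlink API; the sketch assumes each notification carries a number one greater than the previous one, modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consumer-side state for one flashing session. */
struct demo_seq_tracker {
	bool started;
	uint32_t last_seq;
	uint32_t missed; /* total notifications detected as missing */
};

/* Record a notification's sequence number and return how many
 * notifications were missed immediately before it. */
static uint32_t demo_seq_observe(struct demo_seq_tracker *t, uint32_t seq)
{
	uint32_t gap = 0;

	/* A repeated number (the old behaviour) carries no ordering information. */
	if (t->started && seq != t->last_seq)
		gap = seq - t->last_seq - 1u; /* unsigned arithmetic handles wrap-around */

	t->started = true;
	t->last_seq = seq;
	t->missed += gap;
	return gap;
}
```

With the old behaviour every notification repeats the same number, so the gap is always zero and losses stay invisible; with increasing numbers, receiving 10, 11, 14 reveals two missed messages.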
2024-07-13 Merge branch 'tcp-make-simultaneous-connect-rfc-compliant' (Jakub Kicinski)
Kuniyuki Iwashima says: ==================== tcp: Make simultaneous connect() RFC-compliant. Patch 1 fixes an issue where the BPF TCP option parser is triggered by the ACK instead of the SYN+ACK in the case of simultaneous connect(). Patch 2 removes a wrong assumption in the tcp_ao/self-connect tests. v2: https://lore.kernel.org/netdev/20240708180852.92919-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20240704035703.95065-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20240710171246.87533-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests. (Kuniyuki Iwashima)
tcp_ao/self-connect.c checked the following SNMP stats before and after connect() to confirm that the test exercises the simultaneous connect() path:
  * TCPChallengeACK
  * TCPSYNChallenge
However, these stats should not be incremented for self-connect in the first place, and the assumption no longer holds. Let's remove the check.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Dmitry Safonov <dima@arista.com> Link: https://patch.msgid.link/20240710171246.87533-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 tcp: Don't drop SYN+ACK for simultaneous connect(). (Kuniyuki Iwashima)
RFC 9293 states that in the case of simultaneous connect(), the connection is established when the SYN+ACK is received. [0]

      TCP Peer A                                       TCP Peer B

  1.  CLOSED                                           CLOSED
  2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
  3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
  4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
  5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
  6.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
  7.               ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED

However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a SYN+ACK is dropped in tcp_validate_incoming() and answered with a Challenge ACK.

For example, the write() syscall in the following packetdrill script fails with -EAGAIN, and SNMP counters that should not change are incremented.

    0  socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
    +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

    +0 > S  0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
    +0 < S  0:0(0) win 1000 <mss 1000>
    +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
    +0 < S. 0:0(0) ack 1 win 1000

    +0 write(3, ..., 100) = 100
    +0 > P. 1:101(100) ack 1

    --

    # packetdrill cross-synack.pkt
    cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)

    # nstat
    ...
    TcpExtTCPChallengeACK           1                  0.0
    TcpExtTCPSYNChallenge           1                  0.0

The problem is that bpf_skops_established() is triggered by the Challenge ACK instead of the SYN+ACK. This causes the BPF prog to miss the chance to check whether the peer supports a TCP option that is expected to be exchanged in SYN and SYN+ACK.

Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid such a situation.

Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped to avoid sending an unnecessary ACK, but this could be a bit risky for net.git, so this patch targets net-next.
Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240710171246.87533-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
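The simultaneous-open sequence from RFC 9293 can be modelled as a toy state machine in C. This is a hypothetical sketch for illustration only: it is not the kernel's tcp_rcv_state_process(), and it ignores sequence-number validation, RST handling, and timers. It shows the behaviour the patch restores, where an actively opened endpoint in SYN-RECEIVED moves to ESTABLISHED when the peer's SYN+ACK arrives (step 6 in the diagram above).

```c
#include <stdbool.h>

/* Subset of RFC 9293 connection states needed for this example. */
enum demo_tcp_state {
	DEMO_CLOSED,
	DEMO_SYN_SENT,
	DEMO_SYN_RECEIVED,
	DEMO_ESTABLISHED,
};

/* Control bits of an incoming segment, reduced to what matters here. */
struct demo_seg {
	bool syn;
	bool ack;
};

/* Next state after receiving seg; unhandled segments leave the state as-is. */
static enum demo_tcp_state demo_on_segment(enum demo_tcp_state s, struct demo_seg seg)
{
	switch (s) {
	case DEMO_SYN_SENT:
		if (seg.syn && seg.ack)
			return DEMO_ESTABLISHED;  /* ordinary three-way handshake */
		if (seg.syn)
			return DEMO_SYN_RECEIVED; /* crossing SYNs: simultaneous open */
		break;
	case DEMO_SYN_RECEIVED:
		/* The peer's SYN+ACK completes the open instead of
		 * triggering a Challenge ACK. */
		if (seg.ack)
			return DEMO_ESTABLISHED;
		break;
	default:
		break;
	}
	return s;
}
```

Feeding Peer A a bare SYN and then a SYN+ACK takes it from SYN-SENT through SYN-RECEIVED to ESTABLISHED, matching steps 3 and 6 of the RFC diagram.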
2024-07-13 test/vsock: add install target (Peng Fan)
Add an install target for vsock so that Yocto can easily install the images. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20240710122728.45044-1-peng.fan@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 Merge tag '6.10-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 (Linus Torvalds)
Pull smb client fix from Steve French: "Small fix, also for stable" * tag '6.10-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix setting SecurityFlags to true
2024-07-13 MAINTAINERS: add 5 missing tcp-related files (Eric Dumazet)
The following files are part of the TCP stack:
  - net/ipv4/inet_connection_sock.c
  - net/ipv4/inet_hashtables.c
  - net/ipv4/inet_timewait_sock.c
  - net/ipv6/inet6_connection_sock.c
  - net/ipv6/inet6_hashtables.c
Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240712234213.3178593-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 cifs: fix setting SecurityFlags to true (Steve French)
Writing 1 to /proc/fs/cifs/SecurityFlags sets the flags to CIFSSEC_MUST_NTLMV2, which is no longer relevant (less secure mechanisms such as lanman have been removed from cifs.ko), is missing some flags (such as those for signing and encryption), and can even cause mounts to fail. Change this so that writing 1 selects Kerberos instead. Also update the description of SecurityFlags to remove mention of flags that are no longer supported. Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-13 io_uring/net: check socket is valid in io_bind()/io_listen() (Tetsuo Handa)
We need to check that sock_from_file(req->file) != NULL. Reported-by: syzbot <syzbot+1e811482aa2c70afa9a0@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=1e811482aa2c70afa9a0 Fixes: 7481fd93fa0a ("io_uring: Introduce IORING_OP_BIND") Fixes: ff140cc8628a ("io_uring: Introduce IORING_OP_LISTEN") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/903da529-eaa3-43ef-ae41-d30f376c60cc@I-love.SAKURA.ne.jp [axboe: move assignment of sock to where the NULL check is] Signed-off-by: Jens Axboe <axboe@kernel.dk>
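The fix follows the usual pattern of checking a lookup result before dereferencing it. The userspace C sketch below models it with hypothetical stand-ins (demo_file, demo_socket, demo_sock_from_file, demo_bind); it mirrors only the contract that sock_from_file() yields NULL for a non-socket file, and returning -ENOTSOCK in that case is an assumption about the error code rather than something stated above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct socket and struct file. */
struct demo_socket {
	int bound_port;
};

struct demo_file {
	struct demo_socket *sock; /* NULL when the file is not a socket */
};

/* Models the contract described above: NULL for non-socket files. */
static struct demo_socket *demo_sock_from_file(struct demo_file *f)
{
	return f->sock;
}

/* Bind only after confirming the file really is a socket. */
static int demo_bind(struct demo_file *f, int port)
{
	struct demo_socket *sock = demo_sock_from_file(f);

	if (!sock)
		return -ENOTSOCK; /* without this check, sock would be dereferenced as NULL */

	sock->bound_port = port;
	return 0;
}
```

The same check would apply to a listen path, since both operations start from an arbitrary file descriptor supplied by user space.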
2024-07-13 Merge tag 'timers-v6.11-rc1' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core (Thomas Gleixner)
Pull clocksource/event driver updates from Daniel Lezcano:
  - Remove unnecessary initialization of local variables in the ARM arch timer and the ARM global timer, as they are initialized later in the code path anyway (Li kunyu)
  - Fix a race condition in the interrupt path of the SH CMT driver that could lead to a deadlock. Note that this fix was not tested on a platform using this timer, but it seems reasonable enough to be picked confidently (Niklas Söderlund)
  - Increase the rating of the gic-timer and use the configured width clocksource register on the MIPS architecture (Jiaxun Yang)
  - Add the DT bindings for the TMU on the Renesas platforms (Geert Uytterhoeven)
  - Add the DT bindings for the SOPHGO SG2002 clint on RISC-V (Thomas Bonnefille)
  - Add the rtl-otto timer driver along with the DT bindings for the Realtek platform (Chris Packham)
Link: https://lore.kernel.org/all/91cd05de-4c5d-4242-a381-3b8a4fe6a2a2@linaro.org