linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-05-23	netfilter: nf_tables: Prepare for handling NETDEV_REGISTER events	Phil Sutter
	Put NETDEV_UNREGISTER handling code into a switch, no functional change intended as the function is only called for that event yet. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook	Phil Sutter
	Supporting a 1:n relationship between nft_hook and nf_hook_ops is convenient since a chain's or flowtable's nft_hooks may remain in place despite matching interfaces disappearing. This stabilizes ruleset dumps in that regard and opens the possibility to claim newly added interfaces which match the spec. Also it prepares for wildcard interface specs since these will potentially match multiple interfaces. All spots dealing with hook registration are updated to handle a list of multiple nf_hook_ops, but nft_netdev_hook_alloc() only adds a single item for now to retain the old behaviour. The only expected functional change here is how vanishing interfaces are handled: Instead of dropping the respective nft_hook, only the matching nf_hook_ops are dropped. To safely remove individual ops from the list in netdev handlers, an rcu_head is added to struct nf_hook_ops so kfree_rcu() may be used. There is at least nft_flowtable_find_dev() which may be iterating through the list at the same time. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()	Phil Sutter
	The function accesses only the hook's ops field, pass it directly. This prepares for nft_hooks holding a list of nf_hook_ops in future. While at it, make use of the function in __nft_unregister_flowtable_net_hooks() as well. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: Introduce nft_register_flowtable_ops()	Phil Sutter
	Facilitate binding and registering of a flowtable hook via a single function call. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()	Phil Sutter
	Also a pretty dull wrapper around the hook->ops.dev comparison for now. Will search the embedded nf_hook_ops list in future. The ugly cast to eliminate the const qualifier will vanish then, too. Since this future list will be RCU-protected, also introduce an _rcu() variant here. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: Introduce functions freeing nft_hook objects	Phil Sutter
	Pointless wrappers around kfree() for now, prep work for an embedded list of nf_hook_ops. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: add packets conntrack state to debug trace info	Florian Westphal
	Add the minimal relevant info needed for userspace ("nftables monitor trace") to provide the conntrack view of the packet: - state (new, related, established) - direction (original, reply) - status (e.g., if connection is subject to dnat) - id (allows to query ctnetlink for remaining conntrack state info) Example: trace id a62 inet filter PRE_RAW packet: iif "enp0s3" ether [..] [..] trace id a62 inet filter PRE_MANGLE conntrack: ct direction original ct state new ct id 32 trace id a62 inet filter PRE_MANGLE packet: [..] [..] trace id a62 inet filter IN conntrack: ct direction original ct state new ct status dnat-done ct id 32 [..] In this case one can see that while NAT is active, the new connection isn't subject to a translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: conntrack: make nf_conntrack_id callable without a module dependency	Florian Westphal
	While nf_conntrack_id() doesn't need any functionaliy from conntrack, it does reside in nf_conntrack_core.c -- callers add a module dependency on conntrack. Followup patch will need to compute the conntrack id from nf_tables_trace.c to include it in nf_trace messages emitted to userspace via netlink. I don't want to introduce a module dependency between nf_tables and conntrack for this. Since trace is slowpath, the added indirection is ok. One alternative is to move nf_conntrack_id to the netfilter/core.c, but I don't see a compelling reason so far. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit	Sebastian Andrzej Siewior
	nf_dup_skb_recursion is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move nf_dup_skb_recursion to struct netdev_xmit, provide wrappers. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx	Sebastian Andrzej Siewior
	nft_pcpu_tun_ctx is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a nft_inner_tun_ctx member (original nft_pcpu_tun_ctx) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_dup{4, 6}: Move duplication check to task_struct	Sebastian Andrzej Siewior
	nf_skb_duplicated is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Due to the recursion involved, the simplest change is to make it a per-task variable. Move the per-CPU variable nf_skb_duplicated to task_struct and name it in_nf_duplicate. Add it to the existing bitfield so it doesn't use additional memory. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nft_tunnel: fix geneve_opt dump	Fernando Fernandez Mancera
	When dumping a nft_tunnel with more than one geneve_opt configured the netlink attribute hierarchy should be as follow: NFTA_TUNNEL_KEY_OPTS \| \|--NFTA_TUNNEL_KEY_OPTS_GENEVE \| \| \| \|--NFTA_TUNNEL_KEY_GENEVE_CLASS \| \|--NFTA_TUNNEL_KEY_GENEVE_TYPE \| \|--NFTA_TUNNEL_KEY_GENEVE_DATA \| \|--NFTA_TUNNEL_KEY_OPTS_GENEVE \| \| \| \|--NFTA_TUNNEL_KEY_GENEVE_CLASS \| \|--NFTA_TUNNEL_KEY_GENEVE_TYPE \| \|--NFTA_TUNNEL_KEY_GENEVE_DATA \| \|--NFTA_TUNNEL_KEY_OPTS_GENEVE ... Otherwise, userspace tools won't be able to fetch the geneve options configured correctly. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFs	Florian Westphal
	Replace the existing VRF test with a more comprehensive one. It tests following combinations: - fib type (returns address type, e.g. unicast) - fib oif (route output interface index - both with and without 'iif' keyword (changes result, e.g. 'fib daddr type local' will be true when the destination address is configured on the local machine, but 'fib daddr . iif type local' will only be true when the destination address is configured on the incoming interface. Add all types of addresses to test with for both ipv4 and ipv6: - local address on the incoming interface - local address on another interface - local address on another interface thats part of a vrf - address on another host The ruleset stores obtained results from 'fib' in nftables sets and then queries the sets to check that it has the expected results. Perform one pass while packets are coming in on interface NOT part of a VRF and then again when it was added and make sure fib returns the expected routes and address types for the various addresses in the setup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	netfilter: nf_tables: nft_fib: consistent l3mdev handling	Florian Westphal
	fib has two modes: 1. Obtain output device according to source or destination address 2. Obtain the type of the address, e.g. local, unicast, multicast. 'fib daddr type' should return 'local' if the address is configured in this netns or unicast otherwise. 'fib daddr . iif type' should return 'local' if the address is configured on the input interface or unicast otherwise, i.e. more restrictive. However, if the interface is part of a VRF, then 'fib daddr type' returns unicast even if the address is configured on the incoming interface. This is broken for both ipv4 and ipv6. In the ipv4 case, inet_dev_addr_type must only be used if the 'iif' or 'oif' (strict mode) was requested. Else inet_addr_type_dev_table() needs to be used and the correct dev argument must be passed as well so the correct fib (vrf) table is used. In the ipv6 case, the bug is similar, without strict mode, dev is NULL so .flowi6_l3mdev will be set to 0. Add a new 'nft_fib_l3mdev_master_ifindex_rcu()' helper and use that to init the .l3mdev structure member. For ipv6, use it from nft_fib6_flowi_init() which gets called from both the 'type' and the 'route' mode eval functions. This provides consistent behaviour for all modes for both ipv4 and ipv6: If strict matching is requested, the input respectively output device of the netfilter hooks is used. Otherwise, use skb->dev to obtain the l3mdev ifindex. Without this, most type checks in updated nft_fib.sh selftest fail: FAIL: did not find veth0 . 10.9.9.1 . local in fibtype4 FAIL: did not find veth0 . dead:1::1 . local in fibtype6 FAIL: did not find veth0 . dead:9::1 . local in fibtype6 FAIL: did not find tvrf . 10.0.1.1 . local in fibtype4 FAIL: did not find tvrf . 10.9.9.1 . local in fibtype4 FAIL: did not find tvrf . dead:1::1 . local in fibtype6 FAIL: did not find tvrf . dead:9::1 . local in fibtype6 FAIL: fib expression address types match (iif in vrf) (fib errounously returns 'unicast' for all of them, even though all of these addresses are local to the vrf). Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23	crypto: qat - add missing header inclusion	Arnd Bergmann
	Without this header, the build of the new qat_6xxx driver fails when CONFIG_PCI_IOV is not set: In file included from drivers/crypto/intel/qat/qat_common/adf_gen6_shared.c:7: drivers/crypto/intel/qat/qat_common/adf_gen4_pfvf.h: In function 'adf_gen4_init_pf_pfvf_ops': drivers/crypto/intel/qat/qat_common/adf_gen4_pfvf.h:13:34: error: 'adf_pfvf_comms_disabled' undeclared (first use in this function) 13 \| pfvf_ops->enable_comms = adf_pfvf_comms_disabled; \| ^~~~~~~~~~~~~~~~~~~~~~~ Fixes: 17fd7514ae68 ("crypto: qat - add qat_6xxx driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-23	crypto: api - Redo lookup on EEXIST	Herbert Xu
	When two crypto algorithm lookups occur at the same time with different names for the same algorithm, e.g., ctr(aes-generic) and ctr(aes), they will both be instantiated. However, only one of them can be registered. The second instantiation will fail with EEXIST. Avoid failing the second lookup by making it retry, but only once because there are tricky names such as gcm_base(ctr(aes),ghash) that will always fail, despite triggering instantiation and EEXIST. Reported-by: Ingo Franzki <ifranzki@linux.ibm.com> Fixes: 2825982d9d66 ("[CRYPTO] api: Added event notification") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-23	ASoC: codecs: add support for ES8375	Zhang Yi
	The driver is for codec es8375 of everest Signed-off-by: Zhang Yi <zhangyi@everest-semi.com> Link: https://patch.msgid.link/20250523025502.23214-3-zhangyi@everest-semi.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-23	ASoC: dt-bindings: Add Everest ES8375 audio CODEC	Zhang Yi
	Add device tree binding documentation for Everest ES8375 Signed-off-by: Zhang Yi <zhangyi@everest-semi.com> Link: https://patch.msgid.link/20250523025502.23214-2-zhangyi@everest-semi.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-23	Merge branch kvm-arm64/misc-6.16 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/nv-nv into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/at-fixes-6.16: : . : Set of fixes for Address Translation (AT) instruction emulation, : which affect the (not yet upstream) NV support. : : From the cover letter: : : "Here's a small series of fixes for KVM's implementation of address : translation (aka the AT S1* instructions), addressing a number of : issues in increasing levels of severity: : : - We misreport PAR_EL1.PTW in a number of occasions, including state : that is not possible as per the architecture definition : : - We don't handle access faults at all, and that doesn't play very : well with the rest of the VNCR stuff : : - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely : take the host down, no questions asked" : . KVM: arm64: Don't feed uninitialised data to HCR_EL2 KVM: arm64: Teach address translation about access faults KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E* Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/fgt-masks into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/fgt-masks: (43 commits) : . : Large rework of the way KVM deals with trap bits in conjunction with : the CPU feature registers. It now draws a direct link between which : the feature set, the system registers that need to UNDEF to match : the configuration and bits that need to behave as RES0 or RES1 in : the trap registers that are visible to the guest. : : Best of all, these definitions are mostly automatically generated : from the JSON description published by ARM under a permissive : license. : . KVM: arm64: Handle TSB CSYNC traps KVM: arm64: Add FGT descriptors for FEAT_FGT2 KVM: arm64: Allow sysreg ranges for FGT descriptors KVM: arm64: Add context-switch for FEAT_FGT2 registers KVM: arm64: Add trap routing for FEAT_FGT2 registers KVM: arm64: Add sanitisation for FEAT_FGT2 registers KVM: arm64: Add FEAT_FGT2 registers to the VNCR page KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits KVM: arm64: Allow kvm_has_feat() to take variable arguments KVM: arm64: Use FGT feature maps to drive RES0 bits KVM: arm64: Validate FGT register descriptions against RES0 masks KVM: arm64: Switch to table-driven FGU configuration KVM: arm64: Handle PSB CSYNC traps KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask KVM: arm64: Remove hand-crafted masks for FGT registers KVM: arm64: Use computed FGT masks to setup FGT registers KVM: arm64: Propagate FGT masks to the nVHE hypervisor KVM: arm64: Unconditionally configure fine-grain traps KVM: arm64: Use computed masks as sanitisers for FGT registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/mte-frac into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/mte-frac: : . : Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest, : courtesy of Ben Horgan. From the cover letter: : : "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. : However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 : indicates that MTE_ASYNC is supported. On a host with : ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the : MTE capability enabled will incorrectly see MTE_ASYNC advertised as : supported. This series fixes that." : . KVM: selftests: Confirm exposing MTE_frac does not break migration KVM: arm64: Make MTE_frac masking conditional on MTE capability arm64/sysreg: Expose MTE_frac so that it is visible to KVM Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/pkvm-np-thp-6.16: (21 commits) : . : Large mapping support for non-protected pKVM guests, courtesy of : Vincent Donnefort. From the cover letter: : : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM : np-guests, that is installing PMD-level mappings in the stage-2, : whenever the stage-1 is backed by either Hugetlbfs or THPs." : . KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch 'so_passrights'	David S. Miller
	Kuniyuki Iwashima says: ==================== af_unix: Introduce SO_PASSRIGHTS. As long as recvmsg() or recvmmsg() is used with cmsg, it is not possible to avoid receiving file descriptors via SCM_RIGHTS. This series introduces a new socket option, SO_PASSRIGHTS, to allow disabling SCM_RIGHTS. The option is enabled by default. See patch 8 for background/context. This series is related to [0], but is split into a separate series, as most of the patches are specific to af_unix. The v2 of the BPF LSM extension part will be posted later, once this series is merged into net-next and has landed in bpf-next. [0]: https://lore.kernel.org/bpf/20250505215802.48449-1-kuniyu@amazon.com/ Changes: v5: * Patch 4 * Fix BPF selftest failure (setget_sockopt.c) v4: https://lore.kernel.org/netdev/20250515224946.6931-1-kuniyu@amazon.com/ * Patch 6 * Group sk->sk_scm_XXX bits by struct * Patch 9 * Remove errno handling v3: https://lore.kernel.org/netdev/20250514165226.40410-1-kuniyu@amazon.com/ * Patch 3 * Remove inline in scm.c * Patch 4 & 5 & 8 * Return -EOPNOTSUPP in getsockopt() * Patch 5 * Add CONFIG_SECURITY_NETWORK check for SO_PASSSEC * Patch 6 * Add kdoc for sk_scm_unused * Update sk_scm_XXX under lock_sock() in setsockopt() * Patch 7 * Update changelog (recent change -> aed6ecef55d7) v2: https://lore.kernel.org/netdev/20250510015652.9931-1-kuniyu@amazon.com/ * Added patch 4 & 5 to reuse sk_txrehash for scm_recv() flags v1: https://lore.kernel.org/netdev/20250508013021.79654-1-kuniyu@amazon.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	selftest: af_unix: Test SO_PASSRIGHTS.	Kuniyuki Iwashima
	scm_rights.c has various patterns of tests to exercise GC. Let's add cases where SO_PASSRIGHTS is disabled. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	af_unix: Introduce SO_PASSRIGHTS.	Kuniyuki Iwashima
	As long as recvmsg() or recvmmsg() is used with cmsg, it is not possible to avoid receiving file descriptors via SCM_RIGHTS. This behaviour has occasionally been flagged as problematic, as it can be (ab)used to trigger DoS during close(), for example, by passing a FUSE-controlled fd or a hung NFS fd. For instance, as noted on the uAPI Group page [0], an untrusted peer could send a file descriptor pointing to a hung NFS mount and then close it. Once the receiver calls recvmsg() with msg_control, the descriptor is automatically installed, and then the responsibility for the final close() now falls on the receiver, which may result in blocking the process for a long time. Regarding this, systemd calls cmsg_close_all() [1] after each recvmsg() to close() unwanted file descriptors sent via SCM_RIGHTS. However, this cannot work around the issue at all, because the final fput() may still occur on the receiver's side once sendmsg() with SCM_RIGHTS succeeds. Also, even filtering by LSM at recvmsg() does not work for the same reason. Thus, we need a better way to refuse SCM_RIGHTS at sendmsg(). Let's introduce SO_PASSRIGHTS to disable SCM_RIGHTS. Note that this option is enabled by default for backward compatibility. Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0] Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	af_unix: Inherit sk_flags at connect().	Kuniyuki Iwashima
	For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options are inherited from the parent listen()ing socket. Currently, this inheritance happens at accept(), because these attributes were stored in sk->sk_socket->flags and the struct socket is not allocated until accept(). This leads to unintentional behaviour. When a peer sends data to an embryo socket in the accept() queue, unix_maybe_add_creds() embeds credentials into the skb, even if neither the peer nor the listener has enabled these options. If the option is enabled, the embryo socket receives the ancillary data after accept(). If not, the data is silently discarded. This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but would not for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor was sent, it'd be game over. To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo sockets. Commit aed6ecef55d7 ("af_unix: Save listener for embryo socket.") made it possible to access the parent's flags in sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but this introduces an unnecessary condition that is irrelevant for most sockets, accept()ed sockets and clients. Therefore, we moved SOCK_PASSXXX into struct sock. Let’s inherit sk->sk_scm_recv_flags at connect() to avoid receiving SCM_RIGHTS on embryo sockets created from a parent with SO_PASSRIGHTS=0. Note that the parent socket is locked in connect() so we don't need READ_ONCE() for sk_scm_recv_flags. Now, we can remove !other->sk_socket check in unix_maybe_add_creds() to avoid slow SOCK_PASS{CRED,PIDFD} handling for embryo sockets created from a parent with SO_PASS{CRED,PIDFD}=0. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.	Kuniyuki Iwashima
	As explained in the next patch, SO_PASSRIGHTS would have a problem if we assigned a corresponding bit to socket->flags, so it must be managed in struct sock. Mixing socket->flags and sk->sk_flags for similar options will look confusing, and sk->sk_flags does not have enough space on 32bit system. Also, as mentioned in commit 16e572626961 ("af_unix: dont send SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling is known to be slow, and managing the flags in struct socket cannot avoid that for embryo sockets. Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock. While at it, other SOCK_XXX flags in net.h are grouped as enum. Note that assign_bit() was atomic, so the writer side is moved down after lock_sock() in setsockopt(), but the bit is only read once in sendmsg() and recvmsg(), so lock_sock() is not needed there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.	Kuniyuki Iwashima
	SCM_CREDENTIALS and SCM_SECURITY can be recv()ed by calling scm_recv() or scm_recv_unix(), and SCM_PIDFD is only used by scm_recv_unix(). scm_recv() is called from AF_NETLINK and AF_BLUETOOTH. scm_recv_unix() is literally called from AF_UNIX. Let's restrict SO_PASSCRED and SO_PASSSEC to such sockets and SO_PASSPIDFD to AF_UNIX only. Later, SOCK_PASS{CRED,PIDFD,SEC} will be moved to struct sock and united with another field. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	tcp: Restrict SO_TXREHASH to TCP socket.	Kuniyuki Iwashima
	sk->sk_txrehash is only used for TCP. Let's restrict SO_TXREHASH to TCP to reflect this. Later, we will make sk_txrehash a part of the union for other protocol families. Note that we need to modify BPF selftest not to get/set SO_TEREHASH for non-TCP sockets. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	scm: Move scm_recv() from scm.h to scm.c.	Kuniyuki Iwashima
	scm_recv() has been placed in scm.h since the pre-git era for no particular reason (I think), which makes the file really fragile. For example, when you move SOCK_PASSCRED from include/linux/net.h to enum sock_flags in include/net/sock.h, you will see weird build failure due to terrible dependency. To avoid the build failure in the future, let's move scm_recv(_unix())? and its callees to scm.c. Note that only scm_recv() needs to be exported for Bluetooth. scm_send() should be moved to scm.c too, but I'll revisit later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	af_unix: Don't pass struct socket to maybe_add_creds().	Kuniyuki Iwashima
	We will move SOCK_PASS{CRED,PIDFD,SEC} from struct socket.flags to struct sock for better handling with SOCK_PASSRIGHTS. Then, we don't need to access struct socket in maybe_add_creds(). Let's pass struct sock to maybe_add_creds() and its caller queue_oob(). While at it, we append the unix_ prefix and fix double spaces around the pid assignment. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.	Kuniyuki Iwashima
	Currently, the same checks for SOCK_PASSCRED and SOCK_PASSPIDFD are scattered across many places. Let's centralise the bit tests to make the following changes cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23	Revert "crypto: testmgr - Add hash export format testing"	Herbert Xu
	This reverts commit 18c438b228558e05ede7dccf947a6547516fc0c7. The s390 hmac and sha3 algorithms are failing the test. Revert the change until they have been fixed. Reported-by: Ingo Franzki <ifranzki@linux.ibm.com> Link: https://lore.kernel.org/all/623a7fcb-b4cb-48e6-9833-57ad2b32a252@linux.ibm.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-23	platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID	Todd Brandt
	The ARL requires that the GMA and NPU devices both be in D3Hot in order for PC10 and S0iX to be achieved in S2idle. The original ARL-H/U addition to the intel_pmc_core driver attempted to do this by switching them to D3 in the init and resume calls of the intel_pmc_core driver. The problem is the ARL-H/U have a different NPU device and thus are not being properly set and thus S0iX does not work properly in ARL-H/U. This patch creates a new ARL-H specific device id that is correct and also adds the D3 fixup to the suspend callback. This way if the PCI devies drop from D3 to D0 after resume they can be corrected for the next suspend. Thus there is no dropout in S0iX. Fixes: bd820906ea9d ("platform/x86/intel/pmc: Add Arrow Lake U/H support to intel_pmc_core driver") Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Link: https://lore.kernel.org/r/a61f78be45c13f39e122dcc684b636f4b21e79a0.1747737446.git.todd.e.brandt@intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-23	mips, net: ensure that SOCK_COREDUMP is defined	Christian Brauner
	For historical reasons mips has to override the socket enum values but the defines are all the same. So simply move the ARCH_HAS_SOCKET_TYPES scope. Fixes: a9194f88782a ("coredump: add coredump socket") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23	netfs: Fix undifferentiation of DIO reads from unbuffered reads	David Howells
	On cifs, "DIO reads" (specified by O_DIRECT) need to be differentiated from "unbuffered reads" (specified by cache=none in the mount parameters). The difference is flagged in the protocol and the server may behave differently: Windows Server will, for example, mandate that DIO reads are block aligned. Fix this by adding a NETFS_UNBUFFERED_READ to differentiate this from NETFS_DIO_READ, parallelling the write differentiation that already exists. cifs will then do the right thing. Fixes: 016dc8516aec ("netfs: Implement unbuffered/DIO read support") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/3444961.1747987072@warthog.procyon.org.uk Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> cc: Steve French <sfrench@samba.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23	i2c: mlxbf: avoid 64-bit division	Arnd Bergmann
	The 64-bit division in mlxbf_i2c_get_ticks() causes link failures when compile-testing on 32-bit machines: ERROR: modpost: "__udivdi3" [drivers/i2c/busses/i2c-mlxbf.ko] undefined! Change this to a div_u64(), which should replace the constant division with a a multiply/shift combination in the mlxbf_i2c_get_ticks(). The frequency calculation functions require a slow library call but should be used much rarer. Fixes: 053859002c20 ("i2c: mlxbf: Allow build with COMPILE_TEST") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250520152600.1975628-1-arnd@kernel.org Signed-off-by: Andi Shyti <andi@smida.it>
2025-05-23	i2c: viai2c-wmt: Replace dev_err() with dev_err_probe() in probe function	Enrico Zanda
	This simplifies the code while improving log. Signed-off-by: Enrico Zanda <e.zanda1@gmail.com> Link: https://lore.kernel.org/r/20250513210246.528370-2-e.zanda1@gmail.com Signed-off-by: Andi Shyti <andi@smida.it>
2025-05-23	i2c: designware: Don't warn about missing get_clk_rate_khz	Heikki Krogerus
	Converting the WARN_ON() to a dev_dbg() message in i2c_dw_clk_rate(). That removes the need to supply a dummy implementation for the callback (or alternatively a dummy clk device) when the fallback path is preferred where the existing values already in the clock registers are used - when a firmware has programmed the clock registers. The fallback path was introduced in commit 4fec76e0985c ("i2c: designware: Fix wrong setting for {ss,fs,hs}_{h,l}cnt registers"). Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20250513124015.2568924-1-heikki.krogerus@linux.intel.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	i2c: designware: Invoke runtime suspend on quick slave re-registration	Tan En De
	Replaced pm_runtime_put() with pm_runtime_put_sync_suspend() to ensure the runtime suspend is invoked immediately when unregistering a slave. This prevents a race condition where suspend was skipped when unregistering and registering slave in quick succession. For example, consider the rapid sequence of `delete_device -> new_device -> delete_device -> new_device`. In this sequence, it is observed that the dw_i2c_plat_runtime_suspend() might not be invoked after `delete_device` operation. This is because after `delete_device` operation, when the pm_runtime_put() is about to trigger suspend, the following `new_device` operation might race and cancel the suspend. If that happens, during the `new_device` operation, dw_i2c_plat_runtime_resume() is skipped (since there was no suspend), which means `i_dev->init()`, i.e. i2c_dw_init_slave(), is skipped. Since i2c_dw_init_slave() is skipped, i2c_dw_configure_fifo_slave() is skipped too, which leaves `DW_IC_INTR_MASK` unconfigured. If we inspect the interrupt mask register using devmem, it will show as zero. Example shell script to reproduce the issue: ``` #!/bin/sh SLAVE_LADDR=0x1010 SLAVE_BUS=13 NEW_DEVICE=/sys/bus/i2c/devices/i2c-$SLAVE_BUS/new_device DELETE_DEVICE=/sys/bus/i2c/devices/i2c-$SLAVE_BUS/delete_device # Create initial device echo slave-24c02 $SLAVE_LADDR > $NEW_DEVICE sleep 2 # Rapid sequence of # delete_device -> new_device -> delete_device -> new_device echo $SLAVE_LADDR > $DELETE_DEVICE echo slave-24c02 $SLAVE_LADDR > $NEW_DEVICE echo $SLAVE_LADDR > $DELETE_DEVICE echo slave-24c02 $SLAVE_LADDR > $NEW_DEVICE # Using devmem to inspect IC_INTR_MASK will show as zero ``` Signed-off-by: Tan En De <ende.tan@starfivetech.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20250412023303.378600-1-ende.tan@starfivetech.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	i2c-mlxbf: Improve I2C bus timing configuration	Chris Babroski
	Update the I2C bus timing configuration on BlueField to match the configuration recommended and verified by the HW team. I2C block read failures were found on BlueField 3 during communication with a device that requires the use of repeated start conditions. Testing showed that these failures were caused by the I2C transaction getting aborted early due to a short bus "timeout" configuration value. This value determines how long the clock can be held low before the I2C transaction is aborted. Upon further inspection, it was also found that other I2C bus timing configuration values used by the kernel driver do not match the configuration that is recommended by the HW team and used in the BlueField BSP I2C drivers. Signed-off-by: Chris Babroski <cbabroski@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com> Link: https://lore.kernel.org/r/20250506193059.321345-2-cbabroski@nvidia.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	i2c-mlxbf: Add repeated start condition support	Chris Babroski
	Add support for SMBus repeated start conditions to the Mellanox I2C driver. This support is specifically enabled for the I2C_FUNC_SMBUS_WRITE_I2C_BLOCK implementation which is required for communication with a specific I2C device on BlueField 3. Signed-off-by: Chris Babroski <cbabroski@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com> Link: https://lore.kernel.org/r/20250506193059.321345-1-cbabroski@nvidia.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	i2c: xgene-slimpro: Replace dev_err() with dev_err_probe() in probe function	Enrico Zanda
	This simplifies the code while improving log. Signed-off-by: Enrico Zanda <e.zanda1@gmail.com> Link: https://lore.kernel.org/r/20250511203920.325704-2-e.zanda1@gmail.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	dt-bindings: i2c: i2c-wmt: Convert to YAML	Alexey Charkov
	Rewrite the textual description for the WonderMedia I2C controller as YAML schema, and switch the filename to follow the compatible string. The controller only supports two bus speeds (100kHz and 400kHz) so restrict clock-frequency values accordingly. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Alexey Charkov <alchark@gmail.com> Link: https://lore.kernel.org/r/20250506-vt8500-i2c-binding-v3-1-401c3e090a88@gmail.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	i2c: microchip-corei2c: add smbus support	prashanth kumar burujukindi
	Add hardware support for the SMBUS commands smbus_quick, smbus_byte, smbus_byte_data, smbus_word_data and smbus_block_data, replacing the fallback to software emulation Signed-off-by: prashanth kumar burujukindi <prashanthkumar.burujukindi@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250430-preview-dormitory-85191523283d@spud Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-23	i2c: mlxbf: Allow build with COMPILE_TEST	Andi Shyti
	Extend the Kconfig dependency to include COMPILE_TEST so the Mellanox BlueField I2C driver can be built on non-ARM64 platforms for compile testing purposes. Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Cc: Khalil Blaiech <kblaiech@nvidia.com> Cc: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20250505215854.2896383-1-andi.shyti@kernel.org
2025-05-23	i2c: I2C_DESIGNWARE_AMDISP should depend on DRM_AMD_ISP	Geert Uytterhoeven
	The AMD Image Signal Processor I2C functionality is only present on AMD platforms with ISP support, and its platform device is instantiated by the AMD ISP driver. Hence add a dependency on DRM_AMD_ISP, to prevent asking the user about this driver when configuring a kernel that does not support the AMD ISP. Fixes: d6263c468a76 ("i2c: amd-isp: Add ISP i2c-designware driver") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Link: https://lore.kernel.org/r/3888f892b8c4d8c8acd17e56581e726ace7f7092.1746536495.git.geert+renesas@glider.be Signed-off-by: Andi Shyti <andi.shyti@kernel.org>