summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-06net/ncsi: Silence runtime memcpy() false positive warningKees Cook
The memcpy() in ncsi_cmd_handler_oem deserializes nca->data into a flexible array structure that overlapping with non-flex-array members (mfr_id) intentionally. Since the mem_to_flex() API is not finished, temporarily silence this warning, since it is a false positive, using unsafe_memcpy(). Reported-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/netdev/CACPK8Xdfi=OJKP0x0D1w87fQeFZ4A2DP2qzGCRcuVbpU-9=4sQ@mail.gmail.com/ Cc: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221202212418.never.837-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-06bpf: Don't use rcu_users to refcount in task kfuncsDavid Vernet
A series of prior patches added some kfuncs that allow struct task_struct * objects to be used as kptrs. These kfuncs leveraged the 'refcount_t rcu_users' field of the task for performing refcounting. This field was used instead of 'refcount_t usage', as we wanted to leverage the safety provided by RCU for ensuring a task's lifetime. A struct task_struct is refcounted by two different refcount_t fields: 1. p->usage: The "true" refcount field which task lifetime. The task is freed as soon as this refcount drops to 0. 2. p->rcu_users: An "RCU users" refcount field which is statically initialized to 2, and is co-located in a union with a struct rcu_head field (p->rcu). p->rcu_users essentially encapsulates a single p->usage refcount, and when p->rcu_users goes to 0, an RCU callback is scheduled on the struct rcu_head which decrements the p->usage refcount. Our logic was that by using p->rcu_users, we would be able to use RCU to safely issue refcount_inc_not_zero() a task's rcu_users field to determine if a task could still be acquired, or was exiting. Unfortunately, this does not work due to p->rcu_users and p->rcu sharing a union. When p->rcu_users goes to 0, an RCU callback is scheduled to drop a single p->usage refcount, and because the fields share a union, the refcount immediately becomes nonzero again after the callback is scheduled. If we were to split the fields out of the union, this wouldn't be a problem. Doing so should also be rather non-controversial, as there are a number of places in struct task_struct that have padding which we could use to avoid growing the structure by splitting up the fields. For now, so as to fix the kfuncs to be correct, this patch instead updates bpf_task_acquire() and bpf_task_release() to use the p->usage field for refcounting via the get_task_struct() and put_task_struct() functions. Because we can no longer rely on RCU, the change also guts the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions pending a resolution on the above problem. In addition, the task fixes the kfunc and rcu_read_lock selftests to expect this new behavior. Fixes: 90660309b0c7 ("bpf: Add kfuncs for storing struct task_struct * as a kptr") Fixes: fca1aa75518c ("bpf: Handle MEM_RCU type properly") Reported-by: Matus Jokay <matus.jokay@stuba.sk> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-06Merge branch 'BPF selftests fixes'Andrii Nakryiko
Daan De Meyer says: ==================== This patch series fixes a few issues I've found while integrating the bpf selftests into systemd's mkosi development environment. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-12-06selftests/bpf: Use CONFIG_TEST_BPF=m instead of CONFIG_TEST_BPF=yDaan De Meyer
CONFIG_TEST_BPF can only be a module, so let's indicate it as such in the selftests config. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221205131618.1524337-4-daan.j.demeyer@gmail.com
2022-12-06selftests/bpf: Use "is not set" instead of "=n"Daan De Meyer
"=n" is not valid kconfig syntax. Use "is not set" instead to indicate the option should be disabled. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221205131618.1524337-3-daan.j.demeyer@gmail.com
2022-12-06selftests/bpf: Install all required files to run selftestsDaan De Meyer
When installing the selftests using "make -C tools/testing/selftests install", we need to make sure all the required files to run the selftests are installed. Let's make sure this is the case. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221205131618.1524337-2-daan.j.demeyer@gmail.com
2022-12-06libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp))Timo Hunziker
Parse USDT arguments like "8@(%rsp)" on x86. These are emmited by SystemTap. The argument syntax is similar to the existing "memory dereference case" but the offset left out as it's zero (i.e. read the value from the address in the register). We treat it the same as the the "memory dereference case", but set the offset to 0. I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).." error I've run into when attaching to a probe with such an argument. Attaching and reading the correct argument values works. Something similar might be needed for the other supported architectures. [0] Closes: https://github.com/libbpf/libbpf/issues/559 Signed-off-by: Timo Hunziker <timo.hunziker@gmx.ch> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221203123746.2160-1-timo.hunziker@eclipso.ch
2022-12-07ata: libahci_platform: ahci_platform_find_clk: oops, NULL pointerAnders Roxell
When booting a arm 32-bit kernel with config CONFIG_AHCI_DWC enabled on a am57xx-evm board. This happens when the clock references are unnamed in DT, the strcmp() produces a NULL pointer dereference, see the following oops, NULL pointer dereference: [ 4.673950] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 4.682098] [00000000] *pgd=00000000 [ 4.685699] Internal error: Oops: 5 [#1] SMP ARM [ 4.690338] Modules linked in: [ 4.693420] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc7 #1 [ 4.699615] Hardware name: Generic DRA74X (Flattened Device Tree) [ 4.705749] PC is at strcmp+0x0/0x34 [ 4.709350] LR is at ahci_platform_find_clk+0x3c/0x5c [ 4.714416] pc : [<c130c494>] lr : [<c0c230e0>] psr: 20000013 [ 4.720703] sp : f000dda8 ip : 00000001 fp : c29b1840 [ 4.725952] r10: 00000020 r9 : c1b23380 r8 : c1b23368 [ 4.731201] r7 : c1ab4cc4 r6 : 00000001 r5 : c3c66040 r4 : 00000000 [ 4.737762] r3 : 00000080 r2 : 00000080 r1 : c1ab4cc4 r0 : 00000000 [...] [ 4.998870] strcmp from ahci_platform_find_clk+0x3c/0x5c [ 5.004302] ahci_platform_find_clk from ahci_dwc_probe+0x1f0/0x54c [ 5.010589] ahci_dwc_probe from platform_probe+0x64/0xc0 [ 5.016021] platform_probe from really_probe+0xe8/0x41c [ 5.021362] really_probe from __driver_probe_device+0xa4/0x204 [ 5.027313] __driver_probe_device from driver_probe_device+0x38/0xc8 [ 5.033782] driver_probe_device from __driver_attach+0xb4/0x1ec [ 5.039825] __driver_attach from bus_for_each_dev+0x78/0xb8 [ 5.045532] bus_for_each_dev from bus_add_driver+0x17c/0x220 [ 5.051300] bus_add_driver from driver_register+0x90/0x124 [ 5.056915] driver_register from do_one_initcall+0x48/0x1e8 [ 5.062591] do_one_initcall from kernel_init_freeable+0x1cc/0x234 [ 5.068817] kernel_init_freeable from kernel_init+0x20/0x13c [ 5.074584] kernel_init from ret_from_fork+0x14/0x2c [ 5.079681] Exception stack(0xf000dfb0 to 0xf000dff8) [ 5.084747] dfa0: 00000000 00000000 00000000 00000000 [ 5.092956] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 5.101165] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 5.107818] Code: e5e32001 e3520000 1afffffb e12fff1e (e4d03001) [ 5.114013] ---[ end trace 0000000000000000 ]--- Add an extra check in the if-statement if hpriv-clks[i].id. Fixes: 6ce73f3a6fc0 ("ata: libahci_platform: Add function returning a clock-handle by id") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-12-06selftests/bpf: Allow building bpf tests with CONFIG_XFRM_INTERFACE=[m|n]Martin KaFai Lau
It is useful to use vmlinux.h in the xfrm_info test like other kfunc tests do. In particular, it is common for kfunc bpf prog that requires to use other core kernel structures in vmlinux.h Although vmlinux.h is preferred, it needs a ___local flavor of struct bpf_xfrm_info in order to build the bpf selftests when CONFIG_XFRM_INTERFACE=[m|n]. Cc: Eyal Birger <eyal.birger@gmail.com> Fixes: 90a3a05eb33f ("selftests/bpf: add xfrm_info tests") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-06bpftool: Fix memory leak in do_build_table_cbMiaoqian Lin
strdup() allocates memory for path. We need to release the memory in the following error path. Add free() to avoid memory leak. Fixes: 8f184732b60b ("bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221206071906.806384-1-linmq006@gmail.com
2022-12-06riscv, bpf: Emit fixed-length instructions for BPF_PSEUDO_FUNCPu Lehui
For BPF_PSEUDO_FUNC instruction, verifier will refill imm with correct addresses of bpf_calls and then run last pass of JIT. Since the emit_imm of RV64 is variable-length, which will emit appropriate length instructions accorroding to the imm, it may broke ctx->offset, and lead to unpredictable problem, such as inaccurate jump. So let's fix it with fixed-length instructions. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Suggested-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20221206091410.1584784-1-pulehui@huaweicloud.com
2022-12-06Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Revert the dropping of the cache invalidation from the arm64 arch_dma_prep_coherent() as it caused a regression in the qcom_q6v5_mss remoteproc driver. The driver is already buggy but the original arm64 change made the problem obvious. The change will be re-introduced once the driver is fixed" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
2022-12-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Unless anything comes from the ARM side, this should be the last pull request for this release - and it's mostly documentation: - Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns - s390: fix multi-epoch extension in nested guests - x86: fix uninitialized variable on nested triple fault" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns KVM: Move halt-polling documentation into common directory KVM: x86: fix uninitialized variable use on KVM_REQ_TRIPLE_FAULT KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
2022-12-06Merge tag 'for-linus-xsa-6.1-rc9-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two zero-day fixes for the xen-netback driver (XSA-423 and XSA-424)" * tag 'for-linus-xsa-6.1-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/netback: don't call kfree_skb() with interrupts disabled xen/netback: Ensure protocol headers don't fall in the non-linear area
2022-12-06Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"Will Deacon
This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. Although the semantics of the DMA API require only a clean operation here, it turns out that the Qualcomm 'qcom_q6v5_mss' remoteproc driver (ab)uses the DMA API for transferring the modem firmware to the secure world via calls to Trustzone [1]. Once the firmware buffer has changed hands, _any_ access from the non-secure side (i.e. Linux) will be detected on the bus and result in a full system reset [2]. Although this is possible even with this revert in place (due to speculative reads via the cacheable linear alias of memory), anecdotally the problem occurs considerably more frequently when the lines have not been invalidated, assumedly due to some micro-architectural interactions with the cache hierarchy. Revert the offending change for now, along with a comment, so that the Qualcomm developers have time to fix the driver [3] to use a firmware buffer which does not have a cacheable alias in the linear map. Link: https://lore.kernel.org/r/20221114110329.68413-1-manivannan.sadhasivam@linaro.org [1] Link: https://lore.kernel.org/r/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ [2] Link: https://lore.kernel.org/r/20221206092152.GD15486@thinkpad [2] Reported-by: Amit Pundir <amit.pundir@linaro.org> Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Sibi Sankar <quic_sibis@quicinc.com> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20221206103403.646-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-12-06xen/netback: don't call kfree_skb() with interrupts disabledJuergen Gross
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So remove kfree_skb() from the spin_lock_irqsave() section and use the already existing "drop" label in xenvif_start_xmit() for dropping the SKB. At the same time replace the dev_kfree_skb() call there with a call of dev_kfree_skb_any(), as xenvif_start_xmit() can be called with disabled interrupts. This is XSA-424 / CVE-2022-42328 / CVE-2022-42329. Fixes: be81992f9086 ("xen/netback: don't queue unlimited number of packages") Reported-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-06xen/netback: Ensure protocol headers don't fall in the non-linear areaRoss Lagerwall
In some cases, the frontend may send a packet where the protocol headers are spread across multiple slots. This would result in netback creating an skb where the protocol headers spill over into the non-linear area. Some drivers and NICs don't handle this properly resulting in an interface reset or worse. This issue was introduced by the removal of an unconditional skb pull in the tx path to improve performance. Fix this without reintroducing the pull by setting up grant copy ops for as many slots as needed to reach the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle multiple copy operations per skb. This is XSA-423 / CVE-2022-3643. Fixes: 7e5d7753956b ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2022-12-06net/mlx5e: Generalize creation of default IPsec miss group and ruleLeon Romanovsky
Create general function that sets miss group and rule to forward all not-matched traffic to the next table. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Group IPsec miss handles into separate structLeon Romanovsky
Move miss handles into dedicated struct, so we can reuse it in next patch when creating IPsec policy flow table. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Make clear what IPsec rx_err doesLeon Romanovsky
Reuse existing struct what holds all information about modify header pointer and rule. This helps to reduce ambiguity from the name _err_ that doesn't describe the real purpose of that flow table, rule and function - to copy status result from HW to the stack. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Flatten the IPsec RX add rule pathLeon Romanovsky
Rewrote the IPsec RX add rule path to be less convoluted and don't rely on pre-initialized variables. The code now has clean linear flow with clean separation between error and success paths. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Refactor FTE setup code to be more clearLeon Romanovsky
The policy offload logic needs to set flow steering rule that match on saddr and daddr too, so factor out this code to separate functions, together with code alignment to netdev coding pattern of relying on family type. As part of this change, let's separate more logic from setup_fte_common to make sure that the function names describe that is done in the function better than general *common* name. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Move IPsec flow table creation to separate functionLeon Romanovsky
Even now, to support IPsec crypto, the RX and TX paths use same logic to create flow tables. In the following patches, we will add more tables to support IPsec packet offload. So reuse existing code and rewrite it to support IPsec packet offload from the beginning. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Create hardware IPsec packet offload objectsLeon Romanovsky
Create initial hardware IPsec packet offload object and connect it to advanced steering operation (ASO) context and queue, so the data path can communicate with the stack. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Create Advanced Steering Operation object for IPsecLeon Romanovsky
Setup the ASO (Advanced Steering Operation) object that is needed for IPsec to interact with SW stack about various fast changing events: replay window, lifetime limits, e.t.c Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Remove accesses to priv for low level IPsec FS codeLeon Romanovsky
mlx5 priv structure is driver main structure that holds high level data. That information is not needed for IPsec flow steering logic and the pointer to mlx5e_priv was not supposed to be passed in the first place. This change "cleans" the logic to rely on internal to IPsec structures without touching global mlx5e_priv. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Use mlx5 print routines for low level IPsec codeLeon Romanovsky
Low level mlx5 code needs to use mlx5_core print routines and not netdev ones, as the failures are relevant to the HW itself and not to its netdev. This change allows us to remove access to mlx5 priv structure, which holds high level driver data that isn't needed for mlx5 IPsec code. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Create symmetric IPsec RX and TX flow steering structsLeon Romanovsky
Remove AF family obfuscation by creating symmetric structs for RX and TX IPsec flow steering chains. This simplifies to us low level IPsec FS creation logic without need to dig into multiple levels of structs. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Remove extra layers of definesLeon Romanovsky
Instead of performing redefinition of XFRM core defines to same values but with MLX5_* prefix, cache the input values as is by making sure that the proper storage objects are used. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Store replay window in XFRM attributesLeon Romanovsky
As a preparation for future extension of IPsec hardware object to allow configuration of packet offload mode, extend the XFRM validator to check replay window values. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5e: Advertise IPsec packet offload supportLeon Romanovsky
Add needed capabilities check to determine if device supports IPsec packet offload mode. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5: Add HW definitions for IPsec packet offloadLeon Romanovsky
Add all needed bits to support IPsec packet offload mode. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06net/mlx5: Return ready to use ASO WQELeon Romanovsky
There is no need in hiding returned ASO WQE type by providing void*, use the real type instead. Do it together with zeroing that memory, so ASO WQE will be ready to use immediately. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06Merge branch 'Extend XFRM core to allow packet offload configuration'Steffen Klassert
Leon Romanovsky says: ============ The following series extends XFRM core code to handle a new type of IPsec offload - packet offload. In this mode, the HW is going to be responsible for the whole data path, so both policy and state should be offloaded. IPsec packet offload is an improved version of IPsec crypto mode, In packet mode, HW is responsible to trim/add headers in addition to decrypt/encrypt. In this mode, the packet arrives to the stack as already decrypted and vice versa for TX (exits to HW as not-encrypted). Devices that implement IPsec packet offload mode offload policies too. In the RX path, it causes the situation that HW can't effectively handle mixed SW and HW priorities unless users make sure that HW offloaded policies have higher priorities. It means that we don't need to perform any search of inexact policies and/or priority checks if HW policy was discovered. In such situation, the HW will catch the packets anyway and HW can still implement inexact lookups. In case specific policy is not found, we will continue with packet lookup and check for existence of HW policies in inexact list. HW policies are added to the head of SPD to ensure fast lookup, as XFRM iterates over all policies in the loop. This simple solution allows us to achieve same benefits of separate HW/SW policies databases without over-engineering the code to iterate and manage two databases at the same path. To not over-engineer the code, HW policies are treated as SW ones and don't take into account netdev to allow reuse of the same priorities for policies databases without over-engineering the code to iterate and manage two databases at the same path. To not over-engineer the code, HW policies are treated as SW ones and don't take into account netdev to allow reuse of the same priorities for different devices. * No software fallback * Fragments are dropped, both in RX and TX * No sockets policies * Only IPsec transport mode is implemented ================================================================================ Rekeying: In order to support rekeying, as XFRM core is skipped, the HW/driver should do the following: * Count the handled packets * Raise event that limits are reached * Drop packets once hard limit is occurred. The XFRM core calls to newly introduced xfrm_dev_state_update_curlft() function in order to perform sync between device statistics and internal structures. On HW limit event, driver calls to xfrm_state_check_expire() to allow XFRM core take relevant decisions. This separation between control logic (in XFRM) and data plane allows us to packet reuse SW stack. ================================================================================ Configuration: iproute2: https://lore.kernel.org/netdev/cover.1652179360.git.leonro@nvidia.com/ Packet offload mode: ip xfrm state offload packet dev <if-name> dir <in|out> ip xfrm policy .... offload packet dev <if-name> Crypto offload mode: ip xfrm state offload crypto dev <if-name> dir <in|out> or (backward compatibility) ip xfrm state offload dev <if-name> dir <in|out> ================================================================================ Performance results: TCP multi-stream, using iperf3 instance per-CPU. +----------------------+--------+--------+--------+--------+---------+---------+ | | 1 CPU | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs | | +--------+--------+--------+--------+---------+---------+ | | BW (Gbps) | +----------------------+--------+--------+-------+---------+---------+---------+ | Baseline | 27.9 | 59 | 93.1 | 92.8 | 93.7 | 94.4 | +----------------------+--------+--------+-------+---------+---------+---------+ | Software IPsec | 6 | 11.9 | 23.3 | 45.9 | 83.8 | 91.8 | +----------------------+--------+--------+-------+---------+---------+---------+ | IPsec crypto offload | 15 | 29.7 | 58.5 | 89.6 | 90.4 | 90.8 | +----------------------+--------+--------+-------+---------+---------+---------+ | IPsec packet offload | 28 | 57 | 90.7 | 91 | 91.3 | 91.9 | +----------------------+--------+--------+-------+---------+---------+---------+ IPsec packet offload mode behaves as baseline and reaches linerate with same amount of CPUs. Setups details (similar for both sides): * NIC: ConnectX6-DX dual port, 100 Gbps each. Single port used in the tests. * CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz ================================================================================ Series together with mlx5 part: https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=xfrm-next ================================================================================ Changelog: v10: * Added forgotten xdo_dev_state_del. Patch #4. * Moved changelog in cover letter to the end. * Added "if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {" line to newly added netronome IPsec support. Patch #2. v9: https://lore.kernel.org/all/cover.1669547603.git.leonro@nvidia.com * Added acquire support v8: https://lore.kernel.org/all/cover.1668753030.git.leonro@nvidia.com * Removed not-related blank line * Fixed typos in documentation v7: https://lore.kernel.org/all/cover.1667997522.git.leonro@nvidia.com As was discussed in IPsec workshop: * Renamed "full offload" to be "packet offload". * Added check that offloaded SA and policy have same device while sending packet * Added to SAD same optimization as was done for SPD to speed-up lookups. v6: https://lore.kernel.org/all/cover.1666692948.git.leonro@nvidia.com * Fixed misplaced "!" in sixth patch. v5: https://lore.kernel.org/all/cover.1666525321.git.leonro@nvidia.com * Rebased to latest ipsec-next. * Replaced HW priority patch with solution which mimics separated SPDs for SW and HW. See more description in this cover letter. * Dropped RFC tag, usecase, API and implementation are clear. v4: https://lore.kernel.org/all/cover.1662295929.git.leonro@nvidia.com * Changed title from "PATCH" to "PATCH RFC" per-request. * Added two new patches: one to update hard/soft limits and another initial take on documentation. * Added more info about lifetime/rekeying flow to cover letter, see relevant section. * perf traces for crypto mode will come later. v3: https://lore.kernel.org/all/cover.1661260787.git.leonro@nvidia.com * I didn't hear any suggestion what term to use instead of "packet offload", so left it as is. It is used in commit messages and documentation only and easy to rename. * Added performance data and background info to cover letter * Reused xfrm_output_resume() function to support multiple XFRM transformations * Add PMTU check in addition to driver .xdo_dev_offload_ok validation * Documentation is in progress, but not part of this series yet. v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com * Rebased to latest 6.0-rc1 * Add an extra check in TX datapath patch to validate packets before forwarding to HW. * Added policy cleanup logic in case of netdev down event v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com * Moved comment to be before if (...) in third patch. v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com ----------------------------------------------------------------------- ============ Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06Merge branch 'net-lan966x-enable-ptp-on-bridge-interfaces'Paolo Abeni
Horatiu Vultur says: ==================== net: lan966x: Enable PTP on bridge interfaces Before it was not allowed to run ptp on ports that are part of a bridge because in case of transparent clock the HW will still forward the frames so there would be duplicate frames. Now that there is VCAP support, it is possible to add entries in the VCAP to trap frames to the CPU and the CPU will forward these frames. The first part of the patch series, extends the VCAP support to be able to modify and get the rule, while the last patch uses the VCAP to trap the ptp frames. ==================== Link: https://lore.kernel.org/r/20221203104348.1749811-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: lan966x: Add ptp trap rulesHoratiu Vultur
Currently lan966x, doesn't allow to run PTP over interfaces that are part of the bridge. The reason is when the lan966x was receiving a PTP frame (regardless if L2/IPv4/IPv6) the HW it would flood this frame. Now that it is possible to add VCAP rules to the HW, such to trap these frames to the CPU, it is possible to run PTP also over interfaces that are part of the bridge. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: microchip: vcap: Add vcap_rule_get_key_u32Horatiu Vultur
Add the function vcap_rule_get_key_u32 which allows to get the value and the mask of a key that exist on the rule. If the key doesn't exist, it would return error. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: microchip: vcap: Add vcap_mod_ruleHoratiu Vultur
Add the function vcap_mod_rule which allows to update an existing rule in the vcap. It is required for the rule to exist in the vcap to be able to modify it. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: microchip: vcap: Add vcap_get_ruleHoratiu Vultur
Add function vcap_get_rule which returns a rule based on the internal rule id. The entire functionality of reading and decoding the rule from the VCAP was inside vcap_api_debugfs file. So move the entire implementation in vcap_api as this is used also by vcap_get_rule. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06tipc: Fix potential OOB in tipc_link_proto_rcv()YueHaibing
Fix the potential risk of OOB if skb_linearize() fails in tipc_link_proto_rcv(). Fixes: 5cbb28a4bf65 ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20221203094635.29024-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: hisilicon: Fix potential use-after-free in hix5hd2_rx()Liu Jian
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free. Fixes: 57c5bc9ad7d7 ("net: hisilicon: add hix5hd2 mac driver") Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://lore.kernel.org/r/20221203094240.1240211-2-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: mdio: fix unbalanced fwnode reference count in mdio_device_release()Zeng Heng
There is warning report about of_node refcount leak while probing mdio device: OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /spi/soc@0/mdio@710700c0/ethernet@4 In of_mdiobus_register_device(), we increase fwnode refcount by fwnode_handle_get() before associating the of_node with mdio device, but it has never been decreased in normal path. Since that, in mdio_device_release(), it needs to call fwnode_handle_put() in addition instead of calling kfree() directly. After above, just calling mdio_device_free() in the error handle path of of_mdiobus_register_device() is enough to keep the refcount balanced. Fixes: a9049e0c513c ("mdio: Add support for mdio drivers.") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221203073441.3885317-1-zengheng4@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: hisilicon: Fix potential use-after-free in hisi_femac_rx()Liu Jian
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free. Fixes: 542ae60af24f ("net: hisilicon: Add Fast Ethernet MAC driver") Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://lore.kernel.org/r/20221203094240.1240211-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wqYongqiang Liu
The nicvf_probe() won't destroy workqueue when register_netdev() failed. Add destroy_workqueue err handle case to fix this issue. Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them.") Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20221203094125.602812-1-liuyongqiang13@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06ravb: Fix potential use-after-free in ravb_rx_gbeth()YueHaibing
The skb is delivered to napi_gro_receive() which may free it, after calling this, dereferencing skb may trigger use-after-free. Fixes: 1c59eb678cbd ("ravb: Fillup ravb_rx_gbeth() stub") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20221203092941.10880-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: microchip: sparx5: Fix missing destroy_workqueue of mact_queueQiheng Lin
The mchp_sparx5_probe() won't destroy workqueue created by create_singlethread_workqueue() in sparx5_start() when later inits failed. Add destroy_workqueue in the cleanup_ports case, also add it in mchp_sparx5_remove() Fixes: b37a1bae742f ("net: sparx5: add mactable support") Signed-off-by: Qiheng Lin <linqiheng@huawei.com> Link: https://lore.kernel.org/r/20221203070259.19560-1-linqiheng@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06ip_gre: do not report erspan version on GRE interfaceHangbin Liu
Although the type I ERSPAN is based on the barebones IP + GRE encapsulation and no extra ERSPAN header. Report erspan version on GRE interface looks unreasonable. Fix this by separating the erspan and gre fill info. IPv6 GRE does not have this info as IPv6 only supports erspan version 1 and 2. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: f989d546a2d5 ("erspan: Add type I version 0 support.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Link: https://lore.kernel.org/r/20221203032858.3130339-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: wwan: iosm: fix memory leak in ipc_mux_init()Zhengchao Shao
When failed to alloc ipc_mux->ul_adb.pp_qlt in ipc_mux_init(), ipc_mux is not released. Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com> Link: https://lore.kernel.org/r/20221203020903.383235-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: mana: Fix race on per-CQ variable napi work_doneHaiyang Zhang
After calling napi_complete_done(), the NAPIF_STATE_SCHED bit may be cleared, and another CPU can start napi thread and access per-CQ variable, cq->work_done. If the other thread (for example, from busy_poll) sets it to a value >= budget, this thread will continue to run when it should stop, and cause memory corruption and panic. To fix this issue, save the per-CQ work_done variable in a local variable before napi_complete_done(), so it won't be corrupted by a possible concurrent thread after napi_complete_done(). Also, add a flag bit to advertise to the NIC firmware: the NAPI work_done variable race is fixed, so the driver is able to reliably support features like busy_poll. Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1670010190-28595-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-06net: stmmac: fix "snps,axi-config" node property parsingJisheng Zhang
In dt-binding snps,dwmac.yaml, some properties under "snps,axi-config" node are named without "axi_" prefix, but the driver expects the prefix. Since the dt-binding has been there for a long time, we'd better make driver match the binding for compatibility. Fixes: afea03656add ("stmmac: rework DMA bus setting and introduce new platform AXI structure") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20221202161739.2203-1-jszhang@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>