git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-12-08	net/mlx5e: meter, refactor to allow multiple post meter tables	Oz Shlomo
	TC police action may configure the maximum packet size to be handled by the policer, in addition to byte/packet rate. Currently the post meter table steers the packet according to the meter aso output. Refactor the code to allow both metering and range post actions as a pre-step for adding police mtu offload support. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Add support for range match action	Yevgeny Kliteynik
	Add support for matching on range. The supported type of range is L2 frame size. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Add function that tells if STE miss addr has been initialized	Yevgeny Kliteynik
	Up until now miss address in all the STEs was used to connect miss lists and to link the last STE in the list to end anchor. Match range STE will require special handling because its miss address is part of the 'action'. That is, range action has hit and miss addresses. Since the range action is always the last action, need to make sure that its miss address isn't overwritten by the end anchor. Adding new function mlx5dr_ste_is_miss_addr_set() to answer the question whether the STE's miss address has already been set as part of STE initialization. Use a callback that always returns false right now. Once match range is added, a different callback will be used for that STE type. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Some refactoring of miss address handling	Yevgeny Kliteynik
	In preparation for MATCH RANGE STE support, create a function to set the miss address of an STE. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Manage definers with refcounts	Yevgeny Kliteynik
	In many cases different actions will ask for the same definer format. Instead of allocating new definer general object and running out of definers, have an xarray of allocated definers and keep track of their usage with refcounts: allocate a new definer only when there isn't one with the same format already created, and destroy definer only when its refcount runs down to zero. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Handle FT action in a separate function	Yevgeny Kliteynik
	As preparation for range action support, moving the handling of final ICM address for flow table action to a separate function. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Rework is_fw_table function	Yevgeny Kliteynik
	This patch handles the following two changes w.r.t. is_fw_table function: 1. When SW steering is asked to create/destroy FW table, we allow for creation/destruction of only termination tables. Rename mlx5_dr_is_fw_table both to comply with the static function naming and to reflect that we're actually checking for FW termination table. 2. When the action 'go to flow table' is created, the destination flow table can be any FW table, not only termination table. Adding function to check if the dest table is FW table. This function will also be used by the later creation of range match action, so putting it the header file. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object	Yevgeny Kliteynik
	SW steering is able to match only on the exact values of the packet fields, as requested by the user: the user provides mask for the fields that are of interest, and the exact values to be matched on when the traffic is handled. Match Definer is a general FW object that defines which fields in the packet will be referenced by the mask and tag of each STE. Match definer ID is part of STE fields, and it defines how the HW needs to interpret the STE's mask/tag values. Till now SW steering used the definers that were managed by FW and implemented the STE layout as described by the HW spec. Now that we're adding a new type of STE, SW steering needs to define for the HW how it should interpret this new STE's layout. This is done with a programmable match definer. The programmable definer allows to selects which fields will be included in the definer, and their layout: it has up to 9 DW selectors 8 Byte selectors. Each selector indicates a DW/Byte worth of fields out of the table that is defined by HW spec by referencing the offset of the required DW/Byte. This patch adds dr_cmd function to create and destroy MATCH_DEFINER general object. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: fs, add match on ranges API	Yevgeny Kliteynik
	Range is a new flow destination type which allows matching on a range of values instead of matching on a specific value. Range flow destination has the following fields: - hit_ft: flow table to forward the traffic in case of hit - miss_ft: flow table to forward the traffic in case of miss - field: which packet characteristic to match on - min: minimal value for the selected field - max: maximal value for the selected field Note: - In order to match, the value in the packet should meet the following criteria: min <= value < max - Currently, the only supported field type is L2 packet length Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08	net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object	Yevgeny Kliteynik
	Update full structure of match definer and add an ID of the SELECT match definer type. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-09	Merge tag 'amd-drm-fixes-6.1-2022-12-07' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.1-2022-12-07: amdgpu: - S0ix fix - DCN 3.2 array out of bounds fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221207222751.9558-1-alexander.deucher@amd.com
2022-12-08	Merge patch series "RISC-V interrupt controller select cleanup"	Palmer Dabbelt
	Conor Dooley <conor@kernel.org> says: From: Conor Dooley <conor.dooley@microchip.com> Submitted a patch yesterday defaulting the SiFive PLIC driver to enabled [0], and in the ensuing conversation Marc suggested just doing a select at the arch level and dropping the user selectability completely. * b4-shazam-merge: RISC-V: stop selecting SIFIVE_PLIC at the SoC level irqchip/riscv-intc: remove user selectability of RISCV_INTC irqchip/sifive-plic: remove user selectability of SIFIVE_PLIC Link: https://lore.kernel.org/r/20221118104300.85016-1-conor@kernel.org Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/87zgceszp8.wl-maz@kernel.org/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	RISC-V: stop selecting SIFIVE_PLIC at the SoC level	Conor Dooley
	The SIFIVE_PLIC driver is used by all current RISC-V SoCs & will be, where possible, used for future implementations. Rather than having each driver select the option on a case-by-case basis, do so at the arch level. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118104300.85016-4-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	irqchip/riscv-intc: remove user selectability of RISCV_INTC	Conor Dooley
	Since commit e71ee06e3ca3 ("RISC-V: Force select RISCV_INTC for CONFIG_RISCV") the driver has been enabled at the arch level - and is mandatory anyway. There's no point exposing this as a choice to users, so stop bothering. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118104300.85016-3-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	irqchip/sifive-plic: remove user selectability of SIFIVE_PLIC	Conor Dooley
	The SiFive PLIC driver is used by all current implementations, including those that do not have a SiFive PLIC. The current driver supports more than just SiFive PLICs at present and, where possible, future PLIC implementations will also use this driver. As every supported RISC-V SoC selects the driver directly in Kconfig.socs there's no point in exposing this kconfig option to users. The Kconfig help text, in its current form, is misleading. There's no point doing anything about that though, as it will no longer be user selectable. Remove it. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118104300.85016-2-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	Merge tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fix from Jens Axboe: "A small fix for initializing the NVMe quirks before initializing the subsystem" * tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux: nvme initialize core quirks before calling nvme_init_subsystem
2022-12-08	Merge patch series "Add PMEM support for RISC-V"	Palmer Dabbelt
	Anup Patel <apatel@ventanamicro.com> says: The Linux NVDIMM PEM drivers require arch support to map and access the persistent memory device. This series adds RISC-V PMEM support using recently added Svpbmt and Zicbom support. * b4-shazam-merge: RISC-V: Enable PMEM drivers RISC-V: Implement arch specific PMEM APIs RISC-V: Fix MEMREMAP_WB for systems with Svpbmt Link: https://lore.kernel.org/r/20221114090536.1662624-1-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	Merge tag 'io_uring-6.1-2022-12-08' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring fix from Jens Axboe: "A single small fix for an issue related to ordering between cancelation and current->io_uring teardown" * tag 'io_uring-6.1-2022-12-08' of git://git.kernel.dk/linux: io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
2022-12-08	RISC-V: Enable PMEM drivers	Anup Patel
	We now have PMEM arch support available in RISC-V kernel so let us enable relevant drivers in defconfig. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221114090536.1662624-4-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	RISC-V: Implement arch specific PMEM APIs	Anup Patel
	The NVDIMM PMEM driver expects arch specific APIs for cache maintenance and if arch does not provide these APIs then NVDIMM PMEM driver will always use MEMREMAP_WT to map persistent memory which in-turn maps as UC memory type defined by the RISC-V Svpbmt specification. Now that the Svpbmt and Zicbom support is available in RISC-V kernel, we implement PMEM APIs using ALT_CMO_OP() macros so that the NVDIMM PMEM driver can use MEMREMAP_WB to map persistent memory. Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221114090536.1662624-3-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	RISC-V: Fix MEMREMAP_WB for systems with Svpbmt	Anup Patel
	Currently, the memremap() called with MEMREMAP_WB maps memory using the generic ioremap() function which breaks on system with Svpbmt because memory mapped using _PAGE_IOREMAP page attributes is treated as strongly-ordered non-cacheable IO memory. To address this, we implement RISC-V specific arch_memremap_wb() which maps memory using _PAGE_KERNEL page attributes resulting in write-back cacheable mapping on systems with Svpbmt. Fixes: ff689fd21cb1 ("riscv: add RISC-V Svpbmt extension support") Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221114090536.1662624-2-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	Merge tag 'net-6.1-rc9' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, can and netfilter. Current release - new code bugs: - bonding: ipv6: correct address used in Neighbour Advertisement parsing (src vs dst typo) - fec: properly scope IRQ coalesce setup during link up to supported chips only Previous releases - regressions: - Bluetooth fixes for fake CSR clones (knockoffs): - re-add ERR_DATA_REPORTING quirk - fix crash when device is replugged - Bluetooth: - silence a user-triggerable dmesg error message - L2CAP: fix u8 overflow, oob access - correct vendor codec definition - fix support for Read Local Supported Codecs V2 - ti: am65-cpsw: fix RGMII configuration at SPEED_10 - mana: fix race on per-CQ variable NAPI work_done Previous releases - always broken: - af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(), avoid null-deref - af_can: fix NULL pointer dereference in can_rcv_filter - can: slcan: fix UAF with a freed work - can: can327: flush TX_work on ldisc .close() - macsec: add missing attribute validation for offload - ipv6: avoid use-after-free in ip6_fragment() - nft_set_pipapo: actually validate intervals in fields after the first one - mvneta: prevent oob access in mvneta_config_rss() - ipv4: fix incorrect route flushing when table ID 0 is used, or when source address is deleted - phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C" * tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing() s390/qeth: fix use-after-free in hsci macsec: add missing attribute validation for offload net: mvneta: Fix an out of bounds check net: thunderbolt: fix memory leak in tbnet_open() ipv6: avoid use-after-free in ip6_fragment() net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq() net: phy: mxl-gpy: add MDINT workaround net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports xen/netback: don't call kfree_skb() under spin_lock_irqsave() dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove() ethernet: aeroflex: fix potential skb leak in greth_init_rings() tipc: call tipc_lxc_xmit without holding node_read_lock can: esd_usb: Allow REC and TEC to return to zero can: can327: flush TX_work on ldisc .close() can: slcan: fix freed work crash can: af_can: fix NULL pointer dereference in can_rcv_filter net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions() ipv4: Fix incorrect route flushing when table ID 0 is used ipv4: Fix incorrect route flushing when source address is deleted ...
2022-12-08	Merge patch "RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path"	Palmer Dabbelt
	I'm merging this in as a single patch to make it easier to handle the backports. * b4-shazam-merge: RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path	Andrew Bresticker
	The return to userspace path in entry.S may enable interrupts without the corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy due to the use of RA to point back to ret_from_exception, so just move the whole slow-path loop into C. It's more readable and it lets us use local_irq_{enable,disable}(), avoiding the need for manual annotations altogether. [0]: ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()) WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0 Modules linked in: CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53 Hardware name: riscv-virtio,qemu (DT) epc : check_flags+0x10a/0x1e0 ra : check_flags+0x10a/0x1e0 <snip> status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [<ffffffff808edb90>] lock_is_held_type+0x78/0x14e [<ffffffff8003dae2>] __might_resched+0x26/0x22c [<ffffffff8003dd24>] __might_sleep+0x3c/0x66 [<ffffffff80022c60>] get_signal+0x9e/0xa70 [<ffffffff800054a2>] do_notify_resume+0x6e/0x422 [<ffffffff80003c68>] ret_from_exception+0x0/0x10 irq event stamp: 44512 hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62 hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14 softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104 ---[ end trace 0000000000000000 ]--- possible reason: unannotated irqs-on. Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	selftests/bpf: Bring test_offload.py back to life	Stanislav Fomichev
	Bpftool has new extra libbpf_det_bind probing map we need to exclude. Also skip trying to load netdevsim modules if it's already loaded (builtin). v2: - drop iproute2->bpftool changes (Toke) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221206232739.2504890-1-sdf@google.com
2022-12-08	riscv: mm: notify remote harts about mmu cache updates	Sergey Matyukevich
	Current implementation of update_mmu_cache function performs local TLB flush. It does not take into account ASID information. Besides, it does not take into account other harts currently running the same mm context or possible migration of the running context to other harts. Meanwhile TLB flush is not performed for every context switch if ASID support is enabled. Patch [1] proposed to add ASID support to update_mmu_cache to avoid flushing local TLB entirely. This patch takes into account other harts currently running the same mm context as well as possible migration of this context to other harts. For this purpose the approach from flush_icache_mm is reused. Remote harts currently running the same mm context are informed via SBI calls that they need to flush their local TLBs. All the other harts are marked as needing a deferred TLB flush when this mm context runs on them. [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/ Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/#t Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08	Merge branch 'mlx4-better-big-tcp-support'	Jakub Kicinski
	Eric Dumazet says: ==================== mlx4: better BIG-TCP support mlx4 uses a bounce buffer in TX whenever the tx descriptors wrap around the right edge of the ring. Size of this bounce buffer was hard coded and can be increased if/when needed. ==================== Link: https://lore.kernel.org/r/20221207141237.2575012-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx4: small optimization in mlx4_en_xmit()	Eric Dumazet
	Test against MLX4_MAX_DESC_TXBBS only matters if the TX bounce buffer is going to be used. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS	Eric Dumazet
	Google production kernel has increased MAX_SKB_FRAGS to 45 for BIG-TCP rollout. Unfortunately mlx4 TX bounce buffer is not big enough whenever an skb has up to 45 page fragments. This can happen often with TCP TX zero copy, as one frag usually holds 4096 bytes of payload (order-0 page). Tested: Kernel built with MAX_SKB_FRAGS=45 ip link set dev eth0 gso_max_size 185000 netperf -t TCP_SENDFILE I made sure that "ethtool -G eth0 tx 64" was properly working, ring->full_size being set to 15. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Wei Wang <weiwan@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx4: rename two constants	Eric Dumazet
	MAX_DESC_SIZE is really the size of the bounce buffer used when reaching the right side of TX ring buffer. MAX_DESC_TXBBS get a MLX4_ prefix. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	ice: reschedule ice_ptp_wait_for_offset_valid during reset	Jacob Keller
	If the ice_ptp_wait_for_offest_valid function is scheduled to run while the driver is resetting, it will exit without completing calibration. The work function gets scheduled by ice_ptp_port_phy_restart which will be called as part of the reset recovery process. It is possible for the first execution to occur before the driver has completely cleared its resetting flags. Ensure calibration completes by rescheduling the task until reset is fully completed. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08	ice: make Tx and Rx vernier offset calibration independent	Siddaraju DH
	The Tx and Rx calibration and timestamp generation blocks are independent. However, the ice driver waits until both blocks are ready before configuring either block. This can result in delay of configuring one block because we have not yet received a packet in the other block. There is no reason to wait to finish programming Tx just because we haven't received a packet. Similarly there is no reason to wait to program Rx just because we haven't transmitted a packet. Instead of checking both offset status before programming either block, refactor the ice_phy_cfg_tx_offset_e822 and ice_phy_cfg_rx_offset_e822 functions so that they perform their own offset status checks. Additionally, make them also check the offset ready bit to determine if the offset values have already been programmed. Call the individual configure functions directly in ice_ptp_wait_for_offset_valid. The functions will now correctly check status, and program the offsets if ready. Once the offset is programmed, the functions will exit quickly after just checking the offset ready register. Remove the ice_phy_calc_vernier_e822 in ice_ptp_hw.c, as well as the offset valid check functions in ice_ptp.c entirely as they are no longer necessary. With this change, the Tx and Rx blocks will each be enabled as soon as possible without waiting for the other block to complete calibration. This can enable timestamps faster in setups which have a low rate of transmitted or received packets. In particular, it can stop a situation where one port never receives traffic, and thus never finishes calibration of the Tx block, resulting in continuous faults reported by the ptp4l daemon application. Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08	ice: only check set bits in ice_ptp_flush_tx_tracker	Jacob Keller
	The ice_ptp_flush_tx_tracker function is called to clear all outstanding Tx timestamp requests when the port is being brought down. This function iterates over the entire list, but this is unnecessary. We only need to check the bits which are actually set in the ready bitmap. Replace this logic with for_each_set_bit, and follow a similar flow as in ice_ptp_tx_tstamp_cleanup. Note that it is safe to call dev_kfree_skb_any on a NULL pointer as it will perform a no-op so we do not need to verify that the skb is actually NULL. The new implementation also avoids clearing (and thus reading!) the PHY timestamp unless the index is marked as having a valid timestamp in the timestamp status bitmap. This ensures that we properly clear the status registers as appropriate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08	ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp	Jacob Keller
	In the event of a PTP clock time change due to .adjtime or .settime, the ice driver needs to update the cached copy of the PHC time and also discard any outstanding Tx timestamps. This is required because otherwise the wrong copy of the PHC time will be used when extending the Tx timestamp. This could result in reporting incorrect timestamps to the stack. The current approach taken to handle this is to call ice_ptp_flush_tx_tracker, which will discard any timestamps which are not yet complete. This is problematic for two reasons: 1) it could lead to a potential race condition where the wrong timestamp is associated with a future packet. This can occur with the following flow: 1. Thread A gets request to transmit a timestamped packet, and picks an index and transmits the packet 2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use, marking is as disarded. No timestamp read occurs because the status bit is not set, but the index is released for re-use 3. Thread A gets a new request to transmit another timestamped packet, picks the same (now unused) index and transmits that packet. 4. The PHY transmits the first packet and updates the timestamp slot and generates an interrupt. 5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a valid timestamp but associates it with the new Tx SKB and not the one that actual timestamp for the packet as expected. This could result in the previous timestamp being assigned to a new packet producing incorrect timestamps and leading to incorrect behavior in PTP applications. This is most likely to occur when the packet rate for Tx timestamp requests is very high. 2) on E822 hardware, we must avoid reading a timestamp index more than once each time its status bit is set and an interrupt is generated by hardware. We do have some extensive checks for the unread flag to ensure that only one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads read the timestamp. However, even with this we can still have cases where we "flush" a timestamp that was actually completed in hardware. This can lead to cases where we don't read the timestamp index as appropriate. To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker outside of the teardown path. Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap, the stale bitmap. Start this as cleared when we begin a new timestamp request. When we're about to extend a timestamp and send it up to the stack, first check to see if that stale bit was set. If so, drop the timestamp without sending it to the stack. When we need to update the cached PHC timestamp out of band, just mark all currently outstanding timestamps as stale. This will ensure that once hardware completes the timestamp we'll ignore it correctly and avoid reporting bogus timestamps to userspace. With this change, we fix potential issues caused by calling ice_ptp_flush_tx_tracker during normal operation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08	ice: cleanup allocations in ice_ptp_alloc_tx_tracker	Jacob Keller
	The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and the bitmap for tracking the currently in use indexes. A future change is going to add yet another allocation to this function. If these allocations fail we need to ensure that we properly cleanup and ensure that the pointers in the ice_ptp_tx structure are NULL. Simplify this logic by allocating to local variables first. If any allocation fails, then free everything and exit. Only update the ice_ptp_tx structure if all allocations succeed. This ensures that we have no side effects on the Tx structure unless all allocations have succeeded. Thus, no code will see an invalid pointer and we don't need to re-assign NULL on cleanup. This is safe because kernel "free" functions are designed to be NULL safe and perform no action if passed a NULL pointer. Thus its safe to simply always call kfree or bitmap_free even if one of those pointers was NULL. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08	ice: protect init and calibrating check in ice_ptp_request_ts	Jacob Keller
	When requesting a new timestamp, the ice_ptp_request_ts function does not hold the Tx tracker lock while checking init and calibrating. This means that we might issue a new timestamp request just after the Tx timestamp tracker starts being deinitialized. This could lead to incorrect access of the timestamp structures. Correct this by moving the init and calibrating checks under the lock, and updating the flows which modify these fields to use the lock. Note that we do not need to hold the lock while checking for tx->init in ice_ptp_tx_tstamp. This is because the teardown function will use synchronize_irq after clearing the flag to ensure that the threaded interrupt completes. Either a) the tx->init flag will be cleared before the ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the threaded interrupt will be executing and the synchronize_irq will wait until the threaded interrupt has completed at which point we know the init field has definitely been set and new interrupts will not execute the Tx timestamp thread function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08	netfilter: flowtable: add a 'default' case to flowtable datapath	Li Qiong
	Add a 'default' case in case return a uninitialized value of ret, this should not ever happen since the follow transmit path types: - FLOW_OFFLOAD_XMIT_UNSPEC - FLOW_OFFLOAD_XMIT_TC are never observed from this path. Add this check for safety reasons. Signed-off-by: Li Qiong <liqiong@nfschina.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-08	Merge branch 'mlx5-Support-tc-police-jump-conform-exceed-attribute'	Jakub Kicinski
	Saeed Mahameed says: ==================== Support tc police jump conform-exceed attribute The tc police action conform-exceed option defines how to handle packets which exceed or conform to the configured bandwidth limit. One of the possible conform-exceed values is jump, which skips over a specified number of actions. This series adds support for conform-exceed jump action. The series adds platform support for branching actions by providing true/false flow attributes to the branching action. This is necessary for supporting police jump, as each branch may execute a different action list. The first five patches are preparation patches: - Patches 1 and 2 add support for actions with no destinations (e.g. drop) - Patch 3 refactor the code for subsequent function reuse - Patch 4 defines an abstract way for identifying terminating actions - Patch 5 updates action list validations logic considering branching actions The following three patches introduce an interface for abstracting branching actions: - Patch 6 introduces an abstract api for defining branching actions - Patch 7 generically instantiates the branching flow attributes using the abstract API Patch 8 adds the platform support for jump actions, by executing the following sequence: a. Store the jumping flow attr b. Identify the jump target action while iterating the actions list. c. Instantiate a new flow attribute after the jump target action. This is the flow attribute that the branching action should jump to. d. Set the target post action id on: d.1. The jumping attribute, thus realizing the jump functionality. d.2. The attribute preceding the target jump attr, if not terminating. The next patches apply the platform's branching attributes to the police action: - Patch 9 is a refactor patch - Patch 10 initializes the post meter table with the red/green flow attributes, as were initialized by the platform - Patch 11 enables the offload of meter actions using jump conform-exceed value. ==================== Link: https://lore.kernel.org/all/20221203221337.29267-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, allow meter jump control action	Oz Shlomo
	Separate the matchall police action validation from flower validation. Isolate the action validation logic in the police action parser. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, init post meter rules with branching attributes	Oz Shlomo
	Instantiate the post meter actions with the platform initialized branching action attributes. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, rename post_meter actions	Oz Shlomo
	Currently post meter supports only the pipe/drop conform-exceed policy. This assumption is reflected in several variable names. Rename the following variables as a pre-step for using the generalized branching action platform. Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively. Repurpose red_counter/green_counter to act_counter/drop_counter to allow police conform-exceed configurations that do not drop. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, initialize branching action with target attr	Oz Shlomo
	Identify the jump target action when iterating the action list. Initialize the jump target attr with the jumping attribute during the parsing phase. Initialize the jumping attr post action with the target during the offload phase. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, initialize branch flow attributes	Oz Shlomo
	Initialize flow attribute for drop, accept, pipe and jump branching actions. Instantiate a flow attribute instance according to the specified branch control action. Store the branching attributes on the branching action flow attribute during the parsing phase. Then, during the offload phase, allocate the relevant mod header objects to the branching actions. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, set control params for branching actions	Oz Shlomo
	Extend the act tc api to set the branch control params aligning with the police conform/exceed use case. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, validate action list per attribute	Oz Shlomo
	Currently the entire flow action list is validate for offload limitations. For example, flow with both forward and drop actions are declared invalid due to hardware restrictions. However, a multi-table hardware model changes the limitations from a flow scope to a single flow attribute scope. Apply offload limitations to flow attributes instead of the entire flow. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, add terminating actions	Oz Shlomo
	Extend act api to identify actions that terminate action list. Pre-step for terminating branching actions. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: TC, reuse flow attribute post parser processing	Oz Shlomo
	After the tc action parsing phase the flow attribute is initialized with relevant eswitch offload objects such as tunnel, vlan, header modify and counter attributes. The post processing is done both for fdb and post-action attributes. Reuse the flow attribute post parsing logic by both fdb and post-action offloads. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5: fs, assert null dest pointer when dest_num is 0	Oz Shlomo
	Currently create_flow_handle() assumes a null dest pointer when there are no destinations. This might not be the case as the caller may pass an allocated dest array while setting the dest_num parameter to 0. Assert null dest array for flow rules that have no destinations (e.g. drop rule). Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	net/mlx5e: E-Switch, handle flow attribute with no destinations	Oz Shlomo
	Rules with drop action are not required to have a destination. Currently the destination list is allocated with the maximum number of destinations and passed to the fs_core layer along with the actual number of destinations. Remove redundant passing of dest pointer when count of dest is 0. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08	cxl/region: Fix memdev reuse check	Fan Ni
	Due to a typo, the check of whether or not a memdev has already been used as a target for the region (above code piece) will always be skipped. Given a memdev with more than one HDM decoder, an interleaved region can be created that maps multiple HPAs to the same DPA. According to CXL spec 3.0 8.1.3.8.4, "Aliasing (mapping more than one Host Physical Address (HPA) to a single Device Physical Address) is forbidden." Fix this by using existing iterator for memdev reuse check. Cc: <stable@vger.kernel.org> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20221107212153.745993-1-fan.ni@samsung.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>