summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-01net: add gso_ipv4_max_size and gro_ipv4_max_size per deviceXin Long
This patch introduces gso_ipv4_max_size and gro_ipv4_max_size per device and adds netlink attributes for them, so that IPV4 BIG TCP can be guarded by a separate tunable in the next patch. To not break the old application using "gso/gro_max_size" for IPv4 GSO packets, this patch updates "gso/gro_ipv4_max_size" in netif_set_gso/gro_max_size() if the new size isn't greater than GSO_LEGACY_MAX_SIZE, so that nothing will change even if userspace doesn't realize the new netlink attributes. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01packet: add TP_STATUS_GSO_TCP for tp_statusXin Long
Introduce TP_STATUS_GSO_TCP tp_status flag to tell the af_packet user that this is a TCP GSO packet. When parsing IPv4 BIG TCP packets in tcpdump/libpcap, it can use tp_len as the IPv4 packet len when this flag is set, as iph tot_len is set to 0 for IPv4 BIG TCP packets. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdrXin Long
ipvlan devices calls netif_inherit_tso_max() to get the tso_max_size/segs from the lower device, so when lower device supports BIG TCP, the ipvlan devices support it too. We also should consider its iph tot_len accessing. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01cipso_ipv4: use iph_set_totlen in skbuff_setattrXin Long
It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so the iph->tot_len update should use iph_set_totlen(). Note that for these non GSO packets, the new iph tot_len with extra iph option len added may become greater than 65535, the old process will cast it and set iph->tot_len to it, which is a bug. In theory, iph options shouldn't be added for these big packets in here, a fix may be needed here in the future. For now this patch is only to set iph->tot_len to 0 when it happens. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01netfilter: use skb_ip_totlen and iph_totlenXin Long
There are also quite some places in netfilter that may process IPv4 TCP GSO packets, we need to replace them too. In length_mt(), we have to use u_int32_t/int to accept skb_ip_totlen() return value, otherwise it may overflow and mismatch. This change will also help us add selftest for IPv4 BIG TCP in the following patch. Note that we don't need to replace the one in tcpmss_tg4(), as it will return if there is data after tcphdr in tcpmss_mangle_packet(). The same in mangle_contents() in nf_nat_helper.c, it returns false when skb->len + extra > 65535 in enlarge_skb(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01net: sched: use skb_ip_totlen and iph_totlenXin Long
There are 1 action and 1 qdisc that may process IPv4 TCP GSO packets and access iph->tot_len, replace them with skb_ip_totlen() and iph_totlen() accordingly. Note that we don't need to replace the one in tcf_csum_ipv4(), as it will return for TCP GSO packets in tcf_csum_ipv4_tcp(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01openvswitch: use skb_ip_totlen in conntrackXin Long
IPv4 GSO packets may get processed in ovs_skb_network_trim(), and we need to use skb_ip_totlen() to get iph totlen. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01bridge: use skb_ip_totlen in br netfilterXin Long
These 3 places in bridge netfilter are called on RX path after GRO and IPv4 TCP GSO packets may come through, so replace iph tot_len accessing with skb_ip_totlen() in there. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01net: add a couple of helpers for iph tot_lenXin Long
This patch adds three APIs to replace the iph->tot_len setting and getting in all places where IPv4 BIG TCP packets may reach, they will be used in the following patches. Note that iph_totlen() will be used when iph is not in linear data of the skb. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01Merge branch ↵Jakub Kicinski
'virtio_net-vdpa-update-mac-address-when-it-is-generated-by-virtio-net' Laurent Vivier says: ==================== virtio_net: vdpa: update MAC address when it is generated by virtio-net When the MAC address is not provided by the vdpa device virtio_net driver assigns a random one without notifying the device. The consequence, in the case of mlx5_vdpa, is the internal routing tables of the device are not updated and this can block the communication between two namespaces. To fix this problem, use virtnet_send_command(VIRTIO_NET_CTRL_MAC) to set the address from virtnet_probe() when the MAC address is not provided by the device. ==================== Link: https://lore.kernel.org/r/20230127204500.51930-1-lvivier@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01virtio_net: notify MAC address change on device initializationLaurent Vivier
In virtnet_probe(), if the device doesn't provide a MAC address the driver assigns a random one. As we modify the MAC address we need to notify the device to allow it to update all the related information. The problem can be seen with vDPA and mlx5_vdpa driver as it doesn't assign a MAC address by default. The virtio_net device uses a random MAC address (we can see it with "ip link"), but we can't ping a net namespace from another one using the virtio-vdpa device because the new MAC address has not been provided to the hardware: RX packets are dropped since they don't go through the receive filters, TX packets go through unaffected. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01virtio_net: disable VIRTIO_NET_F_STANDBY if VIRTIO_NET_F_MAC is not setLaurent Vivier
failover relies on the MAC address to pair the primary and the standby devices: "[...] the hypervisor needs to enable VIRTIO_NET_F_STANDBY feature on the virtio-net interface and assign the same MAC address to both virtio-net and VF interfaces." Documentation/networking/net_failover.rst This patch disables the STANDBY feature if the MAC address is not provided by the hypervisor. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01riscv: disable generation of unwind tablesAndreas Schwab
GCC 13 will enable -fasynchronous-unwind-tables by default on riscv. In the kernel, we don't have any use for unwind tables yet, so disable them. More importantly, the .eh_frame section brings relocations (R_RISC_32_PCREL, R_RISCV_SET{6,8,16}, R_RISCV_SUB{6,8,16}) into modules that we are not prepared to handle. Signed-off-by: Andreas Schwab <schwab@suse.de> Link: https://lore.kernel.org/r/mvmzg9xybqu.fsf@suse.de Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-01riscv: kprobe: Fixup kernel panic when probing an illegal positionGuo Ren
The kernel would panic when probed for an illegal position. eg: (CONFIG_RISCV_ISA_C=n) echo 'p:hello kernel_clone+0x16 a0=%a0' >> kprobe_events echo 1 > events/kprobes/hello/enable cat trace Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 CPU: 0 PID: 111 Comm: sh Not tainted 6.2.0-rc1-00027-g2d398fe49a4d #490 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80007268>] dump_backtrace+0x38/0x48 [<ffffffff80c5e83c>] show_stack+0x50/0x68 [<ffffffff80c6da28>] dump_stack_lvl+0x60/0x84 [<ffffffff80c6da6c>] dump_stack+0x20/0x30 [<ffffffff80c5ecf4>] panic+0x160/0x374 [<ffffffff80c6db94>] generic_handle_arch_irq+0x0/0xa8 [<ffffffff802deeb0>] sys_newstat+0x0/0x30 [<ffffffff800158c0>] sys_clone+0x20/0x30 [<ffffffff800039e8>] ret_from_syscall+0x0/0x4 ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 ]--- That is because the kprobe's ebreak instruction broke the kernel's original code. The user should guarantee the correction of the probe position, but it couldn't make the kernel panic. This patch adds arch_check_kprobe in arch_prepare_kprobe to prevent an illegal position (Such as the middle of an instruction). Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20230201040604.3390509-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-01nfp: correct cleanup related to DCB resourcesHuayu Chen
This patch corrects two oversights relating to releasing resources and DCB initialisation. 1. If mapping of the dcbcfg_tbl area fails: an error should be propagated, allowing partial initialisation (probe) to be unwound. 2. Conversely, if where dcbcfg_tbl is successfully mapped: it should be unmapped in nfp_nic_dcb_clean() which is called via various error cleanup paths, and shutdown or removal of the PCIE device. Fixes: 9b7fe8046d74 ("nfp: add DCB IEEE support") Signed-off-by: Huayu Chen <huayu.chen@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230131163033.981937-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01nfp: flower: avoid taking mutex in atomic contextYanguo Li
A mutex may sleep, which is not permitted in atomic context. Avoid a case where this may arise by moving the to nfp_flower_lag_get_info_from_netdev() in nfp_tun_write_neigh() spinlock. Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yanguo Li <yanguo.li@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230131080313.2076060-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01ipv6: ICMPV6: Use swap() instead of open coding itJiapeng Chong
Swap is a function interface that provides exchange function. To avoid code duplication, we can use swap function. ./net/ipv6/icmp.c:344:25-26: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3896 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230131063456.76302-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01Merge branch ↵Jakub Kicinski
'ip-ip6_gre-fix-gre-tunnels-not-generating-ipv6-link-local-addresses' Thomas Winter says: ==================== ip/ip6_gre: Fix GRE tunnels not generating IPv6 link local addresses For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they come up to generate the IPv6 link local address for the interface. Recently we found that they were no longer generating IPv6 addresses. Also, non-point-to-point tunnels were not generating any IPv6 link local address and instead generating an IPv6 compat address, breaking IPv6 communication on the tunnel. These failures were caused by commit e5dd729460ca and this patch set aims to resolve these issues. ==================== Link: https://lore.kernel.org/r/20230131034646.237671-1-Thomas.Winter@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local addressThomas Winter
We recently found that our non-point-to-point tunnels were not generating any IPv6 link local address and instead generating an IPv6 compat address, breaking IPv6 communication on the tunnel. Previously, addrconf_gre_config always would call addrconf_addr_gen and generate a EUI64 link local address for the tunnel. Then commit e5dd729460ca changed the code path so that add_v4_addrs is called but this only generates a compat IPv6 address for non-point-to-point tunnels. I assume the compat address is specifically for SIT tunnels so have kept that only for SIT - GRE tunnels now always generate link local addresses. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local addressThomas Winter
For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they come up to generate the IPv6 link local address for the interface. Recently we found that they were no longer generating IPv6 addresses. This issue would also have affected SIT tunnels. Commit e5dd729460ca changed the code path so that GRE tunnels generate an IPv6 address based on the tunnel source address. It also changed the code path so GRE tunnels don't call addrconf_addr_gen in addrconf_dev_config which is called by addrconf_sysctl_addr_gen_mode when the IN6_ADDR_GEN_MODE is changed. This patch aims to fix this issue by moving the code in addrconf_notify which calls the addr gen for GRE and SIT into a separate function and calling it in the places that expect the IPv6 address to be generated. The previous addrconf_dev_config is renamed to addrconf_eth_config since it only expected eth type interfaces and follows the addrconf_gre/sit_config format. A part of this changes means that the loopback address will be attempted to be configured when changing addr_gen_mode for lo. This should not be a problem because the address should exist anyway and if does already exist then no error is produced. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01drm/amd/display: Properly handle additional cases where DCN is not supportedAlex Deucher
There could be boards with DCN listed in IP discovery, but no display hardware actually wired up. In this case the vbios display table will not be populated. Detect this case and skip loading DM when we detect it. v2: Mark DCN as harvested as well so other display checks elsewhere in the driver are handled properly. Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amdgpu: Enable vclk dclk node for gc11.0.3Yiqing Yao
These sysfs nodes are tested supported, so enable them. Signed-off-by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd: Fix initialization for nbio 4.3.0Mario Limonciello
A mistake has been made on some boards with NBIO 4.3.0 where some NBIO registers aren't properly set by the hardware. Ensure that they're set during initialization. Cc: Natikar Basavaraj <Basavaraj.Natikar@amd.com> Tested-by: Satyanarayana ReddyTVN <Satyanarayana.ReddyTVN@amd.com> Tested-by: Rutvij Gajjar <Rutvij.Gajjar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-02-01drm/amdgpu: enable HDP SD for gfx 11.0.3Evan Quan
Enable HDP clock gating control for gfx 11.0.3. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11Tim Huang
PMFW will handle the features disablement properly for gpu reset case, driver involvement may cause some unexpected issues. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Reset DMUB mailbox SW state after HW resetNicholas Kazlauskas
[Why] Otherwise we can be out of sync with what's in the hardware, leading to us rerunning every command that's presently in the ringbuffer. [How] Reset software state for the mailboxes in hw_reset callback. This is already done as part of the mailbox init in hw_init, but we do need to remember to reset the last cached wptr value as well here. Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2George Shen
[Why] The hwss function does_plane_fit_in_mall not applicable to dcn3.2 asics. Using it with dcn3.2 can result in undefined behaviour. [How] Assign the function pointer to NULL. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Adjust downscaling limits for dcn314Daniel Miess
[Why] Lower max_downscale_ratio and ARGB888 downscale factor to prevent cases where underflow may occur on dcn314 [How] Set max_downscale_ratio to 400 and ARGB downscale factor to 250 for dcn314 Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Daniel Miess <Daniel.Miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Add missing brackets in calculationDaniel Miess
[Why] Brackets missing in the calculation for MIN_DST_Y_NEXT_START [How] Add missing brackets for this calculation Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Daniel Miess <Daniel.Miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amdgpu: update wave data type to 3 for gfx11Graham Sider
SQ_WAVE_INST_DW0 isn't present on gfx11 compared to gfx10, so update wave data type to signify a difference. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-02-01bnxt_en: Remove runtime interrupt vector allocationAjit Khaparde
Modified the bnxt_en code to create and pre-configure RDMA devices with the right MSI-X vector count for the ROCE driver to use. This is to align the ROCE driver to the auxiliary device model which will simply bind the driver without getting into PCI-related handling. All PCI-related logic will now be in the bnxt_en driver. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01RDMA/bnxt_re: Remove the sriov config callbackAjit Khaparde
Remove the SRIOV config callback which the bnxt_en was calling to reconfigure the chip resources for a PF device when VFs are created. The code is now modified to provision the VF resources based on the total VF count instead of the actual VF count. This allows the SRIOV config callback to be removed from the list of ulp_ops. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Remove struct bnxt access from RoCE driverHongguang Gao
Decouple RoCE driver from directly accessing L2's private bnxt structure. Move the fields needed by RoCE driver into bnxt_en_dev. They'll be passed to RoCE driver by bnxt_rdma_aux_device_add() function. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Use auxiliary bus calls over proprietary callsAjit Khaparde
Wherever possible use the function ops provided by auxiliary bus instead of using proprietary ops. Defined bnxt_re_suspend and bnxt_re_resume calls which can be invoked by the bnxt_en driver instead of the ULP stop/start calls. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Use direct API instead of indirectionAjit Khaparde
For a single ULP user there is no need for complicating function indirection calls. Remove all this complexity in favour of direct function calls exported by the bnxt_en driver. This allows to simplify the code greatly. Also remove unused ulp_async_notifier. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Remove usage of ulp_idAjit Khaparde
Since the driver continues to use the single ULP model, the extra complexity and indirection is unnecessary. Remove the usage of ulp_id from the code. Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01RDMA/bnxt_re: Use auxiliary driver interfaceAjit Khaparde
Use auxiliary driver interface for driver load, unload ROCE driver. The driver does not need to register the interface using the netdev notifier anymore. Removed the bnxt_re_dev_list which is not needed. Currently probe, remove and shutdown ops have been implemented for the auxiliary device. Also remove exccessve validation checks for rdev. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01bnxt_en: Add auxiliary driver supportAjit Khaparde
Add auxiliary driver support. An auxiliary device will be created if the hardware indicates support for RDMA. The bnxt_ulp_probe() function has been removed and a new bnxt_rdma_aux_device_add() function has been added. The bnxt_free_msix_vecs() and bnxt_req_msix_vecs() will now hold the RTNL lock when they call the bnxt_close_nic()and bnxt_open_nic() since the device close and open need to be protected under RTNL lock. The operations between the bnxt_en and bnxt_re will be protected using the en_ops_lock. This will be used by the bnxt_re driver in a follow-on patch to create ROCE interfaces. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-01blk-cgroup: don't update io stat for root cgroupMing Lei
We source root cgroup stats from the system-wide stats, see blkcg_print_stat and blkcg_rstat_flush, so don't update io state for root cgroup. Fixes blkg leak issue introduced in commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") which starts to grab blkg's reference when adding iostat_cpu into percpu blkcg list, but this state won't be consumed by blkcg_rstat_flush() where the blkg reference is dropped. Tested-by: Bart van Assche <bvanassche@acm.org> Reported-by: Bart van Assche <bvanassche@acm.org> Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") Cc: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230202021804.278582-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-02powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()Michael Ellerman
Commit baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions") removed some empty hash MMU flushing routines, but got a bit overeager and also removed the call to hash__tlb_flush() from tlb_flush(). In regular use this doesn't lead to any noticable breakage, which is a little concerning. Presumably there are flushes happening via other paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck. Fix it by reinstating the call to hash__tlb_flush(). Fixes: baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230131111407.806770-1-mpe@ellerman.id.au
2023-02-02selftests/bpf: xdp_hw_metadata use strncpy for ifnameJesper Dangaard Brouer
The ifname char pointer is taken directly from the command line as input and the string is copied directly into struct ifreq via strcpy. This makes it easy to corrupt other members of ifreq and generally do stack overflows. Most often the ioctl will fail with: ./xdp_hw_metadata: ioctl(SIOCETHTOOL): Bad address As people will likely copy-paste code for getting NIC queue channels (rxq_num) and enabling HW timestamping (hwtstamp_ioctl) lets make this code a bit more secure by using strncpy. Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/167527272543.937063.16993147790832546209.stgit@firesoul
2023-02-02selftests/bpf: xdp_hw_metadata correct status value in error(3)Jesper Dangaard Brouer
The glibc error reporting function error(): void error(int status, int errnum, const char *format, ...); The status argument should be a positive value between 0-255 as it is passed over to the exit(3) function as the value as the shell exit status. The least significant byte of status (i.e., status & 0xFF) is returned to the shell parent. Fix this by using 1 instead of -1. As 1 corresponds to C standard constant EXIT_FAILURE. Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/167527272038.937063.9137108142012298120.stgit@firesoul
2023-02-02selftests/bpf: xdp_hw_metadata cleanup cause segfaultJesper Dangaard Brouer
Using xdp_hw_metadata I experince Segmentation fault after seeing "detaching bpf program....". On my system the segfault happened when accessing bpf_obj->skeleton in xdp_hw_metadata__destroy(bpf_obj) call. That doesn't make any sense as this memory have not been freed by program at this point in time. Prior to calling xdp_hw_metadata__destroy(bpf_obj) the function close_xsk() is called for each RX-queue xsk. The real bug lays in close_xsk() that unmap via munmap() the wrong memory pointer. The call xsk_umem__delete(xsk->umem) will free xsk->umem, thus the call to munmap(xsk->umem, UMEM_SIZE) will have unpredictable behavior. And man page explain subsequent references to these pages will generate SIGSEGV. Unmapping xsk->umem_area instead removes the segfault. Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/167527271533.937063.5717065138099679142.stgit@firesoul
2023-02-02selftests/bpf: xdp_hw_metadata clear metadata when -EOPNOTSUPPJesper Dangaard Brouer
The AF_XDP userspace part of xdp_hw_metadata see non-zero as a signal of the availability of rx_timestamp and rx_hash in data_meta area. The kernel-side BPF-prog code doesn't initialize these members when kernel returns an error e.g. -EOPNOTSUPP. This memory area is not guaranteed to be zeroed, and can contain garbage/previous values, which will be read and interpreted by AF_XDP userspace side. Tested this on different drivers. The experiences are that for most packets they will have zeroed this data_meta area, but occasionally it will contain garbage data. Example of failure tested on ixgbe: poll: 1 (0) xsk_ring_cons__peek: 1 0x18ec788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_hash: 3697961069 rx_timestamp: 9024981991734834796 (sec:9024981991.7348) 0x18ec788: complete idx=8 addr=8000 Converting to date: date -d @9024981991 2255-12-28T20:26:31 CET I choose a simple fix in this patch. When kfunc fails or isn't supported assign zero to the corresponding struct meta value. It's up to the individual BPF-programmer to do something smarter e.g. that fits their use-case, like getting a software timestamp and marking a flag that gives the type of timestamp. Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/167527271027.937063.5177725618616476592.stgit@firesoul
2023-02-02selftests/bpf: Fix unmap bug in prog_tests/xdp_metadata.cJesper Dangaard Brouer
The function close_xsk() unmap via munmap() the wrong memory pointer. The call xsk_umem__delete(xsk->umem) have already freed xsk->umem. Thus the call to munmap(xsk->umem, UMEM_SIZE) will have unpredictable behavior that can lead to Segmentation fault elsewhere, as man page explain subsequent references to these pages will generate SIGSEGV. Fixes: e2a46d54d7a1 ("selftests/bpf: Verify xdp_metadata xdp->af_xdp path") Reported-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/167527517464.938135.13750760520577765269.stgit@firesoul
2023-02-02Merge branch 'kfunc-annotation'Daniel Borkmann
David Vernet says: ==================== This is v3 of the patchset [0]. v2 can be found at [1]. [0]: https://lore.kernel.org/bpf/Y7kCsjBZ%2FFrsWW%2Fe@maniforge.lan/T/ [1]: https://lore.kernel.org/lkml/20230123171506.71995-1-void@manifault.com/ Changelog: ---------- v2 -> v3: - Go back to the __bpf_kfunc approach from v1. The BPF_KFUNC macro received pushback as it didn't match the more typical EXPORT_SYMBOL* APIs used elsewhere in the kernel. It's the longer term plan, but for now we're proposing something less controversial to fix kfuncs and BTF encoding. - Add __bpf_kfunc macro to newly added cpumask kfuncs. - Add __bpf_kfunc macro to newly added XDP metadata kfuncs, which were failing to be BTF encoded in the thread in [2]. - Update patch description(s) to reference the discussions in [2]. - Add a selftest that validates that a static kfunc with unused args is properly BTF encoded and can be invoked. [2]: https://lore.kernel.org/all/fe5d42d1-faad-d05e-99ad-1c2c04776950@oracle.com/ v1 -> v2: - Wrap entire function signature in BPF_KFUNC macro instead of using __bpf_kfunc tag (Kumar) - Update all kfunc definitions to use this macro. - Update kfuncs.rst documentation to describe and illustrate the macro. - Also clean up a few small parts of kfuncs.rst, e.g. some grammar, and in general making it a bit tighter. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2023-02-02selftests/bpf: Add testcase for static kfunc with unused argDavid Vernet
kfuncs are allowed to be static, or not use one or more of their arguments. For example, bpf_xdp_metadata_rx_hash() in net/core/xdp.c is meant to be implemented by drivers, with the default implementation just returning -EOPNOTSUPP. As described in [0], such kfuncs can have their arguments elided, which can cause BTF encoding to be skipped. The new __bpf_kfunc macro should address this, and this patch adds a selftest which verifies that a static kfunc with at least one unused argument can still be encoded and invoked by a BPF program. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230201173016.342758-5-void@manifault.com
2023-02-02bpf: Add __bpf_kfunc tag to all kfuncsDavid Vernet
Now that we have the __bpf_kfunc tag, we should use add it to all existing kfuncs to ensure that they'll never be elided in LTO builds. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com
2023-02-02bpf: Document usage of the new __bpf_kfunc macroDavid Vernet
Now that the __bpf_kfunc macro has been added to linux/btf.h, include a blurb about it in the kfuncs.rst file. In order for the macro to successfully render with .. kernel-doc, we'll also need to add it to the c_id_attributes array. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230201173016.342758-3-void@manifault.com
2023-02-02bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncsDavid Vernet
kfuncs are functions defined in the kernel, which may be invoked by BPF programs. They may or may not also be used as regular kernel functions, implying that they may be static (in which case the compiler could e.g. inline it away, or elide one or more arguments), or it could have external linkage, but potentially be elided in an LTO build if a function is observed to never be used, and is stripped from the final kernel binary. This has already resulted in some issues, such as those discussed in [0] wherein changes in DWARF that identify when a parameter has been optimized out can break BTF encodings (and in general break the kfunc). [0]: https://lore.kernel.org/all/1675088985-20300-2-git-send-email-alan.maguire@oracle.com/ We therefore require some convenience macro that kfunc developers can use just add to their kfuncs, and which will prevent all of the above issues from happening. This is in contrast with what we have today, where some kfunc definitions have "noinline", some have "__used", and others are static and have neither. Note that longer term, this mechanism may be replaced by a macro that more closely resembles EXPORT_SYMBOL_GPL(), as described in [1]. For now, we're going with this shorter-term approach to fix existing issues in kfuncs. [1]: https://lore.kernel.org/lkml/Y9AFT4pTydKh+PD3@maniforge.lan/ Note as well that checkpatch complains about this patch with the following: ERROR: Macros with complex values should be enclosed in parentheses +#define __bpf_kfunc __used noinline There seems to be a precedent for using this pattern in other places such as compiler_types.h (see e.g. __randomize_layout and noinstr), so it seems appropriate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230201173016.342758-2-void@manifault.com