Age | Commit message | Author
2021-04-23 | libbpf: Support extern resolution for BTF-defined maps in .maps section | Andrii Nakryiko
Add extra logic to handle map externs (only BTF-defined maps are supported for linking). Re-use the map parsing logic used during bpf_object__open(). Map externs are currently restricted to always match the complete map definition. So all the specified attributes will be compared (down to pinning, map_flags, numa_node, etc). In the future this restriction might be relaxed with no backwards compatibility issues. If any attribute is mismatched between extern and actual map definition, the linker will report an error, pointing out which one mismatches. The original intent was to allow an extern to specify the attributes that matter (to the user) to enforce. E.g., if you specify just key information and omit value, then any value fits. Similarly, it should have been possible to enforce map_flags, pinning, and any other possible map attribute. Unfortunately, that means that multiple externs can be only partially overlapping with each other, which means the linker would need to combine their type definitions to end up with the most restrictive and fullest map definition. This requires an extra amount of BTF manipulation which at this time was deemed unnecessary and would require further extending generic BTF writer APIs. So that is left for future follow ups, if there is demand for that. But the idea seems interesting and useful, so I want to document it here. Weak definitions are also supported, but are pretty strict as well, just like externs: all weak map definitions have to match exactly. In the follow up patches this most probably will be relaxed, with __weak map definitions being able to differ between each other (with non-weak definition always winning, of course). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
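A minimal sketch of the exact-match rule, in Python for brevity. `MapDef`, `first_mismatch()` and `resolve_map_extern()` are hypothetical stand-ins, not libbpf's data structures: the map is flattened to a few integer attributes, and the first differing attribute is reported, much as the linker points out which attribute mismatches.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class MapDef:
    """Hypothetical flattened view of a BTF-defined map (field names are illustrative)."""
    map_type: int
    key_size: int
    value_size: int
    max_entries: int
    map_flags: int = 0
    numa_node: int = 0
    pinning: int = 0


def first_mismatch(extern: MapDef, actual: MapDef) -> Optional[str]:
    """Compare every attribute; return the first differing one, or None on an exact match."""
    for f in fields(MapDef):
        if getattr(extern, f.name) != getattr(actual, f.name):
            return f.name
    return None


def resolve_map_extern(name: str, extern: MapDef, actual: MapDef) -> MapDef:
    """Resolve an extern map against its definition, rejecting any mismatch."""
    mismatch = first_mismatch(extern, actual)
    if mismatch is not None:
        raise ValueError(f"map '{name}': extern and definition differ in '{mismatch}'")
    return actual
```

Under the current rule there is no notion of "attributes the extern cares about": every field takes part in the comparison, which is what keeps the linker from having to merge partially overlapping extern definitions.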
2021-04-23 | libbpf: Add linker extern resolution support for functions and global variables | Andrii Nakryiko
Add BPF static linker logic to resolve extern variables and functions across multiple linked-together BPF object files. For that, the linker maintains a separate list of struct glob_sym structures, which keeps track of a few pieces of metadata (is it extern or resolved global, is it a weak symbol, which ELF section it belongs to, etc) and ties together BTF type info and ELF symbol information and keeps them in sync. With the addition of support for extern variables/funcs, it's now possible for some sections to contain both extern and non-extern definitions. This means that some sections may start out as ephemeral (if only externs are present and thus there is no corresponding ELF section), but will be "upgraded" to an actual ELF section as symbols are resolved or new non-extern definitions are appended. Additional care is taken to not duplicate extern entries in sections like .kconfig and .ksyms. Given libbpf requires BTF type to always be present for .kconfig/.ksym externs, the linker extends this requirement to all the externs, even those that are supposed to be resolved during static linking and which won't be visible to libbpf. With BTF information always present, the static linker will check not just ELF symbol matches, but entire BTF type signature match as well. That logic is stricter than BPF CO-RE checks. It probably should be re-used by .ksym resolution logic in libbpf as well, but that's left for follow up patches. To make it unnecessary to rewrite ELF symbols and minimize BTF type rewriting/removal, ELF symbols that correspond to externs initially will be updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but types they point to might get replaced when extern is resolved. This might leave some left-over types (even though we try to minimize this for common cases of having extern funcs without argument names vs concrete function with names properly specified). 
That can be addressed later with a generic BTF garbage collection. That's left for a follow up as well. Given that the BTF type appending phase is separate from ELF symbol appending/resolution, a special struct glob_sym->underlying_btf_id variable is used to communicate resolution and rewrite decisions. 0 means underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0 values are used for temporary storage of source BTF type ID (not yet rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj->btf. But by the end of linker_append_btf() phase, that underlying_btf_id will be remapped and will always be > 0. This is the ugliest part of the whole process, but keeps the other parts much simpler due to stability of sec_var and VAR/FUNC types, as well as ELF symbol, so please keep that in mind while reviewing. BTF-defined maps require some extra custom logic and are addressed separately in the next patch, to keep this one smaller and easier to review. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
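The sign convention for `underlying_btf_id` can be modeled as a small decoding step. This is a simplified sketch, assuming a plain dict `id_map` from source-object type IDs to final IDs; `resolve_underlying_btf_id()` and its parameters are hypothetical and do not mirror linker.c.

```python
from typing import Dict


def resolve_underlying_btf_id(encoded: int, src_type_id: int, id_map: Dict[int, int]) -> int:
    """Decode a hypothetical underlying_btf_id value into a final type ID.

    encoded == 0 : type not yet in the final BTF; it gets appended from the
                   current object, so its final ID is id_map[src_type_id].
    encoded <  0 : -encoded is a source type ID in the current object that
                   has not been remapped yet.
    encoded >  0 : already a final type ID in the linker's BTF.
    """
    if encoded == 0:
        final_id = id_map[src_type_id]
    elif encoded < 0:
        final_id = id_map[-encoded]
    else:
        final_id = encoded
    # After the append phase every ID must be positive.
    if final_id <= 0:
        raise ValueError("unresolved BTF type ID")
    return final_id
```

Packing three states into one integer keeps the glob_sym record itself unchanged in size, at the cost of the readability the commit message warns about.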
2021-04-23 | libbpf: Tighten BTF type ID rewriting with error checking | Andrii Nakryiko
It should never fail, but if it does, it's better to know about this rather than end up with nonsensical type IDs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
2021-04-23 | libbpf: Extend sanity checking ELF symbols with externs validation | Andrii Nakryiko
Add logic to validate extern symbols, plus some other minor extra checks, like ELF symbol #0 validation, general symbol visibility and binding validations. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
2021-04-23 | libbpf: Make few internal helpers available outside of libbpf.c | Andrii Nakryiko
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers available outside of libbpf.c, to be used by static linker code. Also do a few cleanups (error code fixes, comment clean up, etc) that don't deserve their own commit. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
2021-04-23 | libbpf: Factor out symtab and relos sanity checks | Andrii Nakryiko
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into separate functions. They are already quite extensive and are suffering from too deep indentation. Subsequent changes will extend SYMTAB sanity checking further, so it's better to factor each into a separate function. No functional changes are intended. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
2021-04-23 | libbpf: Refactor BTF map definition parsing | Andrii Nakryiko
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF static linker. Further, at least for BPF static linker, it's important to know which attributes of a BPF map were defined explicitly, so provide a bit set for each known portion of BTF map definition. This allows BPF static linker to do a simple check when dealing with extern map declarations. The same capabilities make it possible to distinguish attributes explicitly set to zero (e.g., __uint(max_entries, 0)) from the case of not specifying them at all (no max_entries attribute at all). Libbpf is currently not utilizing that, but it could be useful for backwards compatibility reasons later. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
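A sketch of the "which attributes were set explicitly" bit set, assuming attributes arrive as a name-to-value dict. `MapDefPart` and `parse_map_def()` are illustrative names, not libbpf's; the point is that an explicit `max_entries` of 0 sets its bit while an absent attribute does not.

```python
from enum import IntFlag
from typing import Dict, Tuple


class MapDefPart(IntFlag):
    """Hypothetical: one bit per attribute of a BTF map definition."""
    MAP_TYPE = 1 << 0
    KEY_SIZE = 1 << 1
    VALUE_SIZE = 1 << 2
    MAX_ENTRIES = 1 << 3
    MAP_FLAGS = 1 << 4
    PINNING = 1 << 5


def parse_map_def(attrs: Dict[str, int]) -> Tuple[MapDefPart, Dict[str, int]]:
    """Record which attributes were given explicitly, separately from their values."""
    parts = MapDefPart(0)
    values: Dict[str, int] = {}
    for name, value in attrs.items():
        parts |= MapDefPart[name.upper()]  # KeyError for unknown attributes
        values[name] = value
    return parts, values
```

Keeping presence separate from value means a zero can be read as "explicitly zero" only when the corresponding bit is set.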
2021-04-23 | libbpf: Allow gaps in BPF program sections to support overriden weak functions | Andrii Nakryiko
Currently libbpf is very strict about parsing BPF program instruction sections. No gaps are allowed between sequential BPF programs within a given ELF section. Libbpf enforced that by keeping track of the next section offset that should start a new BPF (sub)program and cross-checking it by searching for a corresponding STT_FUNC ELF symbol. But this is too restrictive once weak BPF programs are allowed and two or more BPF object files are linked together. In such a case, some weak BPF programs might be "overridden" by either a non-weak BPF program with the same name and signature, or even by another weak BPF program that just happened to be linked first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program intact, but there is no corresponding ELF symbol, because no one is going to be referencing it. Libbpf already correctly handles such cases in the sense that it won't append such dead code to actual BPF programs loaded into kernel. So the only change that needs to be done is to relax the logic of parsing BPF instruction sections. Instead of assuming the next BPF (sub)program section offset, iterate available STT_FUNC ELF symbols to discover all available BPF subprograms and programs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org
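The difference between the old and new parsing strategies can be sketched as follows. `FuncSym`, `parse_strict()` and `parse_relaxed()` are hypothetical; offsets and sizes are in bytes within one ELF section, and a gap models the instructions of an overridden weak (sub)program that no longer has a symbol.

```python
from typing import List, NamedTuple, Optional


class FuncSym(NamedTuple):
    """Hypothetical STT_FUNC symbol: name, offset and size within the section."""
    name: str
    offset: int
    size: int


def parse_strict(sec_size: int, syms: List[FuncSym]) -> Optional[List[FuncSym]]:
    """Old approach: walk the section and expect a symbol at every next offset."""
    by_off = {s.offset: s for s in syms}
    progs, off = [], 0
    while off < sec_size:
        sym = by_off.get(off)
        if sym is None or sym.size == 0:
            return None  # gap (or empty symbol): no program starts at this offset
        progs.append(sym)
        off += sym.size
    return progs


def parse_relaxed(sec_size: int, syms: List[FuncSym]) -> List[FuncSym]:
    """New approach: iterate the symbols themselves; gaps (dead code) are skipped."""
    progs = []
    for sym in sorted(syms, key=lambda s: s.offset):
        if sym.offset + sym.size > sec_size:
            raise ValueError(f"{sym.name} extends past the end of the section")
        progs.append(sym)
    return progs
```

With a 16-byte gap between two symbols, the strict walk gives up while the relaxed one still finds both programs; the bytes in the gap are simply never attached to any program.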
2021-04-23 | libbpf: Mark BPF subprogs with hidden visibility as static for BPF verifier | Andrii Nakryiko
Define __hidden helper macro in bpf_helpers.h, which is a short-hand for __attribute__((visibility("hidden"))). Add libbpf support to mark BPF subprograms annotated with __hidden as static in BTF information to enforce BPF verifier's static function validation algorithm, which takes more information (caller's context) into account during a subprogram validation. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-5-andrii@kernel.org
2021-04-23 | libbpf: Suppress compiler warning when using SEC() macro with externs | Andrii Nakryiko
When used on externs, the SEC() macro triggers a compilation warning about inapplicable `__attribute__((used))`. That's expected for extern declarations, so suppress it with the corresponding _Pragma. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-4-andrii@kernel.org
2021-04-23 | bpftool: Dump more info about DATASEC members | Andrii Nakryiko
Dump succinct information for each member of DATASEC: its kind and name. This is extremely helpful to see at a quick glance what is inside each DATASEC of a given BTF. Without this, one has to jump around BTF data to just find out the name of a VAR or FUNC. DATASEC's var_secinfo member is special in that regard because it doesn't itself contain the name of the member, delegating that to the referenced VAR and FUNC kinds. Other kinds, like STRUCT/UNION/FUNC/ENUM, encode member names directly and thus are clearly identifiable in BTF dump.

The new output looks like this:

[35] DATASEC '.bss' size=0 vlen=6
        type_id=8 offset=0 size=4 (VAR 'input_bss1')
        type_id=13 offset=0 size=4 (VAR 'input_bss_weak')
        type_id=16 offset=0 size=4 (VAR 'output_bss1')
        type_id=17 offset=0 size=4 (VAR 'output_data1')
        type_id=18 offset=0 size=4 (VAR 'output_rodata1')
        type_id=20 offset=0 size=8 (VAR 'output_sink1')
[36] DATASEC '.data' size=0 vlen=2
        type_id=9 offset=0 size=4 (VAR 'input_data1')
        type_id=14 offset=0 size=4 (VAR 'input_data_weak')
[37] DATASEC '.kconfig' size=0 vlen=2
        type_id=25 offset=0 size=4 (VAR 'LINUX_KERNEL_VERSION')
        type_id=28 offset=0 size=1 (VAR 'CONFIG_BPF_SYSCALL')
[38] DATASEC '.ksyms' size=0 vlen=1
        type_id=30 offset=0 size=1 (VAR 'bpf_link_fops')
[39] DATASEC '.rodata' size=0 vlen=2
        type_id=12 offset=0 size=4 (VAR 'input_rodata1')
        type_id=15 offset=0 size=4 (VAR 'input_rodata_weak')
[40] DATASEC 'license' size=0 vlen=1
        type_id=24 offset=0 size=4 (VAR 'LICENSE')

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-3-andrii@kernel.org
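To illustrate why the member name has to be fetched through the referenced type, here is a small formatter that reproduces the layout of the sample output. `format_datasec()` is hypothetical, the `types` mapping (type ID to kind and name) stands in for real BTF lookups, and a tab is assumed for the member indent.

```python
from typing import Dict, List, Tuple


def format_datasec(type_id: int, name: str, size: int,
                   members: List[Tuple[int, int, int]],
                   types: Dict[int, Tuple[str, str]]) -> str:
    """Render a DATASEC like the bpftool sample output (hypothetical helper).

    members: (type_id, offset, size) triples, i.e. the var_secinfo entries.
    types:   maps a type ID to (kind, name) of the referenced VAR/FUNC.
    """
    lines = [f"[{type_id}] DATASEC '{name}' size={size} vlen={len(members)}"]
    for member_id, offset, member_size in members:
        # var_secinfo carries no name of its own: look it up via the referenced type.
        kind, member_name = types[member_id]
        lines.append(f"\ttype_id={member_id} offset={offset} size={member_size} "
                     f"({kind} '{member_name}')")
    return "\n".join(lines)
```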
2021-04-23 | bpftool: Support dumping BTF VAR's "extern" linkage | Andrii Nakryiko
Add dumping of "extern" linkage for BTF VAR kind. Also shorten "global-allocated" to "global" to be in line with FUNC's "global". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-2-andrii@kernel.org
2021-04-23 | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | David S. Miller
Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2021-04-23

This series contains updates to i40e and iavf drivers.

Aleksandr adds support for VIRTCHNL_VF_CAP_ADV_LINK_SPEED in i40e which allows for reporting link speed to VF as a value instead of using an enum; helper functions are created to remove repeated code.

Coiby Xu reduces memory use of i40e when using kdump by reducing Tx, Rx, and admin queue to minimum values. Current use causes failure of kdump.

Stefan Assmann removes duplicated free calls in iavf.

Haiyue cleans up a loop to return directly when the value is found and changes some magic numbers to defines for better maintainability in iavf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | mptcp: Retransmit DATA_FIN | Mat Martineau
With this change, the MPTCP-level retransmission timer is used to resend DATA_FIN. The retransmit timer is not stopped while waiting for an MPTCP-level ACK of DATA_FIN, and retransmitted DATA_FINs are sent on all subflows. The retry interval starts at TCP_RTO_MIN and then doubles on each attempt, up to TCP_RTO_MAX. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/146 Fixes: 43b54c6ee382 ("mptcp: Use full MPTCP-level disconnect state machine") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
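The backoff schedule can be written down directly. This sketch assumes the usual kernel values TCP_RTO_MIN = HZ/5 (200 ms) and TCP_RTO_MAX = 120*HZ (120 s); `data_fin_retry_intervals()` is a hypothetical helper that only models the intervals described above, not the MPTCP timer code itself.

```python
from typing import List

TCP_RTO_MIN_MS = 200      # TCP_RTO_MIN is HZ/5, i.e. 200 ms
TCP_RTO_MAX_MS = 120_000  # TCP_RTO_MAX is 120*HZ, i.e. 120 s


def data_fin_retry_intervals(attempts: int) -> List[int]:
    """Interval (ms) before each DATA_FIN retransmission: start at the minimum,
    double after every attempt, never exceed the maximum."""
    intervals = []
    timeout = TCP_RTO_MIN_MS
    for _ in range(attempts):
        intervals.append(timeout)
        timeout = min(timeout * 2, TCP_RTO_MAX_MS)
    return intervals
```

For example, the first four retries are spaced 200, 400, 800 and 1600 ms apart; from the eleventh attempt on, the interval stays clamped at 120 s.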
2021-04-23 | Merge branch 'mlxsw-selftest-fixes' | David S. Miller
Petr Machata says:

====================
selftests: mlxsw: Fixes

This patch set carries fixes to selftest issues that we have hit in our nightly regression run. Almost all are in mlxsw selftests, though one is in a generic forwarding selftest.

- In patch #1, in an ERSPAN test, install an FDB entry as static instead of (implicitly) as local.

- In the mlxsw resource-scale test, an if statement overrides the value of $?, which is supposed to contain the result of the test. As a result, the resource scale test can spuriously pass. In patches #2 and #3, remove the if statements to fix the issue in the port_scale and tc_flower_scale tests, respectively.

- Again in the mlxsw resource-scale test, when more than one sub-test is run, a successful sub-test overrides any previous failures. This causes a spurious pass of the overall test. This is fixed in patch #4.

- In patch #5, increase a tolerance in a mlxsw-specific RED backlog test. This test is very noisy, due to rounding errors and the unpredictability of software traffic generation. By bumping the tolerance from 5 % to 10 %, get the failure rate to zero. This shouldn't impact the accuracy: mistakes in backlog configuration (e.g. due to wrong cell size) are likely to cause a much larger discrepancy.

- In patch #6, fix mausezahn invocation in the mlxsw ERSPAN scale test. The test failed because of the wrong invocation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test | Petr Machata
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying chip supports, and tests that they all work. To determine that, it issues a stream of ICMP packets and checks if they are mirrored as expected. However, the mausezahn invocation missed the -6 flag to identify the use of IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to ICMP6. It also didn't pass an explicit source IP address, which apparently worked at some point in the past, but does not anymore. To fix these issues, extend the function mirror_test() in mirror_lib by detecting the IPv6 protocol addresses, and using a different ICMP scheme. Fix __mirror_gre_test() in the selftest itself to pass a source IP address. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | selftests: mlxsw: Increase the tolerance of backlog buildup | Petr Machata
The intention behind this test is to make sure that qdisc limit is correctly projected to the HW. However, first, due to rounding in the qdisc, and then in the driver, the number cannot actually be accurate. And second, the approach to testing this is to oversubscribe the port with traffic generated on the same switch. The actual backlog size therefore fluctuates. In practice, this test proved to be noisier than the rest, and spuriously fails every now and then. Increase the tolerance to 10 % to avoid these issues. Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | selftests: mlxsw: Return correct error code in resource scale tests | Danielle Ratson
Currently, the resource scale test checks a few cases, and the error code is reset between the cases. So, for example, if one case fails and the following case passes, the final error code reflects only the last case and will be 0. Save a new return code that accumulates ('or's together) the return codes of all the cases, so the final return code takes all the cases into account. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
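The difference between overwriting and accumulating return codes, sketched in Python rather than the selftests' shell. `run_cases_buggy()` and `run_cases()` are hypothetical helpers; each case is a callable returning 0 on success and non-zero on failure.

```python
from typing import Callable, Iterable


def run_cases_buggy(cases: Iterable[Callable[[], int]]) -> int:
    """Old behaviour: each case overwrites the return code, so the last one wins."""
    ret = 0
    for case in cases:
        ret = case()
    return ret


def run_cases(cases: Iterable[Callable[[], int]]) -> int:
    """Fixed behaviour: OR the codes together so any failure is preserved."""
    ret = 0
    for case in cases:
        ret |= case()
    return ret
```

With a failing case followed by a passing one, the buggy variant reports 0 (a spurious pass) while the accumulating variant reports the failure.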
2021-04-23 | selftests: mlxsw: Remove a redundant if statement in tc_flower_scale test | Danielle Ratson
Currently, the error return code of the failure condition is lost after using an if statement, so the test doesn't fail when it should. Remove the if statement that separates the condition and the error code check, so the test won't always pass. Fixes: abfce9e062021 ("selftests: mlxsw: Reduce running time using offload indication") Reported-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | selftests: mlxsw: Remove a redundant if statement in port_scale test | Danielle Ratson
Currently, the error return code of the failure condition is lost after using an if statement, so the test doesn't fail when it should. Remove the if statement that separates the condition and the error code check, so the test won't always pass. Fixes: 5154b1b826d9b ("selftests: mlxsw: Add a scale test for physical ports") Reported-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | selftests: net: mirror_gre_vlan_bridge_1q: Make an FDB entry static | Petr Machata
The FDB roaming test installs a destination MAC address on the wrong interface of an FDB database and tests whether the mirroring fails, because packets are sent to the wrong port. The test by mistake installs the FDB entry as local. This worked previously, because drivers were notified of local FDB entries in the same way as of static entries. However that has been fixed in the commit 6ab4c3117aec ("net: bridge: don't notify switchdev for local FDB addresses"), and local entries are not notified anymore. As a result, the HW is not reconfigured for the FDB roam, and mirroring keeps working, failing the test. To fix the issue, mark the FDB entry as static. Fixes: 9c7c8a82442c ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | KVM: x86/xen: Take srcu lock when accessing kvm_memslots() | Wanpeng Li
kvm_memslots() is called by kvm_write_guest_offset_cached(), so we should take the srcu lock. Let's pull the srcu lock operation out of kvm_steal_time_set_preempted() again to fix the Xen part. Fixes: 30b5c851af7 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1619166200-9215-1-git-send-email-wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-23 | Merge tag 'wireless-drivers-next-2021-04-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next | David S. Miller
Kalle Valo says:

====================
wireless-drivers-next patches for v5.13

Third, and final, set of patches for v5.13. We got one more week before the merge window and this includes patches from that extra week. Smaller features to rtw88 and mt76, but mostly this contains fixes.

rtw88

* 8822c: Add gap-k calibration to improve long range performance

mt76

* parse rate power limits from DT
* debugfs file to test firmware crash
* debugfs to disable NAPI threaded mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | Merge branch 'r8152-adjust-REALTEK_USB_DEVICE' | David S. Miller
Hayes Wang says:

====================
r8152: adjust REALTEK_USB_DEVICE

Modify REALTEK_USB_DEVICE macro.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | r8152: redefine REALTEK_USB_DEVICE macro | Hayes Wang
Redefine REALTEK_USB_DEVICE macro with USB_DEVICE_INTERFACE_CLASS and USB_DEVICE_AND_INTERFACE_INFO to simplify the code.

Although checkpatch.pl shows the following error, it is more readable.

ERROR: Macros with complex values should be enclosed in parentheses

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | r8152: remove NCM mode from REALTEK_USB_DEVICE macro | Hayes Wang
The RTL8156 supports CDC NCM mode, and users can switch the USB device configuration between vendor and NCM mode themselves. That is, the driver doesn't need to switch the device from NCM mode to vendor mode. Fixes: 195aae321c82 ("r8152: support new chips") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | enetc: fix locking for one-step timestamping packet transfer | Yangbo Lu
The previous patch to support PTP Sync packet one-step timestamping described one-step timestamping packet handling logic as below in its commit message:

- Transmit packet immediately if no other one in transfer, or queue to skb queue if there is already one in transfer. The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit lock and to send one skb in skb queue if has.

There was no problem with the description, but there was a mistake in the implementation. The locking/test_and_set_bit_lock() should be put in enetc_start_xmit(), which may be called by the worker, rather than in enetc_xmit(). Otherwise, the worker calling enetc_start_xmit() after the bit lock is released is not able to lock again for transfer.

Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
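A single-threaded model of the corrected flow, assuming a boolean flag stands in for the bit taken by test_and_set_bit_lock() and a deque for the skb queue. `OneStepTx` and its methods are hypothetical names with no real concurrency; the point is that the lock is taken in the start-xmit path, which both the regular transmit path and the completion worker go through.

```python
from collections import deque


class OneStepTx:
    """Hypothetical model of one-step timestamping TX serialization."""

    def __init__(self):
        self.busy = False      # models the bit taken by test_and_set_bit_lock()
        self.queue = deque()   # models the skb queue of pending one-step packets
        self.sent = []         # packets handed to "hardware"

    def _test_and_set(self) -> bool:
        was_busy = self.busy
        self.busy = True
        return was_busy

    def start_xmit(self, pkt) -> None:
        # Locking lives here, so the worker's calls take the lock as well.
        if self._test_and_set():
            self.queue.append(pkt)  # another one-step packet is in flight
            return
        self.sent.append(pkt)

    def tx_complete_work(self) -> None:
        # Hardware finished: release the bit, then send one queued packet, if any.
        self.busy = False
        if self.queue:
            self.start_xmit(self.queue.popleft())
```

Because the worker re-enters through `start_xmit()`, the packet it sends holds the lock again, so a new packet arriving meanwhile is queued instead of being transmitted concurrently.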
2021-04-23 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2021-04-23

1) The SPI flow key in struct flowi has no consumers, so remove it. From Florian Westphal.

2) Remove stray synchronize_rcu from xfrm_init. From Florian Westphal.

3) Use the new exit_pre hook to reset the netlink socket on net namespace destruction. From Florian Westphal.

4) Remove an unnecessary get_cpu() in ipcomp, that code is always called with BHs off. From Sabrina Dubroca.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | Merge branch 'mk_eth_soc_fixes-perf-improvements' | David S. Miller
Ilya Lipnitskiy says:

====================
mtk_eth_soc: fixes and performance improvements

Most of these changes come from OpenWrt where they have been present and tested for months.

First three patches are bug fixes. The rest are performance improvements. The last patch is a cleanup to use the iopoll.h macro for busy-waiting instead of a custom loop.

v2:
- Reverse christmas tree in "use iopoll.h macro for DMA init"
- Use cond_resched() instead of iopoll.h macro in "reduce MDIO bus access latency"
- Use napi_complete_done and rework NAPI callbacks in a new patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: use iopoll.h macro for DMA init | Ilya Lipnitskiy
Replace a tight busy-wait loop without a pause with a standard readx_poll_timeout_atomic routine with a 5 us poll period. Tested by booting a MT7621 device to ensure the driver initializes properly. Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
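A rough Python analogue of a poll-with-timeout helper, for readers unfamiliar with iopoll.h. `poll_timeout()` is a hypothetical name and this is not the kernel macro: it sleeps between reads, whereas the `_atomic` kernel variants busy-wait, and it makes one final read after the deadline before reporting `-ETIMEDOUT`.

```python
import errno
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def poll_timeout(read: Callable[[], T], cond: Callable[[T], bool],
                 delay_us: int, timeout_us: int) -> Tuple[int, T]:
    """Call read() until cond(value) holds or timeout_us elapses.

    Returns (0, value) on success or (-ETIMEDOUT, value) on timeout.
    """
    deadline = time.monotonic() + timeout_us / 1_000_000
    while True:
        val = read()
        if cond(val):
            return 0, val
        if time.monotonic() > deadline:
            val = read()  # one last look before giving up
            return (0 if cond(val) else -errno.ETIMEDOUT), val
        time.sleep(delay_us / 1_000_000)  # the kernel's _atomic variants busy-wait here
```

The bounded wait is the main improvement over a bare spin loop: a device that never becomes ready produces an error instead of hanging initialization.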
2021-04-23 | net: ethernet: mtk_eth_soc: set PPE flow hash as skb hash if present | Felix Fietkau
This improves GRO performance Signed-off-by: Felix Fietkau <nbd@nbd.name> [Ilya: Use MTK_RXD4_FOE_ENTRY instead of GENMASK(13, 0)] Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: rework NAPI callbacks | Ilya Lipnitskiy
Use napi_complete_done to communicate total TX and RX work done to NAPI. Count total RX work up instead of remaining work down for clarity. Remove unneeded local variables for clarity. Use do {} while instead of goto for clarity. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: reduce unnecessary interrupts | Felix Fietkau
Avoid rearming interrupt if napi_complete returns false Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
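The NAPI pattern behind this commit and the "rework NAPI callbacks" one can be sketched as a poll function that counts work up and re-arms the interrupt only when napi_complete_done() returns true. `napi_poll()` and its callback parameters are hypothetical stand-ins for the driver's ring processing and interrupt-enable code.

```python
from typing import Callable, List


def napi_poll(budget: int, rx_ring: List[object],
              napi_complete_done: Callable[[int], bool],
              enable_irq: Callable[[], None]) -> int:
    """Process up to `budget` RX packets, counting work done upwards.

    Polling is completed only when less than the full budget was used, and the
    interrupt is re-armed only if napi_complete_done() reports success.
    """
    work_done = 0
    while work_done < budget and rx_ring:
        rx_ring.pop(0)      # stand-in for receiving one packet
        work_done += 1
    if work_done < budget and napi_complete_done(work_done):
        enable_irq()
    return work_done
```

If the full budget is consumed, NAPI will poll again, so re-enabling the interrupt at that point would only generate extra interrupts.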
2021-04-23 | net: ethernet: mtk_eth_soc: only read the full RX descriptor if DMA is done | Felix Fietkau
Uncached memory access is expensive, and there is no need to access all descriptor words if we can't process them anyway Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: cache HW pointer of last freed TX descriptor | Felix Fietkau
The value is only updated by the CPU, so it is cheaper to access from the ring data structure than from a hardware register. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: implement dynamic interrupt moderation | Felix Fietkau
Reduces the number of interrupts under load Signed-off-by: Felix Fietkau <nbd@nbd.name> [Ilya: add documentation for new struct fields] Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: increase DMA ring sizes | Felix Fietkau
256 descriptors is not enough for multi-gigabit traffic under load on MT7622. Bump it to 512 to improve performance. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: use larger burst size for QDMA TX | Felix Fietkau
Improves tx performance Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: remove unnecessary TX queue stops | Felix Fietkau
When running short on descriptors, only stop the queue for the netdev that tx was attempted for. By the time something tries to send on the other netdev, the ring might have some more room already. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: reduce MDIO bus access latency | Felix Fietkau
usleep_range often ends up sleeping much longer than the 10-20us provided as a range here. This causes significant latency in mdio bus accesses, which easily adds multiple seconds to the boot time on MT7621 when polling DSA slave ports. Use cond_resched instead of usleep_range, since the MDIO access does not take much time. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: use napi_consume_skb | Felix Fietkau
Should improve performance, since it can use bulk free Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: fix build_skb cleanup | Ilya Lipnitskiy
In case build_skb fails, call skb_free_frag on the correct pointer. Also update the DMA structures with the new mapping before exiting, because the mapping was successful Suggested-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: unmap RX data before calling build_skb | Felix Fietkau
Since build_skb accesses the data area (for initializing shinfo), dma unmap needs to happen before that call Signed-off-by: Felix Fietkau <nbd@nbd.name> [Ilya: split build_skb cleanup fix into a separate commit] Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 | net: ethernet: mtk_eth_soc: fix RX VLAN offload | Felix Fietkau
The VLAN ID in the rx descriptor is only valid if the RX_DMA_VTAG bit is set. Fixes frames wrongly marked with VLAN tags. Signed-off-by: Felix Fietkau <nbd@nbd.name> [Ilya: fix commit message] Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
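A sketch of the corrected check. `rx_vlan_id()` is hypothetical, and the bit position and mask below are values chosen for illustration, not taken from the MediaTek descriptor layout; what matters is that the VLAN ID is read only when the tag-present bit is set.

```python
from typing import Optional

# Hypothetical descriptor layout, for illustration only.
RX_DMA_VTAG = 1 << 15      # "VLAN tag present" flag in the status word
RX_DMA_VID_MASK = 0x0FFF   # VLAN ID in the low 12 bits of the tag word


def rx_vlan_id(status_word: int, tag_word: int) -> Optional[int]:
    """Return the VLAN ID only if the hardware marked the tag as valid."""
    if not (status_word & RX_DMA_VTAG):
        return None  # tag_word holds no meaningful VLAN ID for this frame
    return tag_word & RX_DMA_VID_MASK
```

Without the flag check, whatever happens to be in the tag field would be reported as a VLAN ID, which is how untagged frames ended up marked with VLAN tags.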
2021-04-23 | net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb | Phillip Potter
Modify the header size check in geneve6_xmit_skb and geneve_xmit_skb to use pskb_inet_may_pull rather than pskb_network_may_pull. This fixes two kernel selftest failures introduced by the commit introducing the checks:

IPv4 over geneve6: PMTU exceptions
IPv4 over geneve6: PMTU exceptions - nexthop objects

It does this by correctly accounting for the fact that IPv4 packets may transit over geneve IPv6 tunnels (and vice versa), and still fixes the uninit-value bug fixed by the original commit.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 6628ddfec758 ("net: geneve: check skb is large enough for IPv4/IPv6 header")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
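A simplified model of the two checks. `outer_family_may_pull()` and `inet_may_pull()` are hypothetical helpers; `available` stands in for the number of bytes reachable from the network header (the real helpers can also pull non-linear data, which is ignored here), and reading the earlier check as keyed on the tunnel's address family is an inference from the commit message.

```python
ETH_P_IP = 0x0800     # skb->protocol value for IPv4
ETH_P_IPV6 = 0x86DD   # skb->protocol value for IPv6
IPV4_HDR_LEN = 20     # sizeof(struct iphdr)
IPV6_HDR_LEN = 40     # sizeof(struct ipv6hdr)


def outer_family_may_pull(tunnel_is_ipv6: bool, available: int) -> bool:
    """Earlier check (as modeled): header size chosen by the tunnel's family."""
    need = IPV6_HDR_LEN if tunnel_is_ipv6 else IPV4_HDR_LEN
    return available >= need


def inet_may_pull(protocol: int, available: int) -> bool:
    """pskb_inet_may_pull()-style check: header size chosen by the packet's protocol."""
    if protocol == ETH_P_IPV6:
        need = IPV6_HDR_LEN
    elif protocol == ETH_P_IP:
        need = IPV4_HDR_LEN
    else:
        need = 0
    return available >= need
```

A small IPv4 packet with 28 bytes of network-layer data carried over geneve6 fails the tunnel-family check (40 bytes required) but passes the protocol-based one (20 bytes required).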
2021-04-23 | net: mana: Use int to check the return value of mana_gd_poll_cq() | Dexuan Cui
mana_gd_poll_cq() may return -1 if an overflow error is detected (this should never happen unless there is a bug in the driver or the hardware). Fix the type of the variable "comp_read" by using int rather than u32. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
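Storing a possible -1 in a u32 silently turns it into UINT32_MAX, so any "negative means error" check can never fire and a loop bounded by the value would run about four billion times. A minimal sketch of the pattern; `hyp_poll_cq` is a hypothetical stand-in for mana_gd_poll_cq() and the value 4 is an arbitrary completion count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for mana_gd_poll_cq(): returns the number of
 * completions read, or -1 if an overflow error is detected. */
static int hyp_poll_cq(bool overflow)
{
	return overflow ? -1 : 4;
}

/* Buggy pattern: -1 converted to u32 becomes UINT32_MAX. */
static uint32_t comp_read_as_u32(bool overflow)
{
	return (uint32_t)hyp_poll_cq(overflow);
}

/* Fixed pattern: keep the signed value and bail out on error. */
static int completions_to_process(bool overflow)
{
	int comp_read = hyp_poll_cq(overflow);

	if (comp_read < 1)
		return 0; /* error or nothing to do */
	return comp_read;
}
```

Keeping the variable signed is the whole fix; the conversion from int to an unsigned type is well defined in C, which is exactly why the bug is silent rather than a crash at the assignment.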
2021-04-23 | virtio-net: fix use-after-free in skb_gro_receive | Xuan Zhuo
When "headroom" > 0, the actual allocated memory space is the entire page, so the address of the page should be used when passing it to build_skb().

BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260)
Write of size 16 at addr ffff88811619fffc by task kworker/u9:0/534
CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 5.12.0-rc7-custom-16372-gb150be05b806 #3382
Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
Call Trace:
 <IRQ>
 dump_stack (lib/dump_stack.c:122)
 print_address_description.constprop.0 (mm/kasan/report.c:233)
 kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416)
 skb_gro_receive (net/core/skbuff.c:4260)
 tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1))
 tcp4_gro_receive (net/ipv4/tcp_offload.c:316)
 inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2))
 dev_gro_receive (net/core/dev.c:6075)
 napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198)
 receive_buf (drivers/net/virtio_net.c:1151) virtio_net
 virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) virtio_net
 __napi_poll (net/core/dev.c:6964)
 net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118)
 __do_softirq (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:346)
 irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434)
 common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
 </IRQ>

Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
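build_skb(data, frag_size) places the skb_shared_info at the end of the region [data, data + frag_size). If the region really is the whole page but `data` points headroom bytes into it, that trailing struct lands past the end of the page, which is consistent with the KASAN write near a page boundary above. The sketch below is a simplified arithmetic model under stated assumptions: a 4096-byte page, an aligned skb_shared_info size of 320 bytes (`HYP_SHINFO_SIZE`, an assumption), and hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>

#define HYP_PAGE_SIZE   4096u /* assumed page size */
#define HYP_SHINFO_SIZE 320u  /* assumed SKB_DATA_ALIGN(sizeof(skb_shared_info)) */

/* Offset from the start of the page at which build_skb() would place
 * skb_shared_info, given where 'data' sits in the page and frag_size. */
static size_t shinfo_offset(size_t data_offset, size_t frag_size)
{
	return data_offset + frag_size - HYP_SHINFO_SIZE;
}

/* True if the whole skb_shared_info stays inside the page. */
static bool shinfo_inside_page(size_t data_offset, size_t frag_size)
{
	return shinfo_offset(data_offset, frag_size) + HYP_SHINFO_SIZE
	       <= HYP_PAGE_SIZE;
}
```

With 256 bytes of headroom, passing the offset pointer with a full-page frag_size puts shinfo at offset 4032, so it spills 256 bytes into the next page; passing the page address (offset 0) keeps it inside.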
2021-04-23 | net: sock: remove the unnecessary check in proto_register | Tonghao Zhang
tw_prot_cleanup already checks whether twsk_prot is NULL, so the extra check in proto_register is unnecessary. Fixes: 0f5907af3913 ("net: Fix potential memory leak in proto_register()") Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
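The pattern here is a NULL-tolerant cleanup helper: because the callee returns early on NULL, callers can drop their own `if (p != NULL)` guard. A minimal userspace sketch; `hyp_twsk_prot`, `hyp_tw_prot_cleanup` and the `cleanups_done` counter are hypothetical and only mirror the shape of the kernel helper (free the slab name, reset it to NULL).

```c
#include <stdlib.h>

/* Hypothetical analogue of struct timewait_sock_ops. */
struct hyp_twsk_prot {
	char *slab_name;
};

static int cleanups_done; /* counts real cleanups, for demonstration */

/* Hypothetical analogue of tw_prot_cleanup(): safe to call with NULL,
 * so callers need no guard of their own. */
static void hyp_tw_prot_cleanup(struct hyp_twsk_prot *twsk_prot)
{
	if (!twsk_prot)
		return;
	free(twsk_prot->slab_name);
	twsk_prot->slab_name = NULL;
	cleanups_done++;
}
```

An error path can then call `hyp_tw_prot_cleanup(prot_field)` unconditionally, which is the simplification the commit applies in proto_register.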
2021-04-23 | openvswitch: meter: remove rate from the bucket size calculation | Ilya Maximets
The implementation of meters is supposed to be a classic token bucket with two typical parameters: rate and burst size. In this scheme, the burst size is the maximum number of bytes/packets that can pass without being rate limited. Recent changes to the userspace datapath brought its meter implementation in line with the kernel one, and this uncovered several issues. The main problem is that the maximum bucket size, for an unknown reason, accounts not only for the burst size but also for the numerical value of the rate. This creates a lot of confusion around the behavior of meters. For example, if the rate is configured as 1000 pps and the burst size is set to 1, the meter should tolerate bursts of at most 1 packet, i.e. not a single packet above the rate should pass the meter. However, the current implementation calculates the maximum bucket size as (rate + burst size), so the effective bucket size will be 1001. This means that the first 1000 packets will not be rate limited and the average rate might be twice as high as the configured rate. This also makes it practically impossible to configure a meter with a burst size lower than the rate, which might be a desirable configuration if the rate is high. The inability to configure low burst sizes, and the overall inability of a user to predict the maximum and average rate from a meter's configured parameters without looking at the OVS and kernel code, might also be classified as a security issue, because drop meters are frequently used as a way of protecting against DoS attacks. This change removes the rate from the bucket size calculation, bringing it in line with the classic token bucket algorithm and making the rate and burst tolerance predictable from a user's perspective. The same change has been proposed for the userspace implementation. Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
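The difference between the two capacity formulas is easy to see in a small token bucket model. This is a simplified, packets-only sketch, not the OVS datapath code (which also supports byte-based bands and uses its own units); `struct meter`, `meter_init`, `meter_pass` and `burst_passed` are hypothetical names. Tokens are tracked in milli-packets so that refilling at `rate_pps` per second is exact at millisecond granularity, and `old_formula` selects the pre-fix capacity of (rate + burst).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified packets-per-second token bucket (illustrative model only). */
struct meter {
	uint64_t rate_pps;    /* configured rate */
	uint64_t max_mtokens; /* bucket capacity, in milli-packets */
	uint64_t mtokens;     /* tokens currently available */
	uint64_t last_ms;     /* time of the last refill */
};

/* old_formula: capacity = rate + burst (pre-fix behavior);
 * otherwise capacity = burst, as in a classic token bucket. */
static void meter_init(struct meter *m, uint64_t rate_pps, uint64_t burst,
		       bool old_formula)
{
	uint64_t size = old_formula ? rate_pps + burst : burst;

	m->rate_pps = rate_pps;
	m->max_mtokens = size * 1000;
	m->mtokens = m->max_mtokens; /* bucket starts full */
	m->last_ms = 0;
}

/* Returns true if a packet arriving at now_ms passes. Assumes now_ms never
 * goes backwards. */
static bool meter_pass(struct meter *m, uint64_t now_ms)
{
	/* rate_pps packets per second == rate_pps milli-packets per ms */
	m->mtokens += (now_ms - m->last_ms) * m->rate_pps;
	if (m->mtokens > m->max_mtokens)
		m->mtokens = m->max_mtokens;
	m->last_ms = now_ms;

	if (m->mtokens >= 1000) {
		m->mtokens -= 1000; /* one packet costs one whole token */
		return true;
	}
	return false;
}

/* Send n packets back-to-back at t = 0 and count how many pass. */
static unsigned burst_passed(uint64_t rate_pps, uint64_t burst,
			     bool old_formula, unsigned n)
{
	struct meter m;
	unsigned passed = 0;

	meter_init(&m, rate_pps, burst, old_formula);
	for (unsigned i = 0; i < n; i++)
		passed += meter_pass(&m, 0);
	return passed;
}
```

With the example from the commit message (rate 1000 pps, burst 1), `burst_passed(1000, 1, true, 2000)` lets 1001 back-to-back packets through, while `burst_passed(1000, 1, false, 2000)` lets exactly 1 through; after that, both buckets refill at the configured rate.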
2021-04-23 | Merge tag 'arm-fixes-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc | Linus Torvalds
Pull ARM SoC fixes from Arnd Bergmann: "These should be the final fixes for v5.12. There is one fix for SD card detection on one Allwinner board, and a few fixes for the Tegra platform that I had already queued up for v5.13 due to a communication problem. This addresses MMC device ordering on multiple machines, audio support on Jetson AGX Xavier and suspend/resume on Jetson TX2"
* tag 'arm-fixes-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
  arm64: tegra: Move clocks from RT5658 endpoint to device node
  arm64: tegra: Fix mmc0 alias for Jetson Xavier NX
  arm64: tegra: Set fw_devlink=on for Jetson TX2
  arm64: tegra: Add unit-address for ACONNECT on Tegra186