summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-03-23net/sched: fix incorrect vlan_push_eth dest fieldLouis Peens
Seems like a potential copy-paste bug slipped in here, the second memcpy should of course be populating src and not dest. Fixes: ab95465cde23 ("net/sched: add vlan push_eth and pop_eth action to the hardware IR") Signed-off-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/r/20220323092506.21639-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22net/mlx5e: Fix build warning, detected write beyond size of fieldSaeed Mahameed
When merged with Linus tree, the cited patch below will cause the following build warning: In function 'fortify_memset_chk', inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3: include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix that by grouping the fields to memeset in struct_group() to avoid the false alarm. Fixes: 9ded70fa1d81 ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20220322172224.31849-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-03-21 v2 We've added 137 non-merge commits during the last 17 day(s) which contain a total of 143 files changed, 7123 insertions(+), 1092 deletions(-). The main changes are: 1) Custom SEC() handling in libbpf, from Andrii. 2) subskeleton support, from Delyan. 3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao. 4) Fix net.core.bpf_jit_harden race, from Hou. 5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub. 6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami. The arch specific bits will come later. 7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri. 8) Enable non-atomic allocations in local storage, from Joanne. 9) Various var_off ptr_to_btf_id fixed, from Kumar. 10) bpf_ima_file_hash helper, from Roberto. 11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits) selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" bpftool: Fix a bug in subskeleton code generation bpf: Fix bpf_prog_pack when PMU_SIZE is not defined bpf: Fix bpf_prog_pack for multi-node setup bpf: Fix warning for cast from restricted gfp_t in verifier bpf, arm: Fix various typos in comments libbpf: Close fd in bpf_object__reuse_map bpftool: Fix print error when show bpf map bpf: Fix kprobe_multi return probe backtrace Revert "bpf: Add support to inline bpf_get_func_ip helper on x86" bpf: Simplify check in btf_parse_hdr() selftests/bpf/test_lirc_mode2.sh: Exit with proper code bpf: Check for NULL return from bpf_get_btf_vmlinux selftests/bpf: Test skipping stacktrace bpf: Adjust BPF stack helper functions to accommodate skip > 0 bpf: Select proper size for bpf_prog_pack ... ==================== Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22netdevice: add missing dm_private kdocJakub Kicinski
Building htmldocs complains: include/linux/netdevice.h:2295: warning: Function parameter or member 'dm_private' not described in 'net_device' Fixes: b26ef81c46ed ("drop_monitor: remove quadratic behavior") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220322051053.1883186-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-21net: Revert the softirq will run annotation in ____napi_schedule().Sebastian Andrzej Siewior
The lockdep annotation lockdep_assert_softirq_will_run() expects that either hard or soft interrupts are disabled because both guaranty that the "raised" soft-interrupts will be processed once the context is left. This triggers in flush_smp_call_function_from_idle() but it this case it explicitly calls do_softirq() in case of pending softirqs. Revert the "softirq will run" annotation in ____napi_schedule() and move the check back to __netif_rx() as it was. Keep the IRQ-off assert in ____napi_schedule() because this is always required. Fixes: fbd9a2ceba5c7 ("net: Add lockdep asserts to ____napi_schedule().") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/YjhD3ZKWysyw8rc6@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-21devlink: add explicitly locked flavor of the rate node APIsJakub Kicinski
We'll need an explicitly locked rate node API for netdevsim to switch eswitch mode setting to locked. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-20bpf: Adjust BPF stack helper functions to accommodate skip > 0Namhyung Kim
Let's say that the caller has storage for num_elem stack frames. Then, the BPF stack helper functions walk the stack for only num_elem frames. This means that if skip > 0, one keeps only 'num_elem - skip' frames. This is because it sets init_nr in the perf_callchain_entry to the end of the buffer to save num_elem entries only. I believe it was because the perf callchain code unwound the stack frames until it reached the global max size (sysctl_perf_event_max_stack). However it now has perf_callchain_entry_ctx.max_stack to limit the iteration locally. This simplifies the code to handle init_nr in the BPF callstack entries and removes the confusion with the perf_event's __PERF_SAMPLE_CALLCHAIN_EARLY which sets init_nr to 0. Also change the comment on bpf_get_stack() in the header file to be more explicit what the return value means. Fixes: c195651e565a ("bpf: add bpf_get_stack helper") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com Link: https://lore.kernel.org/bpf/20220314182042.71025-1-namhyung@kernel.org Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
2022-03-20bpf: Enable non-atomic allocations in local storageJoanne Koong
Currently, local storage memory can only be allocated atomically (GFP_ATOMIC). This restriction is too strict for sleepable bpf programs. In this patch, the verifier detects whether the program is sleepable, and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate down to the local storage functions that allocate memory. Please note that bpf_task/sk/inode_storage_update_elem functions are invoked by userspace applications through syscalls. Preemption is disabled before bpf_task/sk/inode_storage_update_elem is called, which means they will always have to allocate memory atomically. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220318045553.3091807-2-joannekoong@fb.com
2022-03-20netfilter: nft_fib: add reduce supportFlorian Westphal
The fib expression stores to a register, so we can't add empty stub. Check that the register that is being written is in fact redundant. In most cases, this is expected to cancel tracking as re-use is unlikely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20netfilter: nft_meta: extend reduce support to bridge familyFlorian Westphal
its enough to export the meta get reduce helper and then call it from nft_meta_bridge too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20netfilter: nf_tables: cancel tracking for clobbered destination registersPablo Neira Ayuso
Output of expressions might be larger than one single register, this might clobber existing data. Reset tracking for all destination registers that required to store the expression output. This patch adds three new helper functions: - nft_reg_track_update: cancel previous register tracking and update it. - nft_reg_track_cancel: cancel any previous register tracking info. - __nft_reg_track_cancel: cancel only one single register tracking info. Partial register clobbering detection is also supported by checking the .num_reg field which describes the number of register that are used. This patch updates the following expressions: - meta_bridge - bitwise - byteorder - meta - payload to use these helper functions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20netfilter: nf_tables: do not reduce read-only expressionsPablo Neira Ayuso
Skip register tracking for expressions that perform read-only operations on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to avoid defining stubs for these expressions. This patch re-enables register tracking which was disabled in ed5f85d42290 ("netfilter: nf_tables: disable register tracking"). Follow up patches add remaining register tracking for existing expressions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()Phil Sutter
The function sets the pernet boolean to avoid the spurious warning from nf_ct_lookup_helper() when assigning conntrack helpers via nftables. Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-19Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/David S. Miller
ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-03-19 1) Delete duplicated functions that calls same xfrm_api_check. From Leon Romanovsky. 2) Align userland API of the default policy structure to the internal structures. From Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18Merge tag 'for-net-next-2022-03-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Asus TF103C - Add support for Realtek RTL8852B - Add support for Realtek RTL8723BE - Add WBS support to mt7921s * tag 'for-net-next-2022-03-18' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (24 commits) Bluetooth: ath3k: remove superfluous header files Bluetooth: bcm203x: remove superfluous header files Bluetooth: hci_bcm: Add the Asus TF103C to the bcm_broken_irq_dmi_table Bluetooth: mt7921s: Add WBS support Bluetooth: mt7921s: Add .btmtk_get_codec_config_data Bluetooth: mt7921s: Add .get_data_path_id Bluetooth: mt7921s: Set HCI_QUIRK_VALID_LE_STATES Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt Bluetooth: btmtkuart: fix error handling in mtk_hci_wmt_sync() Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed Bluetooth: Send AdvMonitor Dev Found for all matched devices Bluetooth: msft: Clear tracked devices on resume Bluetooth: fix incorrect nonblock bitmask in bt_sock_wait_ready() Bluetooth: Don't assign twice the same value Bluetooth: btrtl: Add support for RTL8852B Bluetooth: hci_uart: add missing NULL check in h5_enqueue Bluetooth: Fix use after free in hci_send_acl Bluetooth: btusb: Use quirk to skip HCI_FLT_CLEAR_ALL on fake CSR controllers Bluetooth: hci_sync: Add a new quirk to skip HCI_FLT_CLEAR_ALL Bluetooth: btmtkuart: fix the conflict between mtk and msft vendor event ... ==================== Link: https://lore.kernel.org/r/20220318224752.1477292-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: set default rss queues num to physical cores / 2Íñigo Huguet
Network drivers can call to netif_get_num_default_rss_queues to get the default number of receive queues to use. Right now, this default number is min(8, num_online_cpus()). Instead, as suggested by Jakub, use the number of physical cores divided by 2 as a way to avoid wasting CPU resources and to avoid using both CPU threads, but still allowing to scale for high-end processors with many cores. As an exception, select 2 queues for processors with 2 cores, because otherwise it won't take any advantage of RSS despite being SMP capable. Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2 threads/core). NIC Broadcom NetXtreme II BCM57810 (10GBps). Ran some tests with `perf stat iperf3 -R`, with parallelisms of 1, 8 and 24, getting the following results: - Number of queues: 6 (instead of 8) - Network throughput: not affected - CPU usage: utilized 0.05-0.12 CPUs more than before (having 24 CPUs this is only 0.2-0.5% higher) - Reduced the number of context switches by 7-50%, being more noticeable when using a higher number of parallel threads. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/20220315091832.13873-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18Bluetooth: fix incorrect nonblock bitmask in bt_sock_wait_ready()Gavin Li
Callers pass msg->msg_flags as flags, which contains MSG_DONTWAIT instead of O_NONBLOCK. Signed-off-by: Gavin Li <gavin@matician.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-18Bluetooth: hci_sync: Add a new quirk to skip HCI_FLT_CLEAR_ALLIsmael Ferreras Morezuelas
Some controllers have problems with being sent a command to clear all filtering. While the HCI code does not unconditionally send a clear-all anymore at BR/EDR setup (after the state machine refactor), there might be more ways of hitting these codepaths in the future as the kernel develops. Cc: stable@vger.kernel.org Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-18Merge tag 'wireless-next-2022-03-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v5.18 Third set of patches for v5.18. Smaller set this time, support for mt7921u and some work on MBSSID support. Also a workaround for rfkill userspace event. Major changes: mac80211 * MBSSID beacon handling in AP mode rfkill * make new event layout opt-in to workaround buggy user space rtlwifi * support On Networks N150 device id mt76 * mt7915: MBSSID and 6 GHz band support * new driver mt7921u ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18rfkill: make new event layout opt-inJohannes Berg
Again new complaints surfaced that we had broken the ABI here, although previously all the userspace tools had agreed that it was their mistake and fixed it. Yet now there are cases (e.g. RHEL) that want to run old userspace with newer kernels, and thus are broken. Since this is a bit of a whack-a-mole thing, change the whole extensibility scheme of rfkill to no longer just rely on the message lengths, but instead require userspace to opt in via a new ioctl to a given maximum event size that it is willing to understand. By default, set that to RFKILL_EVENT_SIZE_V1 (8), so that the behaviour for userspace not calling the ioctl will look as if it's just running on an older kernel. Fixes: 14486c82612a ("rfkill: add a reason to the HW rfkill state") Cc: stable@vger.kernel.org # 5.11+ Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220316212749.16491491b270.Ifcb1950998330a596f29a2a162e00b7546a1d6d0@changeid
2022-03-18Merge tag 'mlx5-updates-2022-03-17' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-03-17 1) From Maxim Mikityanskiy, Datapath improvements in preparation for XDP multi buffer This series contains general improvements for the datapath that are useful for the upcoming XDP multi buffer support: a. Non-linear legacy RQ: validate MTU for robustness, build the linear part of SKB over the first hardware fragment (instead of copying the packet headers), adjust headroom calculations to allow enabling headroom in the non-linear mode (useful for XDP multi buffer). b. XDP: do the XDP program test before function call, optimize parameters of mlx5e_xdp_handle. 2) From Rongwei Liu, DR, reduce steering memory usage Currently, mlx5 driver uses mlx5_htbl/chunk/ste to organize steering logic. However there is a little memory waste. This update targets to reduce steering memory footprint by: a. Adjust struct member layout. b. Remove duplicated indicator by using simple functions call. With 500k TX rules(3 ste) plus 500k RX rules(6 stes), these patches can save around 17% memory. 3) Three cleanup commits at the end of this series. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18xfrm: rework default policy structureNicolas Dichtel
This is a follow up of commit f8d858e607b2 ("xfrm: make user policy API complete"). The goal is to align userland API to the internal structures. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-17bpf: Add cookie support to programs attached with kprobe multi linkJiri Olsa
Adding support to call bpf_get_attach_cookie helper from kprobe programs attached with kprobe multi link. The cookie is provided by array of u64 values, where each value is paired with provided function address or symbol with the same array index. When cookie array is provided it's sorted together with addresses (check bpf_kprobe_multi_cookie_swap). This way we can find cookie based on the address in bpf_get_attach_cookie helper. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org
2022-03-17bpf: Add multi kprobe linkJiri Olsa
Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe program through fprobe API. The fprobe API allows to attach probe on multiple functions at once very fast, because it works on top of ftrace. On the other hand this limits the probe point to the function entry or return. The kprobe program gets the same pt_regs input ctx as when it's attached through the perf API. Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment kprobe to multiple function with new link. User provides array of addresses or symbols with count to attach the kprobe program to. The new link_create uapi interface looks like: struct { __u32 flags; __u32 cnt; __aligned_u64 syms; __aligned_u64 addrs; } kprobe_multi; The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create return multi kprobe. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org
2022-03-17lib/sort: Add priv pointer to swap functionJiri Olsa
Adding support to have priv pointer in swap callback function. Following the initial change on cmp callback functions [1] and adding SWAP_WRAPPER macro to identify sort call of sort_r. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-2-jolsa@kernel.org [1] 4333fb96ca10 ("media: lib/sort.c: implement sort() variant taking context argument")
2022-03-17fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobeMasami Hiramatsu
Introduce FPROBE_FL_KPROBE_SHARED flag for sharing fprobe callback with kprobes safely from the viewpoint of recursion. Since the recursion safety of the fprobe (and ftrace) is a bit different from the kprobes, this may cause an issue if user wants to run the same code from the fprobe and the kprobes. The kprobes has per-cpu 'current_kprobe' variable which protects the kprobe handler from recursion in any case. On the other hand, the fprobe uses only ftrace_test_recursion_trylock(), which will allow interrupt context calls another (or same) fprobe during the fprobe user handler is running. This is not a matter in cases if the common callback shared among the kprobes and the fprobe has its own recursion detection, or it can handle the recursion in the different contexts (normal/interrupt/NMI.) But if it relies on the 'current_kprobe' recursion lock, it has to check kprobe_running() and use kprobe_busy_*() APIs. Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED *before* registering the fprobe, like; fprobe.flags = FPROBE_FL_KPROBE_SHARED; register_fprobe(&fprobe, "func*", NULL); This will protect your common callback from the nested call. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735293127.1084943.15687374237275817599.stgit@devnote2
2022-03-17fprobe: Add exit_handler supportMasami Hiramatsu
Add exit_handler to fprobe. fprobe + rethook allows us to hook the kernel function return. The rethook will be enabled only if the fprobe::exit_handler is set. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735290790.1084943.10601965782208052202.stgit@devnote2
2022-03-17rethook: Add a generic return hookMasami Hiramatsu
Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe. Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
2022-03-17fprobe: Add ftrace based probe APIsMasami Hiramatsu
The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe*() with probed functions. There are 3 registration interfaces, - register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names. The registered fprobes can be unregistered with unregister_fprobe(). e.g. struct fprobe fp = { .entry_handler = user_handler }; const char *targets[] = { "func1", "func2", "func3"}; ... ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets)); ... unregister_fprobe(&fp); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2
2022-03-17ftrace: Add ftrace_set_filter_ips functionJiri Olsa
Adding ftrace_set_filter_ips function to be able to set filter on multiple ip addresses at once. With the kprobe multi attach interface we have cases where we need to initialize ftrace_ops object with thousands of functions, so having single function diving into ftrace_hash_move_and_update_ops with ftrace_lock is faster. The functions ips are passed as unsigned long array with count. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2
2022-03-17net: dsa: pass extack to dsa_switch_ops :: port_mirror_add()Vladimir Oltean
Drivers might have error messages to propagate to user space, most common being that they support a single mirror port. Propagate the netlink extack so that they can inform user space in a verbal way of their limitations. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2Vladimir Oltean
Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an offload for tc-flower) is done by setting the MIRROR_ENA bit from the action vector of the filter. The packet is mirrored to the port mask configured in the ANA:ANA:MIRRORPORTS register (the same port mask as the destinations for port-based mirroring). Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress protocol ip \ flower skip_sw ip_proto icmp \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: add port mirroring support using tc-matchallVladimir Oltean
Ocelot switches perform port-based ingress mirroring if ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if the port is in ANA:ANA:EMIRRORPORTS. Both ingress-mirrored and egress-mirrored frames are copied to the port mask from ANA:ANA:MIRRORPORTS. So the choice of limiting to a single mirror port via ocelot_mirror_get() and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't map very well to the user space model. If the user wants to mirror the ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there are no tc-matchall rules to describe those actions. Now, we could offload a matchall rule with multiple mirred actions, one per desired mirror port, and force the user to stick to the multi-action rule format for subsequent matchall filters. But both DSA and ocelot have the flow_offload_has_one_action() check for the matchall offload, plus the fact that it will get cumbersome to cross-check matchall mirrors with flower mirrors (which will be added in the next patch). As a result, we limit the configuration to a single mirror port, with the possibility of lifting the restriction in the future. Frames injected from the CPU don't get egress-mirrored, since they are sent with the BYPASS bit in the injection frame header, and this bypasses the analyzer module (effectively also the mirroring logic). I don't know what to do/say about this. Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress \ matchall skip_sw \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Handle MST state changesTobias Waldekranz
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Pass VLAN MSTI migration notifications to driverTobias Waldekranz
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to query a port's MST stateTobias Waldekranz
This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to check if MST is enabledTobias Waldekranz
This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to map an MSTI to a VID setTobias Waldekranz
br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of MST state changesTobias Waldekranz
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrationsTobias Waldekranz
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of MST mode changesTobias Waldekranz
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Support setting and reporting MST port statesTobias Waldekranz
Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Allow changing a VLAN's MSTITobias Waldekranz
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Multiple Spanning Tree (MST) modeTobias Waldekranz
Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge tag 'net-5.17-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, ipsec, and wireless. A few last minute revert / disable and fix patches came down from our sub-trees. We're not waiting for any fixes at this point. Current release - regressions: - Revert "netfilter: nat: force port remap to prevent shadowing well-known ports", restore working conntrack on asymmetric paths - Revert "ath10k: drop beacon and probe response which leak from other channel", restore working AP and mesh mode on QCA9984 - eth: intel: fix hang during reboot/shutdown Current release - new code bugs: - netfilter: nf_tables: disable register tracking, it needs more work to cover all corner cases Previous releases - regressions: - ipv6: fix skb_over_panic in __ip6_append_data when (admin-only) extension headers get specified - esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return value more selectively - bnx2x: fix driver load failure when FW not present in initrd Previous releases - always broken: - vsock: stop destroying unrelated sockets in nested virtualization - packet: fix slab-out-of-bounds access in packet_recvmsg() Misc: - add Paolo Abeni to networking maintainers!" * tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits) iavf: Fix hang during reboot/shutdown net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload net: bcmgenet: skip invalid partial checksums bnx2x: fix built-in kernel driver load failure net: phy: mscc: Add MODULE_FIRMWARE macros net: dsa: Add missing of_node_put() in dsa_port_parse_of net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() Revert "ath10k: drop beacon and probe response which leak from other channel" hv_netvsc: Add check for kvmalloc_array iavf: Fix double free in iavf_reset_task ice: destroy flow director filter mutex after releasing VSIs ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats() Add Paolo Abeni to networking maintainers atm: eni: Add check for dma_map_single net/packet: fix slab-out-of-bounds access in packet_recvmsg() net: mdio: mscc-miim: fix duplicate debugfs entry net: phy: marvell: Fix invalid comparison in the resume and suspend functions esp6: fix check on ipv6_skip_exthdr's return value net: dsa: microchip: add spi_device_id tables netfilter: nf_tables: disable register tracking ...
2022-03-17net: veth: Account total xdp_frame len running ndo_xdp_xmitLorenzo Bianconi
Even if this is a theoretical issue since it is not possible to perform XDP_REDIRECT on a non-linear xdp_frame, veth driver does not account paged area in ndo_xdp_xmit function pointer. Introduce xdp_get_frame_len utility routine to get the xdp_frame full length and account total frame size running XDP_REDIRECT of a non-linear xdp frame into a veth device. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/54f9fd3bb65d190daf2c0bbae2f852ff16cfbaa0.1646989407.git.lorenzo@kernel.org
2022-03-17net/mlx5: Remove unused fill page array API functionTariq Toukan
mlx5_fill_page_array API function is not used. Remove it, reduce the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17net/mlx5: Remove unused exported contiguous coherent buffer allocation APITariq Toukan
All WQ types moved to using the fragmented allocation API for coherent memory. Contiguous API is not used anymore. Remove it, reduce the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17net: geneve: support IPv4/IPv6 as inner protocolEyal Birger
This patch adds support for encapsulating IPv4/IPv6 within GENEVE. In order to use this, a new IFLA_GENEVE_INNER_PROTO_INHERIT flag needs to be provided at device creation. This property cannot be changed for the time being. In case IP traffic is received on a non-tun device the drop count is increased. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20220316061557.431872-1-eyal.birger@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>