|
kfree_skb() includes the location, it makes sense
to add it to consume_skb() as well.
After patch:
taskd_EventMana 8602 [004] 420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic
swapper 0 [011] 422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc
discipline 9141 [043] 423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp
swapper 0 [010] 423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv
borglet 8672 [014] 425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump
swapper 0 [028] 426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action
wget 14339 [009] 426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process
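The mechanism is straightforward; a sketch of consume_skb() passing its call site to the tracepoint (the exact trace helper signature may differ):

void consume_skb(struct sk_buff *skb)
{
	if (!skb_unref(skb))
		return;

	/* record the call site so perf can resolve it to a symbol */
	trace_consume_skb(skb, __builtin_return_address(0));
	__kfree_skb(skb);
}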
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we try to start AF_XDP on some machines with a long running time,
the machine's memory fragmentation problem means there is not enough
contiguous physical memory, which causes the start to fail.
If the size of the queue is 8 * 1024, then the size of desc[] is
8 * 1024 * 8 = 16 * PAGE, but we also add the struct xdp_ring size, so it
is a bit more than 16 pages. This requires an order-5 allocation. If there
are a lot of queues, that is difficult to satisfy on these machines with a
long running time.
We actually waste 15 pages here: an order-5 allocation is 32 pages, but
we only use 17 pages.
This patch replaces __get_free_pages() by vmalloc() to allocate memory
to solve these problems.
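A minimal sketch of the allocation change (the patch text says vmalloc();
vmalloc_user() is the mmap-safe variant assumed here, and names may differ):

/* vmalloc'ed memory needs no contiguous physical pages and can
 * still be mapped to user space with remap_vmalloc_range()
 */
size = struct_size(ring, desc, nentries);	/* xdp_ring header + descs */
ring = vmalloc_user(size);			/* was __get_free_pages() */
if (!ring)
	return NULL;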
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
taprio queueMaxSDU fixes
This fixes 3 issues noticed while attempting to reoffload the
dynamically calculated queueMaxSDU values. These are:
- Dynamic queueMaxSDU is not calculated correctly due to a lost patch
- Dynamically calculated queueMaxSDU needs to be clamped on the low end
- Dynamically calculated queueMaxSDU needs to be clamped on the high end
====================
Link: https://lore.kernel.org/r/20230215224632.2532685-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
It makes no sense to keep randomly large max_sdu values, especially if
larger than the device's max_mtu. These are visible in "tc qdisc show".
Such a max_sdu is practically unlimited and will cause no packets for
that traffic class to be dropped on enqueue.
Just set max_sdu_dynamic to U32_MAX, which in the logic below causes
taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user
space of 0 (unlimited).
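A sketch of the clamp described above (exact placement in taprio differs):

/* beyond the device MTU the limit is meaningless: make it unlimited */
if (max_sdu_dynamic > dev->max_mtu)
	max_sdu_dynamic = U32_MAX;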
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The overhead specified in the size table comes from the user. With small
time intervals (or gates always closed), the overhead can be larger than
the max interval for that traffic class, and their difference is
negative.
What we want to happen is for max_sdu_dynamic to have the smallest
non-zero value possible (1) which means that all packets on that traffic
class are dropped on enqueue. However, since max_sdu_dynamic is u32, a
negative difference is represented as a large unsigned value and oversized
dropping never happens.
Use max_t with int to force a truncation of max_frm_len to no smaller
than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no
smaller than 1.
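A sketch of the described clamp:

/* overhead > interval would go negative; as u32 that is a huge
 * value, so force a signed comparison with a floor that keeps
 * max_sdu_dynamic at no less than 1
 */
max_frm_len = max_t(int, max_frm_len, dev->hard_header_len + 1);
max_sdu_dynamic = max_frm_len - dev->hard_header_len;	/* >= 1 */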
Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
taprio_calculate_gate_durations() depends on netdev_get_num_tc(), and at
the point where it is called this still returns 0, so it calculates the
maximum gate durations for zero traffic classes.
I had tested the blamed commit only with another patch in my tree, one
which in the end I decided isn't valuable enough to submit ("net/sched:
taprio: mask off bits in gate mask that exceed number of TCs").
The problem is that having this patch threw off my testing. By moving
the netdev_set_num_tc() call earlier, we implicitly gave to
taprio_calculate_gate_durations() the information it needed.
Extract only the portion from the unsubmitted change which applies the
mqprio configuration to the netdev earlier.
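The essence of the reordering, as a sketch (not the literal diff):

/* apply the mqprio config first, so the TC count is visible */
netdev_set_num_tc(dev, mqprio->num_tc);
/* ... */
taprio_calculate_gate_durations(q, new_admin);	/* now sees num_tc > 0 */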
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/
Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix three cases of overproduction of wakeups:
(1) rxrpc_input_split_jumbo() conditionally notifies the app that there's
data for recvmsg() to collect if it queues some data - and then its
only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway.
Fix rxrpc_input_data() to only do the wakeup in failure cases.
(2) If a DATA packet is received for a call by the I/O thread whilst
recvmsg() is busy draining the call's rx queue in the app thread, the
call will be left on the recvmsg() queue for recvmsg() to pick up, even
though there isn't any data on it.
This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR
set after the reply has been posted to a service call.
Fix this by discarding pending calls from the recvmsg() queue that
don't need servicing yet.
(3) Not-yet-completed calls get requeued after having data read from them,
even if they have no data to read.
Fix this by only requeuing them if they have data waiting on them; if
they don't, the I/O thread will requeue them when data arrives or they
fail.
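A rough sketch of fix (1); the helper names come from the description above
and the exact code differs:

/* rxrpc_input_split_jumbo() already notifies the socket when it
 * queues data, so the caller only wakes recvmsg() on failure
 */
if (!rxrpc_input_split_jumbo(call, skb))
	rxrpc_notify_socket(call);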
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Alex Elder says:
====================
net: final GSI register updates
I believe this is the last set of changes required to allow IPA v5.0
to be supported. There is a little cleanup work remaining, but that
can happen in the next Linux release cycle. Otherwise we just need
config data and register definitions for IPA v5.0 (and DTS updates).
These are ready but won't be posted without further testing.
The first patch in this series fixes a minor bug in a patch just
posted, which I found too late. The second eliminates the GSI
memory "adjustment"; this was done previously to avoid/delay the
need to implement a more general way to define GSI register offsets.
Note that this patch causes "checkpatch" warnings due to indentation
that aligns with an open parenthesis.
The third patch makes use of the newly-defined register offsets, to
eliminate the need for a function that hid a few details. The next
modifies a different helper function to work properly for IPA v5.0+.
The fifth patch changes the way the event ring size is specified
based on how it's now done for IPA v5.0+. And the last defines a
new register required for IPA v5.0+.
====================
Link: https://lore.kernel.org/r/20230215195352.755744-1-elder@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Starting at IPA v5.0, the number of event rings per EE is defined
in a field in a new HW_PARAM_4 GSI register rather than HW_PARAM_2.
Define this new register and its fields, and update the code that
checks the number of rings supported by hardware to use the proper
field based on IPA version.
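A sketch of the version-based selection (helper style follows the driver;
details assumed):

reg = gsi_reg(gsi, gsi->version < IPA_VERSION_5_0 ? HW_PARAM_2
						  : HW_PARAM_4);
val = ioread32(gsi->virt + reg_offset(reg));
ev_count = reg_decode(reg, NUM_EV_PER_EE, val);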
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Starting with IPA v5.0, a channel's event ring index is encoded in
a field in the CH_C_CNTXT_1 GSI register rather than CH_C_CNTXT_0.
Define a new field ID for the former register and encode the event
ring in the appropriate register.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The GSI channel protocol field in the CH_C_CNTXT_0 GSI register is
widened starting with IPA v5.0, making the CHTYPE_PROTOCOL_MSB field
added in IPA v4.5 unnecessary. Update the code to reflect this.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Now that we explicitly define each register field width there is no
need to have a special encoding function for the event ring length.
Add a field for this to the EV_CH_E_CNTXT_1 GSI register, and use it
in place of ev_ch_e_cntxt_1_length_encode() (which can be removed).
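A sketch of the replacement (field name assumed):

reg = gsi_reg(gsi, EV_CH_E_CNTXT_1);
/* plain field encode; was ev_ch_e_cntxt_1_length_encode() */
val = reg_encode(reg, R_LENGTH, size);
iowrite32(val, gsi->virt + reg_n_offset(reg, evt_ring_id));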
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Starting at IPA v4.5, almost all GSI registers had their offsets
changed by a fixed amount (shifted downward by 0xd000). Rather than
defining offsets for all those registers dependent on version, an
adjustment was applied for most register accesses. This was
implemented in commit cdeee49f3ef7f ("net: ipa: adjust GSI register
addresses"). It was later modified to be a bit more obvious about
the adjustment, in commit 571b1e7e58ad3 ("net: ipa: use a separate
pointer for adjusted GSI memory").
We now are able to define every GSI register with its own offset, so
there's no need to implement this special adjustment.
So get rid of the "virt_raw" pointer, and just maintain "virt" as
the (non-adjusted) base address of I/O mapped GSI register memory.
Redefine the offsets of all GSI registers (other than the INTER_EE
ones, which were not subject to the adjustment) for IPA v4.5+,
subtracting 0xd000 from their defined offsets instead.
Move the ERROR_LOG and ERROR_LOG_CLR definitions further down in the
register definition files so all registers are defined in order of
their offset.
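In sketch form, the pointer setup before and after (based on the commits
cited above):

/* before: a raw pointer plus an adjusted alias */
gsi->virt_raw = ioremap(res->start, size);
gsi->virt = gsi->virt_raw - adjust;	/* adjust = 0xd000 for v4.5+ */

/* after: a single base; v4.5+ offsets are defined 0xd000 smaller */
gsi->virt = ioremap(res->start, size);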
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
I spotted an error in a patch posted this week, unfortunately just
after it got accepted. The effect of the bug is that time-based
interrupt moderation is disabled. This is not technically a bug,
but it is not what is intended. The problem is that a |= assignment
got implemented as a simple assignment, so the previously assigned
value was ignored.
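Illustratively (field names assumed, not the actual lines):

val = reg_encode(reg, EV_MODC, count);
val |= reg_encode(reg, EV_MODT, usecs);	/* bug: '=' here discarded EV_MODC */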
Fixes: edc6158b18af ("net: ipa: define fields for event-ring related registers")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Do not always add NETDEV_XDP_ACT_XSK_ZEROCOPY bit in xdp_features flag
but check if the NIC really supports it.
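A sketch of the pattern (the per-driver capability check is assumed):

xdp_features_t val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;

if (dev_supports_xsk_zc)	/* driver-specific condition */
	val |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
xdp_set_features_flag(netdev, val);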
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/3dba6ea42dc343a9f2d7d1a6a6a6c173235e1ebf.1676471386.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In order to prevent a device from disappearing when a background job was
started, dev_hold() and dev_put() calls were made. During the
stabilization phase of the scan/beacon features, it was later decided
that removing the device while a background job was ongoing was a valid use
case, and we should instead stop the background job and then remove the
device, rather than prevent the device from being removed. This is what
is currently done, which means manually reference counting the device
during background jobs is no longer needed.
Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-7-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
At this stage we simply do not care about the delayed work value,
because active scan is not yet supported, so we can blindly queue
another work once a beacon has been sent.
It fixes a smatch warning:
mac802154_beacon_worker() warn: always true condition
'(local->beacon_interval >= 0) => (0-u32max >= 0)'
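A sketch of the change (workqueue and work names assumed):

/* before: the condition is a tautology for a u32 */
if (local->beacon_interval >= 0)
	queue_delayed_work(local->workqueue, &local->beacon_work,
			   local->beacon_interval);

/* after: queue unconditionally */
queue_delayed_work(local->workqueue, &local->beacon_work,
		   local->beacon_interval);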
Fixes: 3accf4762734 ("mac802154: Handle basic beaconing")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-6-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
Using ieee802154_subif_start_xmit() to bypass the net queue when
sending beacons is broken because it does not acquire the
HARD_TX_LOCK(), hence not preventing datagram buffers from being smashed by
beacons upon contention situation. Using the mlme_tx helper is not the
best fit either but at least it is not buggy and has little-to-no
performance hit. More details are given in the comment explaining this
choice in the code.
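In sketch form (argument lists abridged):

/* before: bypasses the netdev queue without HARD_TX_LOCK */
ieee802154_subif_start_xmit(skb, sdata->dev);

/* after: MLME transmit helper, safe against concurrent datagrams */
ieee802154_mlme_tx(local, sdata, skb);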
Fixes: 3accf4762734 ("mac802154: Handle basic beaconing")
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-5-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
Returning EPERM gives the impression that "right now" it is not
possible, but "later" it could be, while what we want to express is the
fact that this is not currently supported at all (might change in the
future). So let's return EOPNOTSUPP instead.
Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Suggested-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-4-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
Instead of printing error messages in the kernel log, let's use extack.
When there is a netlink error returned that could be further specified
with a string, use extack as well.
Apply this logic to the very recent scan/beacon infrastructure.
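A sketch of the pattern (the check itself is hypothetical):

if (!scan_request_supported(req)) {	/* hypothetical check */
	NL_SET_ERR_MSG(info->extack, "Scan request not supported");
	return -EOPNOTSUPP;
}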
Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-3-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
Instead of open-coding scan parameters (page, channels, duration, etc),
let's use the existing NLA_POLICY* macros. This greatly reduces the error
handling and clarifies the overall logic.
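A sketch of the policy entries (attribute names and limits assumed here,
not copied from the patch):

[NL802154_ATTR_PAGE] = NLA_POLICY_MAX(NLA_U8, 31),
[NL802154_ATTR_SCAN_DURATION] = NLA_POLICY_MAX(NLA_U8, 14),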
Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-2-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
When building an skb in non-linear mode, it is not likely nor unlikely
that the xdp buff has fragments, it depends on the size of the packet
received.
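The change in sketch form:

/* before: presumes the branch is cold */
if (unlikely(xdp_buff_has_frags(xdp)))
	build_nonlinear_skb();	/* placeholder for the real work */

/* after: no hint, the branch is purely data-dependent */
if (xdp_buff_has_frags(xdp))
	build_nonlinear_skb();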
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
|
|
Comment is outdated since
commit 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support").
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The last usage was removed as part of
commit 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support").
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Allow offloading filters that match on conntrack 'new' state in order to
enable UDP NEW offload in the following patch.
Unhardcode ct 'established' from the ct modify header infrastructure code
and determine the correct ct state bit according to the metadata action
'cookie' field.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
With support for UDP NEW offload the flow_table may now send updates for
existing flows. Support properly replacing existing entries by updating the
flow restore_cookie and replacing the rule with a new one with the same
match but a new mod_hdr action that sets the updated ctinfo.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
EQ list is read only while finding the matching EQ.
Hence, avoid *_safe() version.
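In sketch form (list and field names follow the driver; details assumed):

/* lookup only; nothing is removed while iterating, so no _safe */
list_for_each_entry(eq, &table->comp_eqs_list, list)
	if (eq->core.eqn == eqn)
		return eq;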
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
|
|
Remove the page parameter, it can be derived from the xdp_buff member
of mlx5e_xdp_buff.
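The idea in one line (member name from the description above):

struct page *page = virt_to_page(mxbuf->xdp.data);	/* page arg dropped */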
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Remove the page parameter, it can be derived from the xdp_buff.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use napi_build_skb(), which uses NAPI percpu caches to obtain the
skbuff_head, instead of an in-place allocation.
napi_build_skb() calls napi_skb_cache_get(), which returns a cached
skb, or allocates a bulk of NAPI_SKB_CACHE_BULK (16) if cache is empty.
Performance test:
TCP single stream, single ring, single core, default MTU (1500B).
Before: 26.5 Gbits/sec
After: 30.1 Gbits/sec (+13.6%)
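The change itself is a one-line substitution:

/* before */
skb = build_skb(va, frag_size);
/* after: reuses the NAPI percpu skb cache */
skb = napi_build_skb(va, frag_size);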
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
|
|
This patch tests the bpf_fib_lookup helper when looking up
a neigh in NUD_FAILED and NUD_STALE state. It also adds test
for the new BPF_FIB_LOOKUP_SKIP_NEIGH flag.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-2-martin.lau@linux.dev
|
|
The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.
In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.
This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always calls
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is also skipped when SKIP_NEIGH is set, because
bpf_redirect_neigh() will figure out the smac as well.
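A sketch of a tc program using the flag (headers and address setup elided):

SEC("tc")
int fib_redirect(struct __sk_buff *ctx)
{
	struct bpf_fib_lookup params = {};
	int rc;

	/* family, saddr/daddr etc. would be filled from the packet */
	params.ifindex = ctx->ingress_ifindex;

	rc = bpf_fib_lookup(ctx, &params, sizeof(params),
			    BPF_FIB_LOOKUP_SKIP_NEIGH);
	if (rc != BPF_FIB_LKUP_RET_SUCCESS)
		return TC_ACT_OK;

	/* neigh resolution (and smac) is left to bpf_redirect_neigh() */
	return bpf_redirect_neigh(params.ifindex, NULL, 0, 0);
}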
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
|
|
BPF trampoline is the critical infrastructure of the BPF subsystem, acting
as a mediator between kernel functions and BPF programs. Numerous important
features, such as using BPF program for zero overhead kernel introspection,
rely on this key component. We can't wait to support bpf trampoline on RV64.
The related tests have passed, as well as the test_verifier with no new
failure cases.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-5-pulehui@huaweicloud.com
|
|
Implement bpf_arch_text_poke for RV64. For the call scenario, to make the
BPF trampoline compatible with the kernel and BPF contexts, we follow the
framework of RV64 ftrace and reserve 4 nops for BPF programs as the
function entry, and use auipc+jalr instructions for the function call.
However, since patching an auipc+jalr pair is not an atomic operation, we
need to use stop_machine() to make sure the instructions are patched
atomically. The jump scenario also uses an auipc+jalr pair and likewise
needs to be patched in stop_machine() context.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-4-pulehui@huaweicloud.com
|
|
The current emit_call function is not suitable for kernel function calls as
it stores the return value to the BPF R0 register. We can separate it out
for common use. Meanwhile, simplify the judgment logic: a fixed function
address can use jal or auipc+jalr, while an unfixed one can use only
auipc+jalr.
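A sketch of the resulting logic (emitter helpers as in the RV64 JIT; the
exact code differs):

if (fixed_addr && is_21b_int(off)) {
	emit(rv_jal(RV_REG_RA, off >> 1), ctx);
} else {
	emit(rv_auipc(RV_REG_T1, upper), ctx);
	emit(rv_jalr(RV_REG_RA, RV_REG_T1, lower), ctx);
}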
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-3-pulehui@huaweicloud.com
|
|
Extend patch_text for multiple instructions. This is the preparation for
multiple-instruction text patching in the riscv BPF trampoline, and may be
useful for other scenarios.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com
|
|
This reverts commit 6c20822fada1b8adb77fa450d03a0d449686a4a9.
The build bot failed on an arch with a different cache line size:
https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add tests validating that it's possible to pass context arguments into
global subprogs for various types of programs, including a particularly
tricky KPROBE programs (which cover kprobes, uprobes, USDTs, a vast and
important class of programs).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-4-andrii@kernel.org
|
|
Convert 17 test_global_funcs subtests into test_loader framework for
easier maintenance and more declarative way to define expected
failures/successes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-3-andrii@kernel.org
|
|
KPROBE program's user-facing context type is defined as the typedef
bpf_user_pt_regs_t. This leads to a problem when trying to pass a
kprobe/uprobe/usdt context argument into a global subprog, as the kernel
always strips away mods and typedefs of the user-supplied type, but takes
the expected type from bpf_ctx_convert as is, which causes a mismatch.
The current way to work around this is to define a fake struct with the
same name as the expected typedef:
struct bpf_user_pt_regs_t {};
__noinline void my_global_subprog(struct bpf_user_pt_regs_t *ctx) { ... }
This patch fixes the issue by resolving the expected type if it's not
a struct. It still leaves the above workaround working for backwards
compatibility.
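What the fix enables, in sketch form (the typedef normally comes from
vmlinux.h; probe target chosen for illustration):

typedef struct pt_regs bpf_user_pt_regs_t;	/* from vmlinux.h */

__noinline int handle_regs(bpf_user_pt_regs_t *regs)
{
	return regs != NULL;
}

SEC("kprobe/do_nanosleep")
int kprobe_prog(bpf_user_pt_regs_t *ctx)
{
	return handle_regs(ctx);	/* no fake struct needed any more */
}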
Fixes: 91cc1a99740e ("bpf: Annotate context types")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-2-andrii@kernel.org
|
|
This patch fixes the following issue of function calls in JIT, like:
[ 29.346981] multi-func JIT bug 105 != 103
The issue can be reproduced by running the "inline simple bpf_loop call"
verifier test.
This is because we are emitting 2-4 instructions for 64-bit immediate
moves. During the first pass of JIT, the placeholder address is zero,
emitting two instructions for it. In the extra pass, the function address
is in XKVRANGE, emitting four instructions for it. This changes the
instruction index in the JIT context. Let's always use 4 instructions for
the function address in JIT, so that the instruction sequences don't change
between the first pass and the extra pass for function calls.
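The fixed-length sequence, sketched (the standard LoongArch 64-bit
immediate load):

lu12i.w  $t1, imm[31:12]	# bits 31:12
ori      $t1, $t1, imm[11:0]	# bits 11:0
lu32i.d  $t1, imm[51:32]	# bits 51:32
lu52i.d  $t1, $t1, imm[63:52]	# bits 63:52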
Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/bpf/20230214152633.2265699-1-hengqi.chen@gmail.com
|
|
rtl8xxxu now unconditionally uses LEDS_CLASS, so a Kconfig dependency
is required to avoid link errors:
aarch64-linux-ld: drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.o: in function `rtl8xxxu_disconnect':
rtl8xxxu_core.c:(.text+0x730): undefined reference to `led_classdev_unregister'
ERROR: modpost: "led_classdev_unregister" [drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.ko] undefined!
ERROR: modpost: "led_classdev_register_ext" [drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.ko] undefined!
Fixes: 3be01622995b ("wifi: rtl8xxxu: Register the LED and make it blink")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230217095910.2480356-1-arnd@kernel.org
|
|
The bpf_fib_lookup() helper does not only look up the fib (i.e. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.
This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
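The added check, in sketch form:

if (!neigh || !(neigh->nud_state & NUD_VALID))
	return BPF_FIB_LKUP_RET_NO_NEIGH;
memcpy(params->dmac, neigh->ha, ETH_ALEN);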
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
|
|
Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup
helper that will be used in a later selftest. In particular, it
calls ___neigh_lookup_noref that expects the bh disabled.
This patch disables bh before calling bpf_prog_run[_xdp], so
the testing prog can also use those helpers.
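The shape of the change:

local_bh_disable();
ret = bpf_prog_run(prog, ctx);
local_bh_enable();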
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-2-martin.lau@linux.dev
|
|
Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These
two paths share a call to the common function xsk_xmit(), which has two
sanity checks within. A pseudo code example shows the two paths:
__xsk_sendmsg():
	if (unlikely(!xsk_is_bound(xs)))
		return -ENXIO;
	if (unlikely(need_wait))
		return -EOPNOTSUPP;
	mark napi id
	(...)
	xsk_xmit()

xsk_poll():
	if (unlikely(!xsk_is_bound(xs)))
		return mask;
	(...)
	xsk_xmit()

xsk_xmit():
	if (unlikely(!(xs->dev->flags & IFF_UP)))
		return -ENETDOWN;
	if (unlikely(!xs->tx))
		return -ENOBUFS;
As can be observed above, in sendmsg() the napi id can be marked on an
interface that was not brought up, and this causes a NULL ptr
dereference:
[31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018
[31757.512710] #PF: supervisor read access in kernel mode
[31757.517936] #PF: error_code(0x0000) - not-present page
[31757.523149] PGD 0 P4D 0
[31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI
[31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40
[31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180
[31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f
[31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246
[31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000
[31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100
[31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000
[31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa
[31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
[31757.612364] FS: 00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000
[31757.620571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0
[31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[31757.648139] PKRU: 55555554
[31757.650894] Call Trace:
[31757.653385] <TASK>
[31757.655524] sock_sendmsg+0x8f/0xa0
[31757.659077] ? sockfd_lookup_light+0x12/0x70
[31757.663416] __sys_sendto+0xfc/0x170
[31757.667051] ? do_sched_setscheduler+0xdb/0x1b0
[31757.671658] __x64_sys_sendto+0x20/0x30
[31757.675557] do_syscall_64+0x38/0x90
[31757.679197] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48
[31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16
[31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000
[31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[31757.754940] </TASK>
To fix this, let's make xsk_xmit() a function that is responsible for
generic Tx, where RCU is handled accordingly, and pull the sanity checks
and the xs->zc handling out of it. Move the sanity checks into
__xsk_sendmsg() and xsk_poll().
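A sketch of where the checks end up (shape only, following the pseudo code
above):

static int __xsk_sendmsg(struct socket *sock, struct msghdr *m,
			 size_t total_len)
{
	struct xdp_sock *xs = xdp_sk(sock->sk);

	if (unlikely(!xsk_is_bound(xs)))
		return -ENXIO;
	if (unlikely(!(xs->dev->flags & IFF_UP)))
		return -ENETDOWN;
	if (unlikely(!xs->tx))
		return -ENOBUFS;
	/* ... mark napi id, then the generic Tx path ... */
}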
Fixes: ca2e1a627035 ("xsk: Mark napi_id on sendmsg()")
Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215143309.13145-1-maciej.fijalkowski@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see it because the
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.
If the reset rule is used for established connections, this
may result in the conntrack entry remaining around for a very long
time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer
so that the reset packet passes through conntrack too.
The problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so it's possible that the
reset skb won't find the correct entry. Generating a template
entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.
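A rough shape of the new hook (the function name is assumed from the
description):

void nf_conntrack_tcp_set_closing(struct nf_conn *ct)
{
	/* unconfirmed entries are never committed to the table */
	if (!nf_ct_is_confirmed(ct))
		return;

	/* switch a confirmed entry into a closing TCP state so its
	 * timeout shrinks from the established one (5 days by default)
	 */
}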
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some of the devlink bits were tricky, but I think I got it right.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are two different alive messages, the "init" one is
bigger than the other one, so we have a fortify read warn
here. Avoid it by copying from the variable-sized 'raw'
instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230216203444.134310-1-johannes@sipsolutions.net
|
|
This inline function is unused, remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230216205754.d500dcc2e90c.Id87df297263f86b5bba002f7cbb387abc13adf53@changeid
|
|
For some ICs, packets can't be sent correctly without initializing the
CMAC table first. The previous flow did this initialization after
association, which resulted in the authentication response failing to
transmit. Move the initialization up front to a proper place to solve this.
Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230216082807.22285-1-pkshih@realtek.com
|