|
Generate the packet directly in the umem instead of in a temporary
buffer that is copied out. Simplifies the code and improves
performance.
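As a rough sketch of the approach (the helper names below are assumptions, not the selftest's code; only xsk_umem__get_data() is the real libbpf accessor), the packet is assembled in place at its umem address:

  static void gen_pkt_in_umem(void *umem_area, u64 addr, u32 len)
  {
  	/* resolve the frame's address inside the umem ... */
  	void *pkt = xsk_umem__get_data(umem_area, addr);

  	/* ... and build headers and payload directly there, with no
  	 * memcpy() out of a temporary buffer */
  	build_test_headers(pkt, len);		/* hypothetical helper */
  	fill_payload_pattern(pkt, len);		/* hypothetical helper */
  }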
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-14-magnus.karlsson@gmail.com
|
|
Simplify the cleanup of ifobjects right before the program exits by
introducing functions for creating and destroying these objects.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-13-magnus.karlsson@gmail.com
|
|
Decrease the sending speed to avoid potentially overflowing some
buffers in the skb case, which leads to dropped packets that we cannot
control (and thus tests that may generate false negatives). Decrease
the batch size and introduce a usleep in the transmit thread so as not
to overflow the receiver.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-12-magnus.karlsson@gmail.com
|
|
Validate the tx stats on the Tx thread instead of the Rx
thread. Depending on your settings, you might not be allowed to query
the statistics of a socket you do not own, so it is better to do this
on the correct thread to start with.
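For reference, the tx statistics are read with the XDP_STATISTICS socket option, so the query has to be issued by the thread that owns the socket fd; a minimal sketch:

  #include <sys/socket.h>
  #include <linux/if_xdp.h>

  static int query_tx_stats(int xsk_fd, struct xdp_statistics *stats)
  {
  	socklen_t optlen = sizeof(*stats);

  	/* must run on the thread that owns xsk_fd, i.e. the Tx thread */
  	return getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, stats, &optlen);
  }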
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-11-magnus.karlsson@gmail.com
|
|
Simplify packet validation in the xsk selftests by performing it
immediately for every packet, instead of once per batch on copied
packet data as the current code does. Make it simpler and faster by
validating each packet directly on the umem packet data, thus skipping
the copy and the memory allocation for the temporary buffer.
The optional packet dump feature is also simplified in the same
manner. Memory allocation and copying is removed and the dump is
performed directly on the umem data.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-10-magnus.karlsson@gmail.com
|
|
Rename the worker_* functions that are not thread entry points, as the
old naming was confusing. Now only thread entry points are named
worker_something.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-9-magnus.karlsson@gmail.com
|
|
Disassociate the number of packets sent from the number of buffers in
the umem, so that we can loop over the umem to test more things. Set
the size of the umem to be a multiple of 2M, a requirement for the
huge pages that are needed in unaligned mode.
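A sketch of what that sizing enables (the helper below is illustrative, not the selftest code): once the size is rounded up to a 2M multiple, the umem can be backed by anonymous huge pages:

  #include <sys/mman.h>

  #define HUGEPAGE_SIZE (2UL * 1024 * 1024)

  static void *alloc_umem_buffer(size_t frame_size, size_t nr_frames)
  {
  	size_t sz = frame_size * nr_frames;

  	/* round up to a 2M multiple so MAP_HUGETLB can back the region */
  	sz = (sz + HUGEPAGE_SIZE - 1) & ~(HUGEPAGE_SIZE - 1);
  	return mmap(NULL, sz, PROT_READ | PROT_WRITE,
  		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  }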
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-8-magnus.karlsson@gmail.com
|
|
Get rid of the end-of-test packet and just count the number of packets
received, quitting when the expected number has been received. This
simplifies the code.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-7-magnus.karlsson@gmail.com
|
|
Simplify the retry code and make it more efficient by waiting first,
instead of trying immediately which always fails due to the
asynchronous nature of xsk socket close. Also decrease the wait time
to significantly lower the run-time of the test suite.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-6-magnus.karlsson@gmail.com
|
|
Return the correct error codes so they can be printed correctly.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-5-magnus.karlsson@gmail.com
|
|
Remove unused variables and typedefs. The *_npkts variables are
incremented but never used.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-4-magnus.karlsson@gmail.com
|
|
Remove the option for setting the number of tx packets, as this should
be decided by the test itself. Also change the number of packets to be
sent to 4096, speeding up execution.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-3-magnus.karlsson@gmail.com
|
|
Remove the color mode since it does not add any value; less code means
less maintenance, which is a good thing.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210825093722.10219-2-magnus.karlsson@gmail.com
|
|
Daniel Xu says:
====================
The motivation behind this helper is to access userspace pt_regs in a
kprobe handler.
uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
kprobe handler. The final case (kernelspace pt_regs in uprobe) is
pretty rare (usermode helper) so I think that can be solved later if
necessary.
More concretely, this helper is useful in doing BPF-based DWARF stack
unwinding. Currently the kernel can only do framepointer based stack
unwinds for userspace code. This is because the DWARF state machines are
too fragile to be computed in kernelspace [0]. The idea behind
DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
stack (while in prog context) and send it up to userspace for unwinding
(probably with libunwind) [1]. This would effectively enable profiling
applications with -fomit-frame-pointer using kprobes and uprobes.
[0]: https://lkml.org/lkml/2012/2/10/356
[1]: https://github.com/danobi/bpf-dwarf-walk
Changes from v1:
- Consolidate BTF_ID decls for task_struct
- Enable bpf_get_current_task_btf() for all prog types
- Enable bpf_task_pt_regs() for all prog types
- Use ASSERT_* macros instead of CHECK
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This test retrieves the uprobe's pt_regs in two different ways and
compares the contents in an arch-agnostic way.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/5581eb8800f6625ec8813fe21e9dce1fbdef4937.1629772842.git.dxu@dxuuu.xyz
|
|
The motivation behind this helper is to access userspace pt_regs in a
kprobe handler.
uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
kprobe handler. The final case (kernelspace pt_regs in uprobe) is
pretty rare (usermode helper) so I think that can be solved later if
necessary.
More concretely, this helper is useful in doing BPF-based DWARF stack
unwinding. Currently the kernel can only do framepointer based stack
unwinds for userspace code. This is because the DWARF state machines are
too fragile to be computed in kernelspace [0]. The idea behind
DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
stack (while in prog context) and send it up to userspace for unwinding
(probably with libunwind) [1]. This would effectively enable profiling
applications with -fomit-frame-pointer using kprobes and uprobes.
[0]: https://lkml.org/lkml/2012/2/10/356
[1]: https://github.com/danobi/bpf-dwarf-walk
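A minimal sketch of the intended usage from a kprobe program (the attach point and the read-back into a global are assumptions, not this patch's selftest):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  char LICENSE[] SEC("license") = "GPL";

  struct pt_regs user_regs;	/* read by user space, e.g. a DWARF unwinder */

  SEC("kprobe/do_nanosleep")	/* hypothetical attach point */
  int dump_user_regs(struct pt_regs *ctx)
  {
  	/* ctx is the *kernel* pt_regs; fetch the task's userspace regs instead */
  	struct task_struct *task = bpf_get_current_task_btf();
  	struct pt_regs *uregs = (struct pt_regs *)bpf_task_pt_regs(task);

  	/* copy them out for the userspace unwinder to consume */
  	bpf_probe_read_kernel(&user_regs, sizeof(user_regs), uregs);
  	return 0;
  }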
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz
|
|
bpf_get_current_task() is already supported so it's natural to also
include the _btf() variant for btf-powered helpers.
This is required for non-tracing progs to use bpf_task_pt_regs() in the
next commit.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz
|
|
No need to have it defined 5 times. Once is enough.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz
|
|
Same as the BTF_ID_LIST_SINGLE macro, except it defines a global ID.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/a867a97517df42fd3953eeb5454402b57e74538f.1629772842.git.dxu@dxuuu.xyz
|
|
It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.
Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable). User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.
But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't actually
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).
The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.
This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong. What matters is that it used to work.
So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes. It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.
Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed. Probably because the stress-ng problem case ends up
being timing-dependent too.
It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe
writes always wake up readers") only to be broken again by commit
3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").
And at that point the kernel test robot noticed the performance
regression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.
Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case. FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
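Schematically, the change in pipe_write() looks like this ("wake_readers" is a stand-in for the actual wakeup heuristic, not the real condition):

  	/* before: SIGIO tied to the same heuristic as the wakeup */
  	if (wake_readers) {
  		wake_up_interruptible_sync_poll(&pipe->rd_wait,
  						EPOLLIN | EPOLLRDNORM);
  		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
  	}

  	/* after: the wakeup keeps its heuristic, SIGIO is sent for every write */
  	if (wake_readers)
  		wake_up_interruptible_sync_poll(&pipe->rd_wait,
  						EPOLLIN | EPOLLRDNORM);
  	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);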
Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucount fixes from Eric Biederman:
"This branch fixes a regression that made it impossible to increase
rlimits that had been converted to the ucount infrastructure, and also
fixes a reference counting bug where the reference was not incremented
soon enough.
The fixes are trivial and the bugs have been encountered in the wild,
and the fixes have been tested"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Increase ucounts reference counter before the security hook
ucounts: Fix regression preventing increasing of rlimits in init_user_ns
|
|
kcalloc() is called to allocate memory for m->m_info, and if it fails,
ceph_mdsmap_destroy() behind the label out_err will be called:
ceph_mdsmap_destroy(m);
In ceph_mdsmap_destroy(), m->m_info is dereferenced through:
kfree(m->m_info[i].export_targets);
To fix this possible null-pointer dereference, check m->m_info before the
for loop to free m->m_info[i].export_targets.
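Schematically (field names approximate, remaining teardown unchanged), the guarded free path becomes:

  void ceph_mdsmap_destroy(struct ceph_mdsmap *m)
  {
  	if (m->m_info) {
  		int i;

  		/* only walk m_info if the kcalloc() actually succeeded */
  		for (i = 0; i < m->possible_max_rank; i++)
  			kfree(m->m_info[i].export_targets);
  		kfree(m->m_info);
  	}
  	/* rest of the teardown unchanged */
  	kfree(m);
  }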
[ jlayton: fix up whitespace damage
only kfree(m->m_info) if it's non-NULL ]
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The ceph_cap_flush structures are usually dynamically allocated, but
the ceph_cap_snap has an embedded one.
When force umounting, the client will try to remove all the session
caps. During this, it will free them, but that should not be done
with the ones embedded in a capsnap.
Fix this by adding a new boolean that indicates that the cap flush is
embedded in a capsnap, and skip freeing it if that's set.
At the same time, switch to using list_del_init() when detaching the
i_list and g_list heads. It's possible for a forced umount to remove
these objects but then handle_cap_flushsnap_ack() races in and does the
list_del_init() again, corrupting memory.
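Schematically (member and cache names approximate), the free path then becomes:

  struct ceph_cap_flush {
  	/* ... */
  	bool is_capsnap;	/* true if embedded in a struct ceph_cap_snap */
  };

  static void ceph_free_cap_flush(struct ceph_cap_flush *cf)
  {
  	/* never free the copy that is embedded in a capsnap */
  	if (cf && !cf->is_capsnap)
  		kmem_cache_free(ceph_cap_flush_cachep, cf);
  }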
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52283
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This reverts commit f2165627319ffd33a6217275e5690b1ab5c45763.
[BUG]
It's no longer possible to create compressed inline extent after commit
f2165627319f ("btrfs: compression: don't try to compress if we don't
have enough pages").
[CAUSE]
For compression code, there are several possible reasons we have a range
that needs to be compressed while it's no more than one page.
- Compressed inline write
The data is always smaller than one sector and the test lacks the
condition to properly recognize a non-inline extent.
- Compressed subpage write
For the incoming subpage compressed write support, we require page
alignment of the delalloc range.
And for 64K page size, we can compress just one page into smaller
sectors.
For those reasons, the requirement for the data to be more than one page
is not correct, and is already causing regression for compressed inline
data writeback. The idea of skipping one page to avoid wasting CPU time
could be revisited in the future.
[FIX]
Fix it by reverting the offending commit.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net
Fixes: f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Juhee Kang says:
====================
samples: pktgen: enhance the ability to print the execution results of samples
This patch series improves the ability to print the execution result of pktgen
samples by adding a line which calls the function before termination and adding
trap SIGINT. Also, this series documents the latest pktgen usage options.
Currently, pktgen samples print the execution result when terminated
normally. However, sample03 is not working properly.
These are the results of sample04 and sample03:
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 1
Running... ctrl^C to stop
Device: eth0@0
Result: OK: 19(c5+d13) usec, 1 (60byte,0frags)
51762pps 24Mb/sec (24845760bps) errors: 0
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample03_burst_single_flow.sh -n 1
Running... ctrl^C to stop
This is because sample03, unlike the other samples, doesn't call the
function which prints the execution result when terminated normally.
So the first commit solves this issue by adding a line which calls the
function before termination.
Also, all pktgen samples are able to send an infinite number of
messages per thread by setting the count option to 0, and pktgen is
stopped by Ctrl-C. However, the samples besides sample{3...5} don't
work appropriately because Ctrl-C stops the script, not just pktgen.
These are the results of the samples:
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 0
Running... ctrl^C to stop
^CDevice: eth0@0
Result: OK: 569657(c569538+d118) usec, 84650 (60byte,0frags)
148597pps 71Mb/sec (71326560bps) errors: 0
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -n 0
Running... ctrl^C to stop
^C
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample02_multiqueue.sh -n 0
Running... ctrl^C to stop
^C
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample06_numa_awared_queue_irq_affinity.sh -n 0
Running... ctrl^C to stop
^C
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_bench_xmit_mode_netif_receive.sh -n 0
Running... ctrl^C to stop
^C
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_bench_xmit_mode_queue_xmit.sh -n 0
Running... ctrl^C to stop
^C
So the second commit solves this issue by adding a trap on SIGINT.
Also, the control_c function is renamed to print_result in the first
and second commits, to maintain consistency with the other samples.
And the current pktgen.rst documentation doesn't cover the latest
pktgen sample usage options, such as count and IPv6. Also, the old
pktgen sample scripts are still included in the document, even though
they were removed by commit a4b6ade8359f ("samples/pktgen: remove
remaining old pktgen sample scripts").
Thus, the last commit documents the latest pktgen sample usage,
removes the old sample scripts, and fixes a minor typo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the pktgen.rst documentation doesn't cover the latest pktgen
sample usage options, such as count and IPv6. Also, this documentation
includes the old sample scripts, which are no longer used because they
were removed by commit a4b6ade8359f ("samples/pktgen: remove remaining
old pktgen sample scripts").
Thus, this commit documents pktgen sample usage using the latest
options, removes the old sample scripts, and fixes a minor typo.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All pktgen samples can send an indefinite number of messages per thread
by setting the count option to 0 (-n 0). When running a sample with
count 0 and pressing Ctrl-C to stop it, the program prints the result
of the execution so far. Currently, the samples besides sample{3...5}
don't work properly, because Ctrl-C stops the script, not just pktgen.
These are the results of the samples:
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 0
Running... ctrl^C to stop
^CDevice: eth0@0
Result: OK: 569657(c569538+d118) usec, 84650 (60byte,0frags)
148597pps 71Mb/sec (71326560bps) errors: 0
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -n 0
Running... ctrl^C to stop
^C
In order to solve this, this commit adds a trap on SIGINT. Also, this
commit changes the control_c function to print_result to maintain
consistency with the other samples.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, most pktgen samples print the execution result when the
program is terminated normally. However, sample03 doesn't work
appropriately.
These are the results of the samples:
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 1
Running... ctrl^C to stop
Device: eth0@0
Result: OK: 19(c5+d13) usec, 1 (60byte,0frags)
51762pps 24Mb/sec (24845760bps) errors: 0
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample03_burst_single_flow.sh -n 1
Running... ctrl^C to stop
The reason why it doesn't print the execution result on normal
termination is that sample03, unlike the other samples, doesn't call
the function which prints the result.
So, this commit solves this issue by calling the function before
termination. Also, this commit changes the control_c function to
print_result to maintain consistency with the other samples.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sunil Goutham says:
====================
Octeontx2: Traffic shaping and SDP link config support
This patch series adds support for traffic shaping configuration
on all silicons available after 96xx C0. And also adds SDP link
related configuration needed when Octeon is connected as an end-point
and traffic needs to flow from end-point to host and vice versa.
Series also has other changes like
- New mbox messages in admin function driver for PF/VF drivers
to retrieve available HW resource count. HW resources like block LFs,
bandwidth profiles etc are covered.
- Added PTP device ID for new CN10K and 95O silicons.
- etc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added an mbox message for PF/VF drivers to retrieve the current ingress
bandwidth profile free count. Also added the current policer timeunit
configuration info, based on which rate-limiting decisions can be taken
by PF/VF drivers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New use cases are popping up wherein the user wants to install common
MCAM filters for all interfaces. Having channel verification would
result in duplicating such MCAM filters for each ingress interface.
Hence channel verification is removed.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CN10K silicon has a different device id for the PTP device, hence this
patch updates the driver with the new id. Though the ptp driver is a
separate driver, the AF manages configuring the PTP block on behalf of
all PFs. To manage ptp, the AF driver checks in its probe whether
1. the ptp hardware device is found on the silicon
2. a driver is bound to the ptp device
3. the ptp driver probe is successful
If case 1 or 3 fails, the AF proceeds without ptp, and for case 2 it
defers the probe. This patch also refactors the code to check, for
case 1, all the PTP device ids given in the ptp device id table.
Also added the PTP device ID for the 95O silicon.
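The resulting probe-time decision logic looks roughly like this (the lookup helper and the error label are approximations of the AF driver's code, not a verbatim excerpt):

  	rvu->ptp = ptp_get();	/* looks the PTP device up via the id table */
  	if (IS_ERR(rvu->ptp)) {
  		err = PTR_ERR(rvu->ptp);
  		/* case 2: device present but no driver bound yet -> retry later */
  		if (err == -EPROBE_DEFER)
  			goto err_release_regions;
  		/* cases 1 and 3: no PTP hardware, or its probe failed -> go on without it */
  		rvu->ptp = NULL;
  	}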
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Upon receiving the MBOX_MSG_FREE_RSRC_CNT, the AF will find out the
current number of free resources and reply it back to the requester. No
guarantee is given on the future state of the free resources yet.
If another requester sends MBOX_MSG_ATTACH_RESOURCES after this call,
the number of available resources might change.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added support for packet IO via SDP links, which is used when Octeon is
connected as an end-point. Traffic from host to end-point and vice
versa flows through SDP links. This patch also supports the dual SDP
blocks present in 98xx silicon.
Signed-off-by: Radha Mohan Chintakuntla <radhac@marvell.com>
Signed-off-by: Nalla Pradeep <pnalla@marvell.com>
Signed-off-by: Subrahmanyam Nilla <snilla@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In 98xx, there are 2 NIX blocks and 4 LBK blocks present. How these
NIX-LBK pairs should be configured depends on the use case. By default,
loopback functionality is supported by AF VF pairs which are attached
to NIX0 and NIX1 LFs alternately to ensure load balancing. NIX0
transmits a packet to LBK1, which will be received by NIX1, and a
packet transmitted by NIX1 will be received by NIX0 via LBK2.
There are some requirements where only one AF VF is used and the
respective NIX is expected to operate in a mode where it can receive
its own packets back. This can be achieved if NIX0 sends packets to
LBK0 and not LBK1. Add a flag in the LF alloc request mailbox which
sets up NIX0 to use LBK0 and NIX1 to use LBK3.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unlike OcteonTx2, the channel numbers used by CGX/RPM
and LBK on CN10K silicons aren't fixed in HW. They are
SW programmable, hence we cannot derive the transmit link
from static channel numbers anymore. Get them from the
admin function via mailbox instead.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before the C0 HW revision, the RSS adder was computed based on the
following static formula:
rss_adder<7:0> = flow_tag<7:0> ^ flow_tag<15:8> ^
                 flow_tag<23:16> ^ flow_tag<31:24>
The above scheme has the following drawbacks:
1) It is not in line with other standard NIC behavior.
2) There can be an SW use case where SW computes the hash upfront
using the Toeplitz function and predicts the queue selection to
optimize some packet lookup function. The nonstandard way of doing the
XOR makes it impossible for the consumer to predict the queue selection.
From the C0 HW revision onwards, the HW can be configured to use
flow_tag<7:0> as rss_adder<7:0>, aligning with standard NICs.
This patch adds an option to select the legacy RSS adder mode
vs the standard NIC behavior by setting the NIX_LF_RSS_TAG_LSB_AS_ADDER
flag. Since this bit field is reserved in old HW revisions, there is
no need for an additional HW version check.
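For clarity, the two adder schemes amount to the following (illustrative helpers, not driver code):

  /* pre-C0: XOR-fold all four bytes of the flow tag */
  static inline u8 rss_adder_legacy(u32 flow_tag)
  {
  	return (flow_tag & 0xff) ^ ((flow_tag >> 8) & 0xff) ^
  	       ((flow_tag >> 16) & 0xff) ^ ((flow_tag >> 24) & 0xff);
  }

  /* C0 onwards, with NIX_LF_RSS_TAG_LSB_AS_ADDER set: use flow_tag<7:0> */
  static inline u8 rss_adder_lsb(u32 flow_tag)
  {
  	return flow_tag & 0xff;
  }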
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From 96xx C0 onwards, all silicons support traffic shaping.
This patch enables that feature, along with other changes:
- When PIR/CIR shaping config is modified, toggle SW_XOFF
for config to take effect
- Before SMQ flush, clear SW_XOFF at all parent schedulers
- Support to read current transmit scheduler configuration via mbox
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's helpful for compile testing on other platforms (e.g. x86).
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit fdacd57c79b7 ("netfilter: x_tables: never register tables by
default") introduces the function xt_register_template(), and in one case,
a call to that function was missing the error-case handling.
Handle when xt_register_template() returns an error value.
This was identified with the clang-analyzer's Dead-Store analysis.
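The shape of the fix is simply to check and propagate the return value; with placeholder table and init-function names it looks like:

  static int __init example_table_module_init(void)	/* placeholder names */
  {
  	int ret;

  	ret = xt_register_template(&example_table, example_table_init);
  	if (ret < 0)
  		return ret;

  	/* ... remaining registration steps ... */
  	return 0;
  }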
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add counters and timestamps (if available) to the conntrack object
that is represented in nfnetlink_log and _queue messages.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
NIX_AF_TX_LINKX_NORM_CREDIT holds a running counter of the
tx credits available per link. But tx credits should be
configured based on the MTU config, so an MTU change requires
a tx credit count update.
An issue exists whereby, when both PF and VF are enabled and
PF traffic is flowing, a VF request for an MTU update leads to
corruption of the credit count and a subsequent deadlock of the
tx link when the NORM_CREDIT register is updated, since the
register holds a running count.
This patch provides a workaround by pausing link traffic
using NIX_AF_TL1X_SW_XOFF, waiting for existing packets to
drain and used credits to be returned, before writing the new
credit count.
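The workaround sequence is roughly the following (abridged; the polling helper is hypothetical and the register accessors are only approximations of the AF driver's code):

  	/* 1. stop the link: set SW_XOFF on the TL1 scheduler feeding it */
  	rvu_write64(rvu, blkaddr, NIX_AF_TL1X_SW_XOFF(schq), BIT_ULL(0));

  	/* 2. wait for in-flight packets to drain and their credits to be
  	 *    returned to NIX_AF_TX_LINKX_NORM_CREDIT */
  	poll_tx_link_credits_idle(rvu, blkaddr, link);	/* hypothetical helper */

  	/* 3. only now write the credit count derived from the new MTU */
  	rvu_write64(rvu, blkaddr, NIX_AF_TX_LINKX_NORM_CREDIT(link), new_credits);

  	/* 4. resume the link */
  	rvu_write64(rvu, blkaddr, NIX_AF_TL1X_SW_XOFF(schq), 0);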
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clear and disable the interrupt before queueing work, as there is a
chance that the work completes faster on another core and the
interrupt enable done as part of that work runs before the interrupt
disable in the interrupt context. This leads to the interrupt being
permanently disabled.
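The ordering in the interrupt handler that avoids the race looks like this (device struct, register names and work item are placeholders):

  static irqreturn_t example_intr_handler(int irq, void *data)
  {
  	struct example_dev *dev = data;

  	/* ack and mask the source *before* queueing the work; otherwise the
  	 * work may finish on another core and re-enable the interrupt, only
  	 * for this handler to then disable it again - permanently */
  	writeq(BIT_ULL(0), dev->reg_base + EXAMPLE_INT);		/* clear (W1C) */
  	writeq(BIT_ULL(0), dev->reg_base + EXAMPLE_INT_ENA_W1C);	/* disable */

  	queue_work(dev->wq, &dev->work);
  	return IRQ_HANDLED;
  }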
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set NPA batch allocation engine to process 35 cache lines
per turn on CN10k platform.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reuse the conntrack event notifier struct; this allows removing the
extra register/unregister functions and avoids a pointer in struct net.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This prepares for merging the ct and exp notifier structs.
The 'fcn' member is renamed to something unique.
Second, the register/unregister API is simplified. There is only
one implementation, so there is no need to do any error checking.
Replace the EBUSY logic with WARN_ON_ONCE. This allows removing the
error unwinding.
The exp notifier register/unregister functions are removed in
a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_ct_deliver_cached_events and nf_conntrack_eventmask_report are very
similar. Split nf_conntrack_eventmask_report into a common helper
function that can be used for both cases.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
... by changing:
if (unlikely(ret < 0 || missed)) {
if (ret < 0) {
to
if (likely(ret >= 0 && !missed))
goto out;
if (ret < 0) {
After this nf_conntrack_eventmask_report and nf_ct_deliver_cached_events
look pretty much the same, next patch moves common code to a helper.
This patch has no effect on generated code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_conntrack_eventmask_report and nf_ct_deliver_cached_events shared
most of their code. This unifies the layout by changing
if (nf_ct_is_confirmed(ct)) {
foo
}
to
if (!nf_ct_is_confirmed(ct))
return
foo
This removes one level of indentation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This partially reverts commit 16c9afc776608324ca71c0bc354987bab532f51d.
Alex Bee reports a regression in 5.14 on their RK3328 SoC when
configuring the PL330 DMA controller:
| ------------[ cut here ]------------
| WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
| Modules linked in: spi_rockchip(+) fuse
| CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
| Hardware name: Pine64 Rock64 (DT)
| pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
| pc : dma_map_resource+0x68/0xc0
| lr : pl330_prep_slave_fifo+0x78/0xd0
This appears to be because dma_map_resource() is being called for a
physical address which does not correspond to a memory address yet does
have a valid 'struct page' due to the way in which the vmemmap is
constructed.
Prior to 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
implementation of pfn_valid() called memblock_is_memory() to return
'false' for such regions and the DMA mapping request would proceed.
However, now that we are using the generic implementation where only the
presence of the memory map entry is considered, we return 'true' and
erroneously fail with DMA_MAPPING_ERROR because we identify the region
as DRAM.
Although fixing this in the DMA mapping code is arguably the right fix,
it is a risky, cross-architecture change at this stage in the cycle. So
just revert arm64 back to its old pfn_valid() implementation for v5.14.
The change to the generic pfn_valid() code is preserved from the original
patch, so as to avoid impacting other architectures.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Alex Bee <knaerzche@gmail.com>
Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|