linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2022-10-27	net/mlx5e: Do not increment ESN when updating IPsec ESN state	Hyong Youb Kim
	An offloaded SA stops receiving after about 2^32 + replay_window packets. For example, when SA reaches <seq-hi 0x1, seq 0x2c>, all subsequent packets get dropped with SA-icv-failure (integrity_failed). To reproduce the bug: - ConnectX-6 Dx with crypto enabled (FW 22.30.1004) - ipsec.conf: nic-offload = yes replay-window = 32 esn = yes salifetime=24h - Run netperf for a long time to send more than 2^32 packets netperf -H <device-under-test> -t TCP_STREAM -l 20000 When 2^32 + replay_window packets are received, the replay window moves from the 2nd half of subspace (overlap=1) to the 1st half (overlap=0). The driver then updates the 'esn' value in NIC (i.e. seq_hi) as follows. seq_hi = xfrm_replay_seqhi(seq_bottom) new esn in NIC = seq_hi + 1 The +1 increment is wrong, as seq_hi already contains the correct seq_hi. For example, when seq_hi=1, the driver actually tells NIC to use seq_hi=2 (esn). This incorrect esn value causes all subsequent packets to fail integrity checks (SA-icv-failure). So, do not increment. Fixes: cb01008390bb ("net/mlx5: IPSec, Add support for ESN") Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	Merge branch 'fix-some-issues-in-netdevsim-driver'	Jakub Kicinski
	Zhengchao Shao says: ==================== fix some issues in netdevsim driver When strace tool is used to perform memory injection, memory leaks and files not removed issues are found. Fix them. ==================== Link: https://lore.kernel.org/r/20221026014642.116261-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed	Zhengchao Shao
	Remove dir in nsim_dev_debugfs_init() when creating ports dir failed. Otherwise, the netdevsim device will not be created next time. Kernel reports an error: debugfs: Directory 'netdevsim1' with parent 'netdevsim' already present! Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	netdevsim: fix memory leak in nsim_drv_probe() when ↵	Zhengchao Shao
	nsim_dev_resources_register() failed If some items in nsim_dev_resources_register() fail, memory leak will occur. The following is the memory leak information. unreferenced object 0xffff888074c02600 (size 128): comm "echo", pid 8159, jiffies 4294945184 (age 493.530s) hex dump (first 32 bytes): 40 47 ea 89 ff ff ff ff 01 00 00 00 00 00 00 00 @G.............. ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ backtrace: [<0000000011a31c98>] kmalloc_trace+0x22/0x60 [<0000000027384c69>] devl_resource_register+0x144/0x4e0 [<00000000a16db248>] nsim_drv_probe+0x37a/0x1260 [<000000007d1f448c>] really_probe+0x20b/0xb10 [<00000000c416848a>] __driver_probe_device+0x1b3/0x4a0 [<00000000077e0351>] driver_probe_device+0x49/0x140 [<0000000054f2465a>] __device_attach_driver+0x18c/0x2a0 [<000000008538f359>] bus_for_each_drv+0x151/0x1d0 [<0000000038e09747>] __device_attach+0x1c9/0x4e0 [<00000000dd86e533>] bus_probe_device+0x1d5/0x280 [<00000000839bea35>] device_add+0xae0/0x1cb0 [<000000009c2abf46>] new_device_store+0x3b6/0x5f0 [<00000000fb823d7f>] bus_attr_store+0x72/0xa0 [<000000007acc4295>] sysfs_kf_write+0x106/0x160 [<000000005f50cb4d>] kernfs_fop_write_iter+0x3a8/0x5a0 [<0000000075eb41bf>] vfs_write+0x8f0/0xc80 Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	netdevsim: fix memory leak in nsim_bus_dev_new()	Zhengchao Shao
	If device_register() failed in nsim_bus_dev_new(), the value of reference in nsim_bus_dev->dev is 1. obj->name in nsim_bus_dev->dev will not be released. unreferenced object 0xffff88810352c480 (size 16): comm "echo", pid 5691, jiffies 4294945921 (age 133.270s) hex dump (first 16 bytes): 6e 65 74 64 65 76 73 69 6d 31 00 00 00 00 00 00 netdevsim1...... backtrace: [<000000005e2e5e26>] __kmalloc_node_track_caller+0x3a/0xb0 [<0000000094ca4fc8>] kvasprintf+0xc3/0x160 [<00000000aad09bcc>] kvasprintf_const+0x55/0x180 [<000000009bac868d>] kobject_set_name_vargs+0x56/0x150 [<000000007c1a5d70>] dev_set_name+0xbb/0xf0 [<00000000ad0d126b>] device_add+0x1f8/0x1cb0 [<00000000c222ae24>] new_device_store+0x3b6/0x5e0 [<0000000043593421>] bus_attr_store+0x72/0xa0 [<00000000cbb1833a>] sysfs_kf_write+0x106/0x160 [<00000000d0dedb8a>] kernfs_fop_write_iter+0x3a8/0x5a0 [<00000000770b66e2>] vfs_write+0x8f0/0xc80 [<0000000078bb39be>] ksys_write+0x106/0x210 [<00000000005e55a4>] do_syscall_64+0x35/0x80 [<00000000eaa40bbc>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 40e4fe4ce115 ("netdevsim: move device registration and related code to bus.c") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20221026015405.128795-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	Merge tag 'linux-can-fixes-for-6.1-20221027' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-10-27 Anssi Hannula fixes the use of the completions in the kvaser_usb driver. Biju Das contributes 2 patches for the rcar_canfd driver. A IRQ storm that can be triggered by high CAN bus load and channel specific IRQ handlers are fixed. Yang Yingliang fixes the j1939 transport protocol by moving a kfree_skb() out of a spin_lock_irqsave protected section. * tag 'linux-can-fixes-for-6.1-20221027' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before kfree_skb() can: rcar_canfd: fix channel specific IRQ handling for RZ/G2L can: rcar_canfd: rcar_canfd_handle_global_receive(): fix IRQ storm on global FIFO receive can: kvaser_usb: Fix possible completions during init_completion ==================== Link: https://lore.kernel.org/r/20221027114356.1939821-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	net: broadcom: bcm4908_enet: update TX stats after actual transmission	Rafał Miłecki
	Queueing packets doesn't guarantee their transmission. Update TX stats after hardware confirms consuming submitted data. This also fixes a possible race and NULL dereference. bcm4908_enet_start_xmit() could try to access skb after freeing it in the bcm4908_enet_poll_tx(). Reported-by: Florian Fainelli <f.fainelli@gmail.com> Fixes: 4feffeadbcb2e ("net: broadcom: bcm4908enet: add BCM4908 controller driver") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221027112430.8696-1-zajec5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	Merge branch 'ip-rework-the-fix-for-dflt-addr-selection-for-connected-nexthop'	Jakub Kicinski
	Nicolas Dichtel says: ==================== ip: rework the fix for dflt addr selection for connected nexthop" This series reworks the fix that is reverted in the second commit. As Julian explained, nhc_scope is related to nhc_gw, it's not the scope of the route. ==================== Link: https://lore.kernel.org/r/20221020100952.8748-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	nh: fix scope used to find saddr when adding non gw nh	Nicolas Dichtel
	As explained by Julian, fib_nh_scope is related to fib_nh_gw4, but fib_info_update_nhc_saddr() needs the scope of the route, which is the scope "before" fib_nh_scope, ie fib_nh_scope - 1. This patch fixes the problem described in commit 747c14307214 ("ip: fix dflt addr selection for connected nexthop"). Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops") Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/ Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	Revert "ip: fix dflt addr selection for connected nexthop"	Nicolas Dichtel
	This reverts commit 747c14307214b55dbd8250e1ab44cad8305756f1. As explained by Julian, nhc_scope is related to nhc_gw, not to the route. Revert the original patch. The initial problem is fixed differently in the next commit. Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/ Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	Revert "ip: fix triggering of 'icmp redirect'"	Nicolas Dichtel
	This reverts commit eb55dc09b5dd040232d5de32812cc83001a23da6. The patch that introduces this bug is reverted right after this one. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27	io_uring: unlock if __io_run_local_work locked inside	Dylan Yudaken
	It is possible for tw to lock the ring, and this was not propogated out to io_run_local_work. This can cause an unlock to be missed. Instead pass a pointer to locked into __io_run_local_work. Fixes: 8ac5d85a89b4 ("io_uring: add local task_work run helper that is entered locked") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com [axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27	io_uring: use io_run_local_work_locked helper	Dylan Yudaken
	prefer to use io_run_local_work_locked helper for consistency Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27	genetlink: limit the use of validation workarounds to old ops	Jakub Kicinski
	During review of previous change another thing came up - we should limit the use of validation workarounds to old commands. Don't list the workarounds one by one, as we're rejecting all existing ones. We can deal with the masking in the unlikely event that new flag is added. Link: https://lore.kernel.org/all/6ba9f727e555fd376623a298d5d305ad408c3d47.camel@sipsolutions.net/ Link: https://lore.kernel.org/r/20221026001524.1892202-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28	kbuild: fix typo in modpost	Will McVicker
	Commit f73edc8951b2 ("kbuild: unify two modpost invocations") introduced a typo (moudle.symvers-if-present) which results in the kernel's Module.symvers to not be included as a prerequisite for $(KBUILD_EXTMOD)/Module.symvers. Fix the typo to restore the intended functionality. Fixes: f73edc8951b2 ("kbuild: unify two modpost invocations") Signed-off-by: Will McVicker <willmcvicker@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-28	Documentation: kbuild: Add description of git for reproducible builds	Dan Li
	The status of git will affect the final compilation result, add it to the documentation of reproducible builds. Signed-off-by: Dan Li <ashimida@linux.alibaba.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-28	kbuild: use POSIX-compatible grep option	Stefan Hansson
	--file is a GNU extension to grep which is not available in all implementations (such as BusyBox). Use the -f option instead which is eqvuialent according to the GNU grep manpage[1] and is present in POSIX[2]. [1] https://www.gnu.org/software/grep/manual/grep.html [2] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html Signed-off-by: Stefan Hansson <newbie13xd@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-27	net/mlx5: DR, Remove the buddy used_list	Yevgeny Kliteynik
	No need to have the used_list - we don't need to keep track of the used chunks, we only need to know the amount of used memory. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list	Yevgeny Kliteynik
	When ICM chunk is freed, it might still be accessed by HW until we do sync with HW. This sync is expensive operation, so we don't do it often. Instead, when the chunk is freed, it is moved to the buddy's "hot memory" list. Once sync is done, we traverse the hot list and finally free all the chunks. It appears that traversing a long list takes unusually long time due to cache misses on many entries, which causes a big "hiccup" during rule insertion. This patch deals with this issue the following way: - Move hot chunks list from buddy to pool, so that the pool will keep track of all its hot memory. - Replace the list with pre-allocated array on the memory pool struct, and store only the information that is needed to later free this chunk in its buddy allocator. This cost additional memory for the array that is dynamically allocated, but it allows not to save long list of hot chunks, so at peak times it actually saves memory due to the fact that each array entry is much smaller than the chunk struct. This way an overhead of traversing the long list is virtually removed: the loop of freeing hot chunks takes ~27 msec instead of ~70 msec, where most of it are the actual freeing activities. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Lower sync threshold for ICM hot memory	Yevgeny Kliteynik
	Instead of hiding the math in the code, define a value that sets the fraction of allowed hot memory of ICM pool. Set the threshold for sync of ICM hot chunks to 1/4 of the pool instead of 1/2 of the pool. Although we will have more syncs, each sync will be shorter and will help with insertion rate stability. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Allocate htbl from its own slab allocator	Yevgeny Kliteynik
	SW steering allocates/frees lots of htbl structs. Create a separate kmem_cache and allocate htbls from this allocator. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Allocate icm_chunks from their own slab allocator	Yevgeny Kliteynik
	SW steering allocates/frees lots of icm_chunk structs. To make this more efficiently, create a separate kmem_cache and allocate these chunks from this allocator. By doing this we observe that the alloc/free "hiccups" frequency has become much lower, which allows for a more steady rule insersion rate. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Manage STE send info objects in pool	Yevgeny Kliteynik
	Instead of allocating/freeing send info objects dynamically, manage them in pool. The number of send info objects doesn't depend on rules, so after pre-populating the pool with an initial batch of send info objects, the pool is not expected to grow. This way we save alloc/free during writing STEs to ICM, which can sometimes take up to 40msec. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, In rehash write the line in the entry immediately	Yevgeny Kliteynik
	Don't wait for the whole table to be ready - write each row immediately. This way we save allocations of the ste_send_info structure and improve performance. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Handle domain memory resources init/uninit separately	Yevgeny Kliteynik
	Handle creation/destruction of all the domain's memory pools and other memory-related fields in a separate init/uninit functions. This simplifies error flow and allows cleaner addition of new pools. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation	Yevgeny Kliteynik
	Rather than cleaning the corresponding chunk's section of ste_arrays on chunk deletion, initialize these areas upon chunk creation. Chunk destruction tend to come in large batches (during pool syncing). To reduce the "hiccup" in such cases, moving ste_arrays init from chunk destruction to initialization. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically	Yevgeny Kliteynik
	While creating rule, ste_arr is an array that is allocated at the start of the function and freed at the end. This memory allocation can sometimes lead to "hiccups" of up to 10ms. However, the common use case is short chains of STEs. For such cases, we can use a local buffer on stack instead. Changes in v2: Use small local array for short rules, allocate dynamically for long rules Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy	Yevgeny Kliteynik
	Remove an argument that can be extracted in the function. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Check device state when polling CQ	Yevgeny Kliteynik
	Calling fast teardown as part of the normal unloading caused a problem with SW steering - SW steering still needs to clear its tables, write to ICM and poll for completions. When teardown has been done, SW steering keeps polling the CQ forever, because nobody flushes it. This patch fixes the issue by checking the device state in cases where no CQE was returned. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, Fix the SMFS sync_steering for fast teardown	Yevgeny Kliteynik
	If sync happens when the device is in fast teardown, just bail and don't do anything, because the PCI device is not there any more. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net/mlx5: DR, In destroy flow, free resources even if FW command failed	Yevgeny Kliteynik
	Otherwise resources will never be freed and refcount will not be decreased. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-27	net: bcmsysport: Indicate MAC is in charge of PHY PM	Florian Fainelli
	Avoid the PHY library call unnecessarily into the suspend/resume functions by setting phydev->mac_managed_pm to true. The SYSTEMPORT driver essentially does exactly what mdio_bus_phy_resume() does by calling phy_resume(). Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221025234201.2549360-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	Merge tag 'nvme-6.1-2022-10-27' of git://git.infradead.org/nvme into block-6.1	Jens Axboe
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.1 - make the multipath dma alignment to match the non-multipath one (Keith Busch) - fix a bogus use of sg_init_marker() (Nam Cao) - fix circulr locking in nvme-tcp (Sagi Grimberg)" * tag 'nvme-6.1-2022-10-27' of git://git.infradead.org/nvme: nvme-multipath: set queue dma alignment to 3 nvme-tcp: fix possible circular locking when deleting a controller under memory pressure nvme-tcp: replace sg_init_marker() with sg_init_table()
2022-10-27	skbuff: Proactively round up to kmalloc bucket size	Kees Cook
	Instead of discovering the kmalloc bucket size _after_ allocation, round up proactively so the allocation is explicitly made for the full size, allowing the compiler to correctly reason about the resulting size of the buffer through the existing __alloc_size() hint. This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain back the __alloc_size() hints that were temporarily reverted in commit 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller") Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patchwork.kernel.org/project/netdevbpf/patch/20221021234713.you.031-kees@kernel.org/ Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221025223811.up.360-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	blk-mq: don't add non-pt request with ->end_io to batch	Ming Lei
	dm-rq implements ->end_io callback for request issued to underlying queue, and it isn't passthrough request. Commit ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") doesn't clear rq->bio and rq->__data_len for request with ->end_io in blk_mq_end_request_batch(), and this way is actually dangerous, but so far it is only for nvme passthrough request. dm-rq needs to clean up remained bios in case of partial completion, and req->bio is required, then use-after-free is triggered, so the underlying clone request can't be completed in blk_mq_end_request_batch. Fix panic by not adding such request into batch list, and the issue can be triggered simply by exposing nvme pci to dm-mpath simply. Fixes: ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") Cc: dm-devel@redhat.com Cc: Mike Snitzer <snitzer@kernel.org> Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221027085709.513175-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27	rbd: fix possible memory leak in rbd_sysfs_init()	Yang Yingliang
	If device_register() returns error in rbd_sysfs_init(), name of kobject which is allocated in dev_set_name() called in device_add() is leaked. As comment of device_add() says, it should call put_device() to drop the reference count that was set in device_initialize() when it fails, so the name can be freed in kobject_cleanup(). Fault injection test can trigger this problem: unreferenced object 0xffff88810173aa78 (size 8): comm "modprobe", pid 247, jiffies 4294714278 (age 31.789s) hex dump (first 8 bytes): 72 62 64 00 81 88 ff ff rbd..... backtrace: [<00000000f58fae56>] __kmalloc_node_track_caller+0x44/0x1b0 [<00000000bdd44fe7>] kstrdup+0x3a/0x70 [<00000000f7844d0b>] kstrdup_const+0x63/0x80 [<000000001b0a0eeb>] kvasprintf_const+0x10b/0x190 [<00000000a47bd894>] kobject_set_name_vargs+0x56/0x150 [<00000000d5edbf18>] dev_set_name+0xab/0xe0 [<00000000f5153e80>] device_add+0x106/0x1f20 Fixes: dfc5606dc513 ("rbd: replace the rbd sysfs interface") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20221027091918.2294132-1-yangyingliang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27	Merge branch 'net-ipa-don-t-use-fixed-table-sizes'	Paolo Abeni
	Alex Elder says: ==================== net: ipa: don't use fixed table sizes Currently, routing and filter tables are assumed to have a fixed size for all platforms. In fact, these tables can support many more entries than what has been assumed; the only limitation is the size of the IPA-resident memory regions that contain them. This series rearranges things so that the size of the table is determined from the memory region size defined in configuration data, rather than assuming it is fixed. This will required for IPA versions 5.0+, where the number of entries in a routing table is larger. ==================== Link: https://lore.kernel.org/r/20221025195143.255934-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	net: ipa: determine filter table size from memory region	Alex Elder
	Currently we assume that any filter table contains a fixed number of entries. Like routing tables, the number of entries in a filter table is limited only by the size of the IPA-local memory region used to hold the table. Stop assuming that a filter table has exactly 14 entries. Instead, determine the number of entries in a routing table by dividing its memory region size by the size of an entry. (Note that the first "entry" in a filter table contains an endpoint bitmap.) Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	net: ipa: don't assume 8 modem routing table entries	Alex Elder
	Currently all platforms are assumed allot 8 routing table entries for use by the modem. Instead, add a new configuration data entry that defines the number of modem routing table entries, and record that in the IPA structure. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	net: ipa: determine route table size from memory region	Alex Elder
	Currently we assume that any routing table contains a fixed number of entries. The number of entries in a routing table can actually vary, depending only on the size of the IPA-local memory region used to hold the table. Stop assuming that a routing table has exactly 15 entries. Instead, determine the number of entries in a routing table by dividing its memory region size by the size of an entry. The number of entries is computed early, when ipa_table_mem_valid() is called by ipa_table_init(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	net: ipa: record the route table size in the IPA structure	Alex Elder
	The non-hashed routing tables for IPv4 and IPv6 will be the same size. And if supported, the hashed routing tables will be the same size as the non-hashed tables. Record the size (number of entries) of all routing tables in the IPA structure. For now, initialize this field using IPA_ROUTE_TABLE_MAX, and just do so when the first route table is validated. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	can: j1939: transport: j1939_session_skb_drop_old(): ↵	Yang Yingliang
	spin_unlock_irqrestore() before kfree_skb() It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. The skb is unlinked from the queue, so it can be freed after spin_unlock_irqrestore(). Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20221027091237.2290111-1-yangyingliang@huawei.com Cc: stable@vger.kernel.org [mkl: adjust subject] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-10-27	eth: fealnx: delete the driver for Myson MTD-800	Jakub Kicinski
	The git history for this driver seems to be completely automated / tree wide changes. I can't find any boards or systems which would use this chip. Google search shows pictures of towel warmers and no networking products. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221025184254.1717982-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	ice: Add support Flex RXD	Michal Jaron
	Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid in struct virtchnl_rxq_info to support AVF Flex RXD extension. Add support to allow VF to query flexible descriptor RXDIDs supported by DDP package and configure Rx queues with selected RXDID for IAVF. Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be processed. Add necessary macros for registers. Signed-off-by: Leyi Rong <leyi.rong@intel.com> Signed-off-by: Xu Ting <ting.xu@intel.com> Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	net: broadcom: bcm4908_enet: use build_skb()	Rafał Miłecki
	RX code can be more efficient with the build_skb(). Allocating actual SKB around eth packet buffer - right before passing it up - results in a better cache usage. Without RPS (echo 0 > rps_cpus) BCM4908 NAT masq performance "jumps" between two speeds: ~900 Mbps and 940 Mbps (it's a 4 CPUs SoC). This change bumps the lower speed from 905 Mb/s to 918 Mb/s (tested using single stream iperf 2.0.5 traffic). There are more optimizations to consider. One obvious to try is GRO however as BCM4908 doesn't do hw csum is may actually lower performance. Sometimes. Some early testing: ┌─────────────────────────────────┬─────────────────────┬────────────────────┐ │ │ netif_receive_skb() │ napi_gro_receive() │ ├─────────────────────────────────┼─────────────────────┼────────────────────┤ │ netdev_alloc_skb() │ 905 Mb/s │ 892 Mb/s │ │ napi_alloc_frag() + build_skb() │ 918 Mb/s │ 917 Mb/s │ └─────────────────────────────────┴─────────────────────┴────────────────────┘ Another ideas: 1. napi_build_skb() 2. skb_copy_from_linear_data() for small packets Those need proper testing first though. That can be done later. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Link: https://lore.kernel.org/r/20221025132245.22871-1-zajec5@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	Drivers: hv: fix repeated words in comments	Jilin Yuan
	Delete the redundant word 'of'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221019125604.52999-1-yuanjilin@cdjrlc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-10-27	x86/hyperv: Remove BUG_ON() for kmap_local_page()	Zhao Liu
	The commit 154fb14df7a3c ("x86/hyperv: Replace kmap() with kmap_local_page()") keeps the BUG_ON() to check if kmap_local_page() fails. But in fact, kmap_local_page() always returns a valid kernel address and won't return NULL here. It will BUG on its own if it fails. [1] So directly use memcpy_to_page() which creates local mapping to copy. [1]: https://lore.kernel.org/lkml/YztFEyUA48et0yTt@iweiny-mobl/ Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20221020083820.2341088-1-zhao1.liu@linux.intel.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-10-27	net: ehea: fix possible memory leak in ehea_register_port()	Yang Yingliang
	If of_device_register() returns error, the of node and the name allocated in dev_set_name() is leaked, call put_device() to give up the reference that was set in device_initialize(), so that of node is put in logical_port_release() and the name is freed in kobject_cleanup(). Fixes: 1acf2318dd13 ("ehea: dynamic add / remove port") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221025130011.1071357-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	net: dp83822: Print the SOR1 strap status	Fabio Estevam
	During the bring-up of the Ethernet PHY, it is very useful to see the bootstrap status information, as it can help identifying hardware bootstrap mistakes. Allow printing the SOR1 register, which contains the strap status to ease the bring-up. Signed-off-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221025120109.779337-1-festevam@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27	KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache	Sean Christopherson
	Reject kvm_gpc_check() and kvm_gpc_refresh() if the cache is inactive. Not checking the active flag during refresh is particularly egregious, as KVM can end up with a valid, inactive cache, which can lead to a variety of use-after-free bugs, e.g. consuming a NULL kernel pointer or missing an mmu_notifier invalidation due to the cache not being on the list of gfns to invalidate. Note, "active" needs to be set if and only if the cache is on the list of caches, i.e. is reachable via mmu_notifier events. If a relevant mmu_notifier event occurs while the cache is "active" but not on the list, KVM will not acquire the cache's lock and so will not serailize the mmu_notifier event with active users and/or kvm_gpc_refresh(). A race between KVM_XEN_ATTR_TYPE_SHARED_INFO and KVM_XEN_HVM_EVTCHN_SEND can be exploited to trigger the bug. 1. Deactivate shinfo cache: kvm_xen_hvm_set_attr case KVM_XEN_ATTR_TYPE_SHARED_INFO kvm_gpc_deactivate kvm_gpc_unmap gpc->valid = false gpc->khva = NULL gpc->active = false Result: active = false, valid = false 2. Cause cache refresh: kvm_arch_vm_ioctl case KVM_XEN_HVM_EVTCHN_SEND kvm_xen_hvm_evtchn_send kvm_xen_set_evtchn kvm_xen_set_evtchn_fast kvm_gpc_check return -EWOULDBLOCK because !gpc->valid kvm_xen_set_evtchn_fast return -EWOULDBLOCK kvm_gpc_refresh hva_to_pfn_retry gpc->valid = true gpc->khva = not NULL Result: active = false, valid = true 3. Race ioctl KVM_XEN_HVM_EVTCHN_SEND against ioctl KVM_XEN_ATTR_TYPE_SHARED_INFO: kvm_arch_vm_ioctl case KVM_XEN_HVM_EVTCHN_SEND kvm_xen_hvm_evtchn_send kvm_xen_set_evtchn kvm_xen_set_evtchn_fast read_lock gpc->lock kvm_xen_hvm_set_attr case KVM_XEN_ATTR_TYPE_SHARED_INFO mutex_lock kvm->lock kvm_xen_shared_info_init kvm_gpc_activate gpc->khva = NULL kvm_gpc_check [ Check passes because gpc->valid is still true, even though gpc->khva is already NULL. ] shinfo = gpc->khva pending_bits = shinfo->evtchn_pending CRASH: test_and_set_bit(..., pending_bits) Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Reported-by: : Michal Luczaj <mhal@rbox.co> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013211234.1318131-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>