summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-16Merge branch 'virtio_net-xdp-bugs'David S. Miller
Xuan Zhuo says: ==================== virtio_net: fix two bugs related to XDP This patch set fixes two bugs related to XDP. These two patch is not associated. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio_net: free xdp shinfo frags when build_skb_from_xdp_buff() failsXuan Zhuo
build_skb_from_xdp_buff() may return NULL, in this case we need to free the frags of xdp shinfo. Fixes: fab89bafa95b ("virtio-net: support multi-buffer xdp") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio_net: fix page_to_skb() miss headroomXuan Zhuo
Because headroom is not passed to page_to_skb(), this causes the shinfo exceeds the range. Then the frags of shinfo are changed by other process. [ 157.724634] stack segment: 0000 [#1] PREEMPT SMP NOPTI [ 157.725358] CPU: 3 PID: 679 Comm: xdp_pass_user_f Tainted: G E 6.2.0+ #150 [ 157.726401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/4 [ 157.727820] RIP: 0010:skb_release_data+0x11b/0x180 [ 157.728449] Code: 44 24 02 48 83 c3 01 39 d8 7e be 48 89 d8 48 c1 e0 04 41 80 7d 7e 00 49 8b 6c 04 30 79 0c 48 89 ef e8 89 b [ 157.730751] RSP: 0018:ffffc90000178b48 EFLAGS: 00010202 [ 157.731383] RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000000 [ 157.732270] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888100dd0b00 [ 157.733117] RBP: 5d5d76010f6e2408 R08: ffff888100dd0b2c R09: 0000000000000000 [ 157.734013] R10: ffffffff82effd30 R11: 000000000000a14e R12: ffff88810981ffc0 [ 157.734904] R13: ffff888100dd0b00 R14: 0000000000000002 R15: 0000000000002310 [ 157.735793] FS: 00007f06121d9740(0000) GS:ffff88842fcc0000(0000) knlGS:0000000000000000 [ 157.736794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 157.737522] CR2: 00007ffd9a56c084 CR3: 0000000104bda001 CR4: 0000000000770ee0 [ 157.738420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 157.739283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 157.740146] PKRU: 55555554 [ 157.740502] Call Trace: [ 157.740843] <IRQ> [ 157.741117] kfree_skb_reason+0x50/0x120 [ 157.741613] __udp4_lib_rcv+0x52b/0x5e0 [ 157.742132] ip_protocol_deliver_rcu+0xaf/0x190 [ 157.742715] ip_local_deliver_finish+0x77/0xa0 [ 157.743280] ip_sublist_rcv_finish+0x80/0x90 [ 157.743834] ip_list_rcv_finish.constprop.0+0x16f/0x190 [ 157.744493] ip_list_rcv+0x126/0x140 [ 157.744952] __netif_receive_skb_list_core+0x29b/0x2c0 [ 157.745602] __netif_receive_skb_list+0xed/0x160 [ 157.746190] ? udp4_gro_receive+0x275/0x350 [ 157.746732] netif_receive_skb_list_internal+0xf2/0x1b0 [ 157.747398] napi_gro_receive+0xd1/0x210 [ 157.747911] virtnet_receive+0x75/0x1c0 [ 157.748422] virtnet_poll+0x48/0x1b0 [ 157.748878] __napi_poll+0x29/0x1b0 [ 157.749330] net_rx_action+0x27a/0x340 [ 157.749812] __do_softirq+0xf3/0x2fb [ 157.750298] do_softirq+0xa2/0xd0 [ 157.750745] </IRQ> [ 157.751563] <TASK> [ 157.752329] __local_bh_enable_ip+0x6d/0x80 [ 157.753178] virtnet_xdp_set+0x482/0x860 [ 157.754159] ? __pfx_virtnet_xdp+0x10/0x10 [ 157.755129] dev_xdp_install+0xa4/0xe0 [ 157.756033] dev_xdp_attach+0x20b/0x5e0 [ 157.756933] do_setlink+0x82e/0xc90 [ 157.757777] ? __nla_validate_parse+0x12b/0x1e0 [ 157.758744] rtnl_setlink+0xd8/0x170 [ 157.759549] ? mod_objcg_state+0xcb/0x320 [ 157.760328] ? security_capable+0x37/0x60 [ 157.761209] ? security_capable+0x37/0x60 [ 157.762072] rtnetlink_rcv_msg+0x145/0x3d0 [ 157.762929] ? ___slab_alloc+0x327/0x610 [ 157.763754] ? __alloc_skb+0x141/0x170 [ 157.764533] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 157.765422] netlink_rcv_skb+0x58/0x110 [ 157.766229] netlink_unicast+0x21f/0x330 [ 157.766951] netlink_sendmsg+0x240/0x4a0 [ 157.767654] sock_sendmsg+0x93/0xa0 [ 157.768434] ? sockfd_lookup_light+0x12/0x70 [ 157.769245] __sys_sendto+0xfe/0x170 [ 157.770079] ? handle_mm_fault+0xe9/0x2d0 [ 157.770859] ? preempt_count_add+0x51/0xa0 [ 157.771645] ? up_read+0x3c/0x80 [ 157.772340] ? do_user_addr_fault+0x1e9/0x710 [ 157.773166] ? kvm_read_and_reset_apf_flags+0x49/0x60 [ 157.774087] __x64_sys_sendto+0x29/0x30 [ 157.774856] do_syscall_64+0x3c/0x90 [ 157.775518] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 157.776382] RIP: 0033:0x7f06122def70 Fixes: 18117a842ab0 ("virtio-net: remove xdp related info from page_to_skb()") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16net: Use of_property_read_bool() for boolean propertiesRob Herring
It is preferred to use typed property access functions (i.e. of_property_read_<type> functions) rather than low-level of_get_property/of_find_property functions for reading properties. Convert reading boolean properties to of_property_read_bool(). Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can Acked-by: Kalle Valo <kvalo@kernel.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16Merge branch 'net-dsa-marvell-mtu-reporting'David S. Miller
Vladimir Oltean says: ==================== Fix MTU reporting for Marvell DSA switches where we can't change it As explained in patch 2, the driver doesn't know how to change the MTU on MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290, and there is a regression where it actually reports an MTU value below the Ethernet standard (1500). Fixing that shows another issue where DSA is unprepared to be told that a switch supports an MTU of only 1500, and still errors out. That is addressed by patch 1. Testing was not done on "real" hardware, but on a different Marvell DSA switch, with code modified such that the driver doesn't know how to change the MTU on that, either. A key assumption is that these switches don't need any MTU configuration to pass full MTU-sized, DSA-tagged packets, which seems like a reasonable assumption to make. My 6390 and 6190 switches, with .port_set_jumbo_size commented out, certainly don't seem to have any problem passing MTU-sized traffic, as can be seen in this iperf3 session captured with tcpdump on the DSA master: $MAC > $MAC, Marvell DSA mode Forward, dev 2, port 8, untagged, VID 1000, FPri 0, ethertype IPv4 (0x0800), length 1518: 10.0.0.69.49590 > 10.0.0.1.5201: Flags [.], seq 81088:82536, ack 1, win 502, options [nop,nop,TS val 2221498829 ecr 3012859850], length 1448 I don't want to go all the way and say that the adjustment made by commit b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") is completely unnecessary, just that there's an equally good chance that the switches with unknown MTU configuration procedure "just work". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290Vladimir Oltean
There are 3 classes of switch families that the driver is aware of, as far as mv88e6xxx_change_mtu() is concerned: - MTU configuration is available per port. Here, the chip->info->ops->port_set_jumbo_size() method will be present. - MTU configuration is global to the switch. Here, the chip->info->ops->set_max_frame_size() method will be present. - We don't know how to change the MTU. Here, none of the above methods will be present. Switch families MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290 fall in category 3. The blamed commit has adjusted the MTU for all 3 categories by EDSA_HLEN (8 bytes), resulting in a new maximum MTU of 1492 being reported by the driver for these switches. I don't have the hardware to test, but I do have a MV88E6390 switch on which I can simulate this by commenting out its .port_set_jumbo_size definition from mv88e6390_ops. The result is this set of messages at probe time: mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 1 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 2 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 3 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 4 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 5 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 6 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 7 mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 8 It is highly implausible that there exist Ethernet switches which don't support the standard MTU of 1500 octets, and this is what the DSA framework says as well - the error comes from dsa_slave_create() -> dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN). But the error messages are alarming, and it would be good to suppress them. As a consequence of this unlikeliness, we reimplement mv88e6xxx_get_max_mtu() and mv88e6xxx_change_mtu() on switches from the 3rd category as follows: the maximum supported MTU is 1500, and any request to set the MTU to a value larger than that fails in dev_validate_mtu(). Fixes: b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu()Vladimir Oltean
Currently, when dsa_slave_change_mtu() is called on a user port where dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code will stumble upon this check: if (new_master_mtu > mtu_limit) return -ERANGE; because new_master_mtu is adjusted for the tagger overhead but mtu_limit is not. But it would be good if the logic went through, for example if the DSA master really depends on an MTU adjustment to accept DSA-tagged frames. To make the code pass through the check, we need to adjust mtu_limit for the overhead as well, if the minimum restriction was caused by the DSA user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU are always offset by the protocol overhead. Currently no drivers return 1500 .port_max_mtu(), but this is only temporary and a bug in itself - mv88e6xxx should have done that, but since commit b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") it no longer does. This is a preparation for fixing that. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16ice: xsk: disable txq irq before flushing hwMaciej Fijalkowski
ice_qp_dis() intends to stop a given queue pair that is a target of xsk pool attach/detach. One of the steps is to disable interrupts on these queues. It currently is broken in a way that txq irq is turned off *after* HW flush which in turn takes no effect. ice_qp_dis(): -> ice_qvec_dis_irq() --> disable rxq irq --> flush hw -> ice_vsi_stop_tx_ring() -->disable txq irq Below splat can be triggered by following steps: - start xdpsock WITHOUT loading xdp prog - run xdp_rxq_info with XDP_TX action on this interface - start traffic - terminate xdpsock [ 256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018 [ 256.319560] #PF: supervisor read access in kernel mode [ 256.324775] #PF: error_code(0x0000) - not-present page [ 256.329994] PGD 0 P4D 0 [ 256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G OE 6.2.0-rc5+ #51 [ 256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice] [ 256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44 [ 256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206 [ 256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f [ 256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80 [ 256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000 [ 256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000 [ 256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600 [ 256.421990] FS: 0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000 [ 256.430207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0 [ 256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 256.457770] PKRU: 55555554 [ 256.460529] Call Trace: [ 256.463015] <TASK> [ 256.465157] ? ice_xmit_zc+0x6e/0x150 [ice] [ 256.469437] ice_napi_poll+0x46d/0x680 [ice] [ 256.473815] ? _raw_spin_unlock_irqrestore+0x1b/0x40 [ 256.478863] __napi_poll+0x29/0x160 [ 256.482409] net_rx_action+0x136/0x260 [ 256.486222] __do_softirq+0xe8/0x2e5 [ 256.489853] ? smpboot_thread_fn+0x2c/0x270 [ 256.494108] run_ksoftirqd+0x2a/0x50 [ 256.497747] smpboot_thread_fn+0x1c1/0x270 [ 256.501907] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 256.506594] kthread+0xea/0x120 [ 256.509785] ? __pfx_kthread+0x10/0x10 [ 256.513597] ret_from_fork+0x29/0x50 [ 256.517238] </TASK> In fact, irqs were not disabled and napi managed to be scheduled and run while xsk_pool pointer was still valid, but SW ring of xdp_buff pointers was already freed. To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also while at it, remove redundant ice_clean_rx_ring() call - this is handled in ice_qp_clean_rings(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16Merge branch 'net-virtio-vsock'David S. Miller
2023-03-16test/vsock: copy to user failure testArseniy Krasnov
This adds SOCK_STREAM and SOCK_SEQPACKET tests for invalid buffer case. It tries to read data to NULL buffer (data already presents in socket's queue), then uses valid buffer. For SOCK_STREAM second read must return data, because skbuff is not dropped, but for SOCK_SEQPACKET skbuff will be dropped by kernel, and 'recv()' will return EAGAIN. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio/vsock: don't drop skbuff on copy failureArseniy Krasnov
This returns behaviour of SOCK_STREAM read as before skbuff usage. When copying to user fails current skbuff won't be dropped, but returned to sockets's queue. Technically instead of 'skb_dequeue()', 'skb_peek()' is called and when skbuff becomes empty, it is removed from queue by '__skb_unlink()'. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio/vsock: remove redundant 'skb_pull()' callArseniy Krasnov
Since we now no longer use 'skb->len' to update credit, there is no sense to update skbuff state, because it is used only once after dequeue to copy data and then will be released. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16virtio/vsock: don't use skbuff state to account creditArseniy Krasnov
'skb->len' can vary when we partially read the data, this complicates the calculation of credit to be updated in 'virtio_transport_inc_rx_pkt()/ virtio_transport_dec_rx_pkt()'. Also in 'virtio_transport_dec_rx_pkt()' we were miscalculating the credit since 'skb->len' was redundant. For these reasons, let's replace the use of skbuff state to calculate new 'rx_bytes'/'fwd_cnt' values with explicit value as input argument. This makes code more simple, because it is not needed to change skbuff state before each call to update 'rx_bytes'/'fwd_cnt'. Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16block: remove obsolete config BLOCK_COMPATLukas Bulwahn
Before commit bdc1ddad3e5f ("compat_ioctl: block: move blkdev_compat_ioctl() into ioctl.c"), the config BLOCK_COMPAT was used to include compat_ioctl.c into the kernel build. With this commit, the code is moved into ioctl.c and included with the config COMPAT. So, since then, the config BLOCK_COMPAT has no effect and any further purpose. Remove this obsolete config BLOCK_COMPAT. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230316111630.4897-1-lukas.bulwahn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16io_uring/rsrc: fix folio accountingPavel Begunkov
| BUG: Bad page state in process kworker/u8:0 pfn:5c001 | page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001 | head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1 | anon flags: 0x3fffc00000b0004(uptodate|head|mappedtodisk|swapbacked|node=0|zone=0|lastcpupid=0xffff) | raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000 | raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000 | head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1 | head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000 | page dumped because: nonzero pincount | CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1 | Hardware name: linux,dummy-virt (DT) | Workqueue: events_unbound io_ring_exit_work | Call trace: | dump_backtrace+0x13c/0x208 | show_stack+0x34/0x58 | dump_stack_lvl+0x150/0x1a8 | dump_stack+0x20/0x30 | bad_page+0xec/0x238 | free_tail_pages_check+0x280/0x350 | free_pcp_prepare+0x60c/0x830 | free_unref_page+0x50/0x498 | free_compound_page+0xcc/0x100 | free_transhuge_page+0x1f0/0x2b8 | destroy_large_folio+0x80/0xc8 | __folio_put+0xc4/0xf8 | gup_put_folio+0xd0/0x250 | unpin_user_page+0xcc/0x128 | io_buffer_unmap+0xec/0x2c0 | __io_sqe_buffers_unregister+0xa4/0x1e0 | io_ring_exit_work+0x68c/0x1188 | process_one_work+0x91c/0x1a58 | worker_thread+0x48c/0xe30 | kthread+0x278/0x2f0 | ret_from_fork+0x10/0x20 Mark reports an issue with the recent patches coalescing compound pages while registering them in io_uring. The reason is that we try to drop excessive references with folio_put_refs(), but pages were acquired with pin_user_pages(), which has extra accounting and so should be put down with matching unpin_user_pages() or at least gup_put_folio(). As a fix unpin_user_pages() all but first page instead, and let's figure out a better API after. Fixes: 57bebf807e2abcf8 ("io_uring/rsrc: optimise registered huge pages") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16fbdev: Use of_property_present() for testing DT property presenceRob Herring
It is preferred to use typed property access functions (i.e. of_property_read_<type> functions) rather than low-level of_get_property/of_find_property functions for reading properties. As part of this, convert of_get_property/of_find_property calls to the recently added of_property_present() helper when we just want to test for presence of a property and nothing more. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Helge Deller <deller@gmx.de>
2023-03-16fbdev: au1200fb: Fix potential divide by zeroWei Chen
var->pixclock can be assigned to zero by user. Without proper check, divide by zero would occur when invoking macro PICOS2KHZ in au1200fb_fb_check_var. Error out if var->pixclock is zero. Signed-off-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-03-16fbdev: lxfb: Fix potential divide by zeroWei Chen
var->pixclock can be assigned to zero by user. Without proper check, divide by zero would occur in lx_set_clock. Error out if var->pixclock is zero. Signed-off-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-03-16fbdev: intelfb: Fix potential divide by zeroWei Chen
Variable var->pixclock is controlled by user and can be assigned to zero. Without proper check, divide by zero would occur in intelfbhw_validate_mode and intelfbhw_mode_to_hw. Error out if var->pixclock is zero. Signed-off-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-03-16fbdev: nvidia: Fix potential divide by zeroWei Chen
variable var->pixclock can be set by user. In case it equals to zero, divide by zero would occur in nvidiafb_set_par. Similar crashes have happened in other fbdev drivers. There is no check and modification on var->pixclock along the call chain to nvidia_check_var and nvidiafb_set_par. We believe it could also be triggered in driver nvidia from user site. Signed-off-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-03-16fbdev: stifb: Provide valid pixelclock and add fb_check_var() checksHelge Deller
Find a valid modeline depending on the machine graphic card configuration and add the fb_check_var() function to validate Xorg provided graphics settings. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org
2023-03-16kbuild: use git-archive for source package creationMasahiro Yamada
Commit 5c3d1d0abb12 ("kbuild: add a tool to list files ignored by git") added a new tool, scripts/list-gitignored. My intention was to create source packages without cleaning the source tree, without relying on git. Linus strongly objected to it, and suggested using 'git archive' instead. [1] [2] [3] This commit goes in that direction - Remove scripts/list-gitignored.c and rewrites Makefiles and scripts to use 'git archive' for building Debian and RPM source packages. It also makes 'make perf-tar*-src-pkg' use 'git archive' again. Going forward, building source packages is only possible in a git-managed tree. Building binary packages does not require git. [1]: https://lore.kernel.org/lkml/CAHk-=wi49sMaC7vY1yMagk7eqLK=1jHeHQ=yZ_k45P=xBccnmA@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/CAHk-=wh5AixGsLeT0qH2oZHKq0FLUTbyTw4qY921L=PwYgoGVw@mail.gmail.com/ [3]: https://lore.kernel.org/lkml/CAHk-=wgM-W6Fu==EoAVCabxyX8eYBz9kNC88-tm9ExRQwA79UQ@mail.gmail.com/ Fixes: 5c3d1d0abb12 ("kbuild: add a tool to list files ignored by git") Fixes: e0ca16749ac3 ("kbuild: make perf-tar*-src-pkg work without relying on git") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-03-16kbuild: rpm-pkg: move source components to rpmbuild/SOURCESMasahiro Yamada
Prepare to add more files to the source RPM. Also, fix the build error when KCONFIG_CONFIG is set: error: Bad file: ./.config: No such file or directory Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-03-16io_uring/msg_ring: let target know allocated indexPavel Begunkov
msg_ring requests transferring files support auto index selection via IORING_FILE_INDEX_ALLOC, however they don't return the selected index to the target ring and there is no other good way for the userspace to know where is the receieved file. Return the index for allocated slots and 0 otherwise, which is consistent with other fixed file installing requests. Cc: stable@vger.kernel.org # v6.0+ Fixes: e6130eba8a848 ("io_uring: add support for passing fixed file descriptors") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://github.com/axboe/liburing/issues/809 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-16Merge tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme into block-6.3Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - avoid potential UAF in nvmet_req_complete (Damien Le Moal) - more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen) - fix a memory leak in the nvme-pci probe teardown path (Irvin Cote) - repair the MAINTAINERS entry (Lukas Bulwahn) - fix handling single range discard request (Ming Lei) - show more opcode names in trace events (Minwoo Im) - fix nvme-tcp timeout reporting (Sagi Grimberg)" * tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme: nvmet: avoid potential UAF in nvmet_req_complete() nvme-trace: show more opcode names nvme-tcp: add nvme-tcp pdu size build protection nvme-tcp: fix opcode reporting in the timeout handler nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620 nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000 nvme-pci: fixing memory leak in probe teardown path nvme: fix handling single range discard request MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
2023-03-16x86/mm: Fix use of uninitialized buffer in sme_enable()Nikita Zhandarovich
cmdline_find_option() may fail before doing any initialization of the buffer array. This may lead to unpredictable results when the same buffer is used later in calls to strncmp() function. Fix the issue by returning early if cmdline_find_option() returns an error. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru
2023-03-16xen: remove unnecessary (void*) conversionsYu Zhe
Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230316083954.4223-1-yuzhe@nfschina.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-03-16Merge tag 'icc-6.3-rc3' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-linus Georgi writes: interconnect fixes for v6.3-rc This contains a bunch of fixes with the highlight being fixes for a race condition that could sometimes occur during the interconnect provider driver registration. There are also fixes for memory overallocation and a memory leak. - interconnect: qcom: osm-l3: fix icc_onecell_data allocation - interconnect: qcom: sm8450: switch to qcom_icc_rpmh_* function - interconnect: qcom: sm8550: switch to qcom_icc_rpmh_* function - interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT - interconnect: fix mem leak when freeing nodes - interconnect: fix icc_provider_del() error handling - interconnect: fix provider registration API - interconnect: imx: fix registration race - interconnect: qcom: osm-l3: fix registration race - interconnect: qcom: rpm: fix probe child-node error handling - interconnect: qcom: rpm: fix registration race - interconnect: qcom: rpmh: fix probe child-node error handling - interconnect: qcom: rpmh: fix registration race - interconnect: qcom: msm8974: fix registration race - interconnect: exynos: fix node leak in probe PM QoS error path - interconnect: exynos: fix registration race - interconnect: exynos: drop redundant link destroy - memory: tegra: fix interconnect registration race - memory: tegra124-emc: fix interconnect registration race - memory: tegra20-emc: fix interconnect registration race - memory: tegra30-emc: fix interconnect registration race Signed-off-by: Georgi Djakov <djakov@kernel.org> * tag 'icc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc: (21 commits) memory: tegra30-emc: fix interconnect registration race memory: tegra20-emc: fix interconnect registration race memory: tegra124-emc: fix interconnect registration race memory: tegra: fix interconnect registration race interconnect: exynos: drop redundant link destroy interconnect: exynos: fix registration race interconnect: exynos: fix node leak in probe PM QoS error path interconnect: qcom: msm8974: fix registration race interconnect: qcom: rpmh: fix registration race interconnect: qcom: rpmh: fix probe child-node error handling interconnect: qcom: rpm: fix registration race interconnect: qcom: rpm: fix probe child-node error handling interconnect: qcom: osm-l3: fix registration race interconnect: imx: fix registration race interconnect: fix provider registration API interconnect: fix icc_provider_del() error handling interconnect: fix mem leak when freeing nodes interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT interconnect: qcom: sm8550: switch to qcom_icc_rpmh_* function interconnect: qcom: sm8450: switch to qcom_icc_rpmh_* function ...
2023-03-16ata: pata_parport: fix memory leaksOndrej Zary
When ida_alloc() fails, "pi" is not freed although the misleading comment says otherwise. Move the ida_alloc() call up so we really don't have to free "pi" in case of ida_alloc() failure. Also move ida_free() call from pi_remove_one() to pata_parport_dev_release(). It was dereferencing already freed dev pointer. Testing revealed leak even in non-failure case which was tracked down to missing put_device() call after bus_find_device_by_name(). As a result, pata_parport_dev_release() was never called. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202303111822.IHNchbkp-lkp@intel.com/ Signed-off-by: Ondrej Zary <linux@zary.sk> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2023-03-15net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()Vladimir Oltean
Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol() acquire phydev->lock, but the mscc phy driver implementations, vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well, resulting in a deadlock. $ ip link set swp3 down ============================================ WARNING: possible recursive locking detected mscc_felix 0000:00:00.5 swp3: Link is Down -------------------------------------------- ip/375 is trying to acquire lock: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4 but task is already holding lock: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dev->lock); lock(&dev->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by ip/375: #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c Call trace: __mutex_lock+0x98/0x454 mutex_lock_nested+0x2c/0x38 vsc85xx_wol_get+0x2c/0xf4 phy_ethtool_get_wol+0x50/0x6c phy_suspend+0x84/0xcc phy_state_machine+0x1b8/0x27c phy_stop+0x70/0x154 phylink_stop+0x34/0xc0 dsa_port_disable_rt+0x2c/0xa4 dsa_slave_close+0x38/0xec __dev_close_many+0xc8/0x16c __dev_change_flags+0xdc/0x218 dev_change_flags+0x24/0x6c do_setlink+0x234/0xea4 __rtnl_newlink+0x46c/0x878 rtnl_newlink+0x50/0x7c rtnetlink_rcv_msg+0x16c/0x58c Removing the mutex_lock(&phydev->lock) calls from the driver restores the functionality. Fixes: 2f987d486610 ("net: phy: Add locks to ethtool functions") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15veth: Fix use after free in XDP_REDIRECTShawn Bohrer
Commit 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") introduced a bug where it tried to use pskb_expand_head() if the headroom was less than XDP_PACKET_HEADROOM. This however uses kmalloc to expand the head, which will later allow consume_skb() to free the skb while is it still in use by AF_XDP. Previously if the headroom was less than XDP_PACKET_HEADROOM we continued on to allocate a new skb from pages so this restores that behavior. BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0 Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640 CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1 Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x170/0x473 ? __xsk_rcv+0x18d/0x2c0 kasan_report+0xad/0x130 ? __xsk_rcv+0x18d/0x2c0 kasan_check_range+0x149/0x1a0 memcpy+0x20/0x60 __xsk_rcv+0x18d/0x2c0 __xsk_map_redirect+0x1f3/0x490 ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] xdp_do_redirect+0x5ca/0xd60 veth_xdp_rcv_skb+0x935/0x1ba0 [veth] ? __netif_receive_skb_list_core+0x671/0x920 ? veth_xdp+0x670/0x670 [veth] veth_xdp_rcv+0x304/0xa20 [veth] ? do_xdp_generic+0x150/0x150 ? veth_xdp_rcv_one+0xde0/0xde0 [veth] ? _raw_spin_lock_bh+0xe0/0xe0 ? newidle_balance+0x887/0xe30 ? __perf_event_task_sched_in+0xdb/0x800 veth_poll+0x139/0x571 [veth] ? veth_xdp_rcv+0xa20/0xa20 [veth] ? _raw_spin_unlock+0x39/0x70 ? finish_task_switch.isra.0+0x17e/0x7d0 ? __switch_to+0x5cf/0x1070 ? __schedule+0x95b/0x2640 ? io_schedule_timeout+0x160/0x160 __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 ? __napi_poll+0x440/0x440 ? __kthread_parkme+0xc6/0x1f0 ? __napi_poll+0x440/0x440 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Freed by task 148640: kasan_save_stack+0x23/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x169/0x1d0 slab_free_freelist_hook+0xd2/0x190 __kmem_cache_free+0x1a1/0x2f0 skb_release_data+0x449/0x600 consume_skb+0x9f/0x1c0 veth_xdp_rcv_skb+0x89c/0x1ba0 [veth] veth_xdp_rcv+0x304/0xa20 [veth] veth_poll+0x139/0x571 [veth] __napi_poll+0xa1/0x440 napi_threaded_poll+0x3d1/0x460 kthread+0x2a2/0x340 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888976250000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 340 bytes inside of 2048-byte region [ffff888976250000, ffff888976250800) The buggy address belongs to the physical page: page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250 head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0 flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00 raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 718a18a0c8a6 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb") Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://lore.kernel.org/r/20230314153351.2201328-1-sbohrer@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15hwmon: (ltc2992) Set `can_sleep` flag for GPIO chipLars-Peter Clausen
The ltc2992 drivers uses a mutex and I2C bus access in its GPIO chip `set` and `get` implementation. This means these functions can sleep and the GPIO chip should set the `can_sleep` property to true. This will ensure that a warning is printed when trying to set or get the GPIO value from a context that potentially can't sleep. Fixes: 9ca26df1ba25 ("hwmon: (ltc2992) Add support for GPIOs.") Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Link: https://lore.kernel.org/r/20230314093146.2443845-2-lars@metafoo.de Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-03-15hwmon: (adm1266) Set `can_sleep` flag for GPIO chipLars-Peter Clausen
The adm1266 driver uses I2C bus access in its GPIO chip `set` and `get` implementation. This means these functions can sleep and the GPIO chip should set the `can_sleep` property to true. This will ensure that a warning is printed when trying to set or get the GPIO value from a context that potentially can't sleep. Fixes: d98dfad35c38 ("hwmon: (pmbus/adm1266) Add support for GPIOs") Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Link: https://lore.kernel.org/r/20230314093146.2443845-1-lars@metafoo.de Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-03-15io_uring: rsrc: Optimize return value variable 'ret'Li zeming
The initialization assignment of the variable ret is changed to 0, only in 'goto fail;' Use the ret variable as the function return value. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230317182538.3027-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15net/mlx5e: TC, Remove error message log printOz Shlomo
The cited commit attempts to update the hw stats when dumping tc actions. However, the driver may be called to update the stats of a police action that may not be in hardware. In such cases the driver will fail to lookup the police action object and will output an error message both to extack and dmesg. The dmesg error is confusing as it may not indicate an actual error. Remove the dmesg error. Fixes: 2b68d659a704 ("net/mlx5e: TC, support per action stats") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: TC, fix cloned flow attributeOz Shlomo
Currently the cloned flow attr resets the original tc action cookies count. Fix that by resetting the cloned flow attribute. Fixes: cca7eac13856 ("net/mlx5e: TC, store tc action cookies per attr") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: TC, fix missing error codeOz Shlomo
Missing error code when mlx5e_tc_act_stats_create fails Fixes: d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/sched: TC, fix raw counter initializationOz Shlomo
Freed counters may be reused by fs core. As such, raw counters may not be initialized to zero. Cache the counter values when the action stats object is initialized to have a proper base value for calculating the difference from the previous query. Fixes: 2b68d659a704 ("net/mlx5e: TC, support per action stats") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisitesAdham Faris
XSK redirecting XDP programs require linearity, hence applies restrictions on the MTU. For PAGE_SIZE=4K, MTU shouldn't exceed 3498. Features that contradict with XDP such HW-LRO and HW-GRO are enforced by the driver in advance, during XSK params validation, except for MTU, which was not enforced before this patch. This has been spotted during test scenario described below: Attaching xdpsock program (PAGE_SIZE=4K), with MTU < 3498, detaching XDP program, changing the MTU to arbitrary value in the range [3499, 3754], attaching XDP program again, which ended up with failure since MTU is > 3498. This commit lowers the XSK MTU limitation to be aligned with XDP MTU limitation, since XSK socket is meaningless without XDP program. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Set BREAK_FW_WAIT flag first when removing driverShay Drory
Currently, BREAK_FW_WAIT flag is set after syncing with fw_reset. However, fw_reset can call mlx5_load_one() which is waiting for fw init bit and BREAK_FW_WAIT flag is intended to stop. e.g.: the driver might wait on a loop it should exit. Fix it by setting the flag before syncing with fw_reset. Fixes: 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: kTLS, Fix missing error unwind on unsupported cipher typeGal Pressman
Do proper error unwinding when adding an unsupported TX/RX cipher type. Move the switch case prior to key creation so there's less to unwind, and change the goto label name to describe the action performed instead of what failed. Fixes: 4960c414db35 ("net/mlx5e: Support 256 bit keys with kTLS device offload") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Fix cleanup null-ptr deref on encap lockPaul Blakey
During module is unloaded while a peer tc flow is still offloaded, first the peer uplink rep profile is changed to a nic profile, and so neigh encap lock is destroyed. Next during unload, the VF reps netdevs are unregistered which causes the original non-peer tc flow to be deleted, which deletes the peer flow. The peer flow deletion detaches the encap entry and try to take the already destroyed encap lock, causing the below trace. Fix this by clearing peer flows during tc eswitch cleanup (mlx5e_tc_esw_cleanup()). Relevant trace: [ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8 [ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40 [ 4316.851897] Call Trace: [ 4316.852481] <TASK> [ 4316.857214] mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core] [ 4316.858258] mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core] [ 4316.859134] mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core] [ 4316.859867] clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core] [ 4316.860605] mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core] [ 4316.862609] __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core] [ 4316.863394] mlx5e_tc_del_flow+0x(/0x630 [mlx5_core] [ 4316.864090] mlx5e_flow_put+0x5f/0x100 [mlx5_core] [ 4316.864771] mlx5e_delete_flower+0x4de/0xa40 [mlx5_core] [ 4316.865486] tc_setup_cb_reoffload+0x20/0x80 [ 4316.865905] fl_reoffload+0x47c/0x510 [cls_flower] [ 4316.869181] tcf_block_playback_offloads+0x91/0x1d0 [ 4316.869649] tcf_block_unbind+0xe7/0x1b0 [ 4316.870049] tcf_block_offload_cmd.isra.0+0x1ee/0x270 [ 4316.879266] tcf_block_offload_unbind+0x61/0xa0 [ 4316.879711] __tcf_block_put+0xa4/0x310 Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: E-switch, Fix missing set of split_count when forward to ovs ↵Maor Dickman
internal port Rules with mirror actions are split to two FTEs when the actions after the mirror action contains pedit, vlan push/pop or ct. Forward to ovs internal port adds implicit header rewrite (pedit) but missing trigger to do split. Fix by setting split_count when forwarding to ovs internal port which will trigger split in mirror rules. Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: E-switch, Fix wrong usage of source port rewrite in split rulesMaor Dickman
In few cases, rules with mirror use case are split to two FTEs, one which do the mirror action and forward to second FTE which do the rest of the rule actions and the second redirect action. In case of mirror rules which do split and forward to ovs internal port or VF stack devices, source port rewrite should be used in the second FTE but it is wrongly also set in the first FTE which break the offload. Fix this issue by removing the wrong check if source port rewrite is needed to be used on the first FTE of the split and instead return EOPNOTSUPP which will block offload of rules which mirror to ovs internal port or VF stack devices which isn't supported. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Disable eswitch before waiting for VF pagesDaniel Jurgens
The offending commit changed the ordering of moving to legacy mode and waiting for the VF pages. Moving to legacy mode is important in bluefield, because it sends the host driver into error state, and frees its pages. Without this transition we end up waiting 2 minutes for pages that aren't coming before carrying on with the unload process. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5: Fix setting ec_function bit in MANAGE_PAGESParav Pandit
When ECPF is a page supplier, reclaim pages missed to honor the ec_function bit provided by the firmware. It always used the ec_function to true during driver unload flow for ECPF. This is incorrect. Honor the ec_function bit provided by device during page allocation request event. Fixes: d6945242f45d ("net/mlx5: Hold pages RB tree per VF") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Don't cache tunnel offloads capabilityParav Pandit
When mlx5e attaches again after device health recovery, the device capabilities might have changed by the eswitch manager. For example in one flow when ECPF changes the eswitch mode between legacy and switchdev, it updates the flow table tunnel capability. The cached value is only used in one place, so just check the capability there instead. Fixes: 5bef709d76a2 ("net/mlx5: Enable host PF HCA after eswitch is initialized") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15net/mlx5e: Fix macsec ASO context alignmentEmeel Hakim
Currently mlx5e_macsec_umr struct does not satisfy hardware memory alignment requirement. Hence the result of querying advanced steering operation (ASO) is not copied to the memory region as expected. Fix by satisfying hardware memory alignment requirement and move context to be first field in struct for better readability. Fixes: 1f53da676439 ("net/mlx5e: Create advanced steering operation (ASO) object for MACsec") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15drm/amdgpu: Don't resume IOMMU after incomplete initFelix Kuehling
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other kgd2kfd calls. This should fix IOMMU errors on resume from suspend when KFD IOMMU initialization failed. Reported-by: Matt Fagnani <matt.fagnani@bell.net> Link: https://lore.kernel.org/r/4a3b225c-2ffd-e758-4de1-447375e34cad@bell.net/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454 Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> Cc: stable@vger.kernel.org Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Matt Fagnani <matt.fagnani@bell.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdkfd: Fixed kfd_process cleanup on module exit.David Belanger
Handle case when module is unloaded (kfd_exit) before a process space (mm_struct) is released. v2: Fixed potential race conditions by removing all kfd_process from the process table first, then working on releasing the resources. v3: Fixed loop element access / synchronization. Fixed extra empty lines. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>