Add ECC support for the Xilinx CAN Controller, so this driver reports
1-bit/2-bit ECC errors for the FIFOs based on the ECC error interrupt. The ECC
feature for the Xilinx CAN Controller is selected through the 'xlnx,has-ecc' DT
property.
Signed-off-by: Srinivas Goud <srinivas.goud@amd.com>
Link: https://lore.kernel.org/all/20240213-xilinx_ecc-v8-2-8d75f8b80771@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The ECC feature is added to the CAN TX_OL, TX_TL and RX FIFOs of the Xilinx
AXI CAN Controller.
ECC is an IP configuration option where counter registers are added in the
IP for 1-bit/2-bit ECC errors.
'xlnx,has-ecc' is an optional property, added to the Xilinx AXI CAN
Controller node if the ECC block is enabled in the HW.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Srinivas Goud <srinivas.goud@amd.com>
Link: https://lore.kernel.org/all/20240213-xilinx_ecc-v8-1-8d75f8b80771@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Remove the duplicate list_splice_tail call when the
total_allocated < size condition is true.
Cc: <stable@vger.kernel.org> # 6.7+
Fixes: 8746c6c9dfa3 ("drm/buddy: Fix alloc_range() error handling code")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240216100048.4101-1-Arunpravin.PaneerSelvam@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Add Aquantia AQR113 PHY ID. Aquantia AQR113 is just a chip size variant of
the already supported AQR113C where the only difference is the PHY ID
and the hw chip size.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the duplicate calls to prueth_emac_stop() and
prueth_cleanup_tx_chns() in emac_ndo_stop().
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we're redirecting the skb and haven't called tcf_mirred_forward()
yet, we need to tell the core to drop the skb by setting the retcode
to SHOT. If we have called tcf_mirred_forward(), however, the skb
is out of our hands and returning SHOT will lead to a UaF.
Move the retval override to the error paths which actually need it.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Fixes: e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
The problem, as previously described by Davide (see Link), is that
if we reverse the flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet, and we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.
In the past there was a concern that the backlog indirection would
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.
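As an illustration of the dispatch described above, here is a minimal sketch (not the actual act_mirred code; the helper name and argument list are made up for clarity), showing the egress -> ingress case being handed to the Rx backlog via netif_rx() instead of running the Rx path inline:
#include <linux/netdevice.h>
/* Illustrative sketch only. The point: an egress -> ingress reversal
 * goes through the Rx backlog (netif_rx()), so the Rx path never runs
 * inline while the originating socket's lock may still be held.
 */
static int mirred_forward_sketch(struct sk_buff *skb, bool want_ingress,
                                 bool at_ingress)
{
        if (!want_ingress)
                return dev_queue_xmit(skb);     /* plain egress redirect */
        if (!at_ingress)
                return netif_rx(skb);           /* egress -> ingress: defer to backlog */
        return netif_receive_skb(skb);          /* ingress -> ingress */
}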
Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a misspelling of "circuit".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver uses functions that are supplied by the Kconfig symbol
PHYLIB, so select it to ensure that they are built as needed.
When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
(linker) errors that are resolved by this Kconfig change:
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'
Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402070626.eZsfVHG5-lkp@intel.com/
Cc: Lennart Franzen <lennart@lfdomain.com>
Cc: Alexandru Tachici <alexandru.tachici@analog.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller reported a warning [0] in inet_csk_destroy_sock() with no
repro.
WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash);
However, the syzkaller's log hinted that connect() failed just before
the warning due to FAULT_INJECTION. [1]
When connect() is called for an unbound socket, we search for an
available ephemeral port. If a bhash bucket exists for the port, we
call __inet_check_established() or __inet6_check_established() to check
if the bucket is reusable.
If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num.
Later, we look up the corresponding bhash2 bucket and try to allocate
it if it does not exist.
Although it rarely occurs in real use, if the allocation fails, we must
revert the changes made by check_established(). Otherwise, an unconnected
socket could illegally occupy an ehash entry.
Note that we do not put tw back into ehash because sk might have
already responded to a packet for tw and it would be better to free
tw earlier under such memory pressure.
[0]:
WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
Modules linked in:
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05
RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40
RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8
RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0
R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000
FS: 00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
? inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
dccp_close (net/dccp/proto.c:1078)
inet_release (net/ipv4/af_inet.c:434)
__sock_release (net/socket.c:660)
sock_close (net/socket.c:1423)
__fput (fs/file_table.c:377)
__fput_sync (fs/file_table.c:462)
__x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
RIP: 0033:0x7f03e53852bb
Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44
RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c
R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000
R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170
</TASK>
[1]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f88f5 #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
should_failslab (mm/slub.c:3748)
kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867)
inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135)
__inet_hash_connect (net/ipv4/inet_hashtables.c:1100)
dccp_v4_connect (net/dccp/ipv4.c:116)
__inet_stream_connect (net/ipv4/af_inet.c:676)
inet_stream_connect (net/ipv4/af_inet.c:747)
__sys_connect_file (net/socket.c:2048 (discriminator 2))
__sys_connect (net/socket.c:2065)
__x64_sys_connect (net/socket.c:2072)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
RIP: 0033:0x7f03e5284e5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d
RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000
</TASK>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tobias Waldekranz says:
====================
net: bridge: switchdev: Ensure MDB events are delivered exactly once
When a device is attached to a bridge, drivers will request a replay
of objects that were created before the device joined the bridge, that
are still of interest to the joining port. Typical examples include
FDB entries and MDB memberships on other ports ("foreign interfaces")
or on the bridge itself.
Conversely when a device is detached, the bridge will synthesize
deletion events for all those objects that are still live, but no
longer applicable to the device in question.
This series eliminates two races related to the synching and
unsynching phases of a bridge's MDB with a joining or leaving device,
that would cause notifications of such objects to be either delivered
twice (1/2), or not at all (2/2).
A similar race to the one solved by 1/2 still remains for the
FDB. This is much harder to solve, due to the lockless operation of
the FDB's rhashtable, and is therefore knowingly left out of this
series.
v1 -> v2:
- Squash the previously separate addition of
switchdev_port_obj_act_is_deferred into first consumer.
- Use ether_addr_equal to compare MAC addresses.
- Document switchdev_port_obj_act_is_deferred (renamed from
switchdev_port_obj_is_deferred in v1, to indicate that we also match
on the action).
- Delay allocations of MDB objects until we know they're needed.
- Use non-RCU version of the hash list iterator, now that the MDB is
not scanned while holding the RCU read lock.
- Add Fixes tag to commit message
v2 -> v3:
- Fix unlocking in error paths
- Access RCU protected port list via mlock_dereference, since MDB is
guaranteed to remain constant for the duration of the scan.
v3 -> v4:
- Limit the search for existing deferred events in 1/2 to only apply to
additions, since the problem does not exist in the deletion case.
- Add 2/2, to plug a related race when unoffloading an indirectly
associated device.
v4 -> v5:
- Fix grammatical errors in kerneldoc of
switchdev_port_obj_act_is_deferred
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When unoffloading a device, it is important to ensure that all
relevant deferred events are delivered to it before it disassociates
itself from the bridge.
Before this change, this was true for the normal case when a device
maps 1:1 to a net_bridge_port, i.e.
br0
/
swp0
When swp0 leaves br0, the call to switchdev_deferred_process() in
del_nbp() makes sure to process any outstanding events while the
device is still associated with the bridge.
In the case when the association is indirect though, i.e. when the
device is attached to the bridge via an intermediate device, like a
LAG...
br0
/
lag0
/
swp0
...then detaching swp0 from lag0 does not cause any net_bridge_port to
be deleted, so there was no guarantee that all events had been
processed before the device disassociated itself from the bridge.
Fix this by always synchronously processing all deferred events before
signaling completion of unoffloading back to the driver.
Fixes: 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.
To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
> ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0
root@infix-06-0b-00:~$ mvls atu
ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
DEV:0 Marvell 88E6393X
33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.
Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
> ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay
list.
This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:
2. Make sure that no deferred version of a replay event is already
enqueued to the switchdev deferred queue, before adding it to the
replay list, when replaying additions.
Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
iucv_path_table is a dynamically allocated array of pointers to
struct iucv_path items. Yet, its size is calculated as if it was
an array of struct iucv_path items.
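As a hedged sketch of the sizing fix described above (not the exact driver lines; the helper name is illustrative), the allocation should be sized by the pointer element type, not by the pointed-to structure:
#include <linux/errno.h>
#include <linux/slab.h>
struct iucv_path;                       /* only pointers are stored in the table */
static struct iucv_path **iucv_path_table;
static int iucv_path_table_alloc_sketch(unsigned long iucv_max_pathid)
{
        /* One pointer-sized slot per path id; sizing the allocation by
         * sizeof(struct iucv_path) instead would over-allocate.
         */
        iucv_path_table = kcalloc(iucv_max_pathid, sizeof(*iucv_path_table),
                                  GFP_KERNEL);
        return iucv_path_table ? 0 : -ENOMEM;
}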
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Shannon Nelson says:
====================
ionic: add XDP support
This patchset is new support in ionic for XDP processing,
including basic XDP on Rx packets, TX and REDIRECT, and frags
for jumbo frames.
Since ionic has not yet been converted to use the page_pool APIs,
this uses the simple MEM_TYPE_PAGE_ORDER0 buffering. There are plans
to convert the driver in the near future.
v4:
- removed "inline" from short utility functions
- changed to use "goto err_out" in ionic_xdp_register_rxq_info()
- added "continue" to reduce nesting in ionic_xdp_queues_config()
- used xdp_prog in ionic_rx_clean() to flag whether or not to sync
the rx buffer after calling ionic_xdp_run()
- swapped order of XDP_TX and XDP_REDIRECT cases in ionic_xdp_run()
to make patch 6 a little cleaner
v3:
https://lore.kernel.org/netdev/20240210004827.53814-1-shannon.nelson@amd.com/
- removed budget==0 patch, sent it separately to net
v2:
https://lore.kernel.org/netdev/20240208005725.65134-1-shannon.nelson@amd.com/
- added calls to txq_trans_cond_update()
- added a new patch to catch NAPI budget==0
v1:
https://lore.kernel.org/netdev/20240130013042.11586-1-shannon.nelson@amd.com/
RFC:
https://lore.kernel.org/netdev/20240118192500.58665-1-shannon.nelson@amd.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for using scatter-gather / frags in XDP in both
Rx and Tx paths.
Co-developed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When our ndo_xdp_xmit is called we mark the buffer with
XDP_REDIRECT so we know to return it to the XDP stack for
cleaning.
Co-developed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The XDP_REDIRECT packets are given to the XDP stack and
we drop the use of the related page: it will get freed
by the driver that ends up doing the Tx. Because we have
some hardware configurations with limited queue resources,
we use the existing datapath Tx queues rather than creating
and managing a separate set of xdp_tx queues.
Co-developed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The XDP_TX packets get fed back into the Rx queue's partnered
Tx queue as an xdp_frame.
Co-developed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an xdp program is loaded, add headroom at the beginning
of the frame to allow for editing and insertions that an XDP
program might need room for, and tailroom used later for XDP
frame tracking. These are only needed in the first Rx buffer
in a packet, not for any trailing frags.
Co-developed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set up the basics for running Rx packets through XDP programs.
Add new queue setup and teardown steps for adding/removing an
XDP program, and add the call to run the XDP on a packet.
The XDP frame size needs to be the MTU plus standard ethernet
header, plus head room for XDP scribblings and tail room for a
struct skb_shared_info. Also, at this point, we don't support
XDP frags, only a single contiguous Rx buffer. This means
that our page splitting is not very useful, so when XDP is in
use we need to use the full Rx buffer size and not do sharing.
Co-developed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert Rx datapath handling to use the DMA range APIs
in preparation for adding XDP handling.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These helpers clean up some of the code around DMA mapping
and other buffer references, and will be used in the next
few patches for the XDP support.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We claim to have the AdminQ on our irq0 and thus cpu id 0,
but we need to be sure we set the affinity hint to try to
keep it there.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Claudiu Beznea says:
====================
net: ravb: Add runtime PM support (part 2)
Series adds runtime PM support for the ravb driver. This is a continuation
of [1].
There are 5 more preparation patches (patches 1-5) and patch 6
adds runtime PM support.
Patches in this series were part of [2].
Changes in v4:
- remove unnecessary code from patch 4/6
- improve the code in patch 5/6
Changes in v3:
- fixed typos
- added patch "net: ravb: Move the update of ndev->features to
ravb_set_features()"
- changes title of patch "net: ravb: Do not apply RX checksum
settings to hardware if the interface is down" from v2 into
"net: ravb: Do not apply features to hardware if the interface
is down", changed patch description and updated the patch
- collected tags
Changes in v2:
- address review comments
- in patch 4/5 take into account the latest changes introduced
in ravb_set_features_gbeth()
Changes since [2]:
- patch 1/5 is new
- use pm_runtime_get_noresume() and pm_runtime_active() in patches
3/5, 4/5
- fixed highlighted typos in patch 4/5
[1] https://lore.kernel.org/all/20240202084136.3426492-1-claudiu.beznea.uj@bp.renesas.com/
[2] https://lore.kernel.org/all/20240105082339.1468817-1-claudiu.beznea.uj@bp.renesas.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add runtime PM support for the ravb driver. As the driver is used by
different IP variants, with different behaviors, to be able to have the
runtime PM support available for all devices, the preparatory commits
moved all the resources parsing and allocations in the driver's probe
function and kept the settings for ravb_open(). This is due to the fact
that on some IP variant/platform tuples, disabling/enabling the clocks
will switch the IP to the reset operation mode, where register contents are
lost and reconfiguration needs to be done. For this, the ravb_open()
function enables the clocks, switches the IP to configuration mode, applies
all the register settings and switches the IP to the operational mode. At
the end of ravb_open() IP is ready to send/receive data.
In ravb_close() the necessary reverts are done (compared with ravb_open()):
the IP is switched to reset mode and the clocks are disabled.
The ethtool APIs or IOCTLs that might execute while the interface is down
are either cached (and applied in ravb_open()) or rejected (as at that time
the IP is in reset mode). Keeping the IP in the reset mode also increases
the power saved (according to the hardware manual).
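A minimal sketch of the runtime PM pattern described above; everything except the runtime PM API calls is illustrative and not the actual ravb code:
#include <linux/netdevice.h>
#include <linux/pm_runtime.h>
/* Sketch: ndo_open takes a runtime PM reference so the clocks are on,
 * then reprograms the hardware because the IP may have gone through
 * reset while runtime suspended. ndo_stop reverts the configuration and
 * drops the reference, leaving the IP in reset to save power.
 */
static int sketch_ndo_open(struct net_device *ndev)
{
        struct device *dev = ndev->dev.parent;
        int ret;

        ret = pm_runtime_resume_and_get(dev);
        if (ret < 0)
                return ret;

        /* switch the IP to config mode, apply register settings,
         * then switch to operational mode
         */
        return 0;
}

static int sketch_ndo_stop(struct net_device *ndev)
{
        /* stop DMA, switch the IP back to reset mode */
        pm_runtime_put(ndev->dev.parent);
        return 0;
}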
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Do not apply features to hardware if the interface is down. In case runtime
PM is enabled, and while the interface is down, the IP will be in reset
mode (as for some platforms disabling the clocks will switch the IP to
reset mode, which will lead to losing register contents) and applying
settings in reset mode is not an option. Instead, cache the features and
apply them in ravb_open() through ravb_emac_init().
To avoid accessing the hardware while the interface is down, a
pm_runtime_active() check was introduced. Along with it, the device runtime
PM usage counter has been incremented to avoid disabling the device clocks
while the check is in progress (if any).
Commit prepares for the addition of runtime PM.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit c2da9408579d ("ravb: Add Rx checksum offload support for GbEth")
introduced support for setting GbEth features. With this the IP-specific
features update functions update the ndev->features individually.
Next commits add runtime PM support for the ravb driver. The runtime PM
implementation will enable/disable the IP clocks in
the ravb_open()/ravb_close() functions. Accessing the IP registers with
clocks disabled blocks the system.
The ravb_set_features() function could be executed when the Ethernet
interface is closed, so we need to ensure we don't access IP registers while
the interface is down once runtime PM support is in place.
For these, move the update of ndev->features to ravb_set_features(). In
this way we update the ndev->features only when the IP-specific features
set function returns success and we can avoid code duplication when
introducing runtime PM registers protection.
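A hedged sketch of the control flow this describes; the hook table and its set_feature() member are assumptions standing in for the driver's per-IP feature function:
#include <linux/netdevice.h>
/* Hypothetical per-IP hook; name and layout are assumptions. */
struct sketch_hw_info {
        int (*set_feature)(struct net_device *ndev, netdev_features_t features);
};

static int sketch_set_features(struct net_device *ndev,
                               const struct sketch_hw_info *info,
                               netdev_features_t features)
{
        int ret = info->set_feature(ndev, features);

        if (ret)
                return ret;

        /* Record the new features only after the IP-specific hook
         * succeeded; this is now the single place ndev->features changes,
         * which is also where a runtime PM guard can later be added.
         */
        ndev->features = features;
        return 0;
}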
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return the cached statistics in case the interface is down. There should be
no drawback to this, as cached statistics are updated in ravb_close().
In order to avoid accessing the IP registers while the IP is runtime
suspended, a pm_runtime_active() check was introduced. The device runtime
PM usage counter has been incremented to avoid disabling the device clocks
while the check is in progress (if any).
The commit prepares the code for the addition of runtime PM support.
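A hedged sketch of the guard pattern mentioned here and in the cover letter; the function is illustrative and the statistics copy itself is elided:
#include <linux/pm_runtime.h>
/* Sketch: bump the usage counter without resuming, so the device cannot
 * be runtime suspended while we check whether it is active. Read the
 * hardware counters only if it is; otherwise fall back to the values
 * cached at close time.
 */
static void sketch_get_stats(struct device *dev, bool *read_hw)
{
        pm_runtime_get_noresume(dev);
        *read_hw = pm_runtime_active(dev);
        /* ... copy either hardware or cached counters here ... */
        pm_runtime_put_noidle(dev);
}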
Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Keep the reverse order of operations in ravb_close() when compared with
ravb_open(). This is the recommended configuration sequence.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The 4th argument of ravb_setup_irq() is used to save the IRQ number that
will be further used by the driver code. Not all ravb_setup_irq() calls
need to save the IRQ number. The previous code used to pass a dummy
variable as the 4th argument in case the IRQ is not needed for further
usage. That is not necessary as the code from ravb_setup_irq() can detect
by itself if the IRQ needs to be saved. Thus, get rid of the code that is
not needed.
Reported-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
One of Jakub's tests[1] shows that there may be a period when all ports
are down and there is no active slave. This makes new_active_slave null
and the test fails. Add a check to make sure the new active slave is not null.
[ 189.051966] br0: port 2(s1) entered disabled state
[ 189.317881] bond0: (slave eth1): link status definitely down, disabling slave
[ 189.318487] bond0: (slave eth2): making interface the new active one
[ 190.435430] br0: port 4(s2) entered disabled state
[ 190.773786] bond0: (slave eth0): link status definitely down, disabling slave
[ 190.774204] bond0: (slave eth2): link status definitely down, disabling slave
[ 190.774715] bond0: now running without any active interface!
[ 190.877760] bond0: (slave eth0): link status definitely up
[ 190.878098] bond0: (slave eth0): making interface the new active one
[ 190.878495] bond0: active interface up!
[ 191.802872] br0: port 4(s2) entered blocking state
[ 191.803157] br0: port 4(s2) entered forwarding state
[ 191.813756] bond0: (slave eth2): link status definitely up
[ 192.847095] br0: port 2(s1) entered blocking state
[ 192.847396] br0: port 2(s1) entered forwarding state
[ 192.853740] bond0: (slave eth1): link status definitely up
# TEST: prio (active-backup ns_ip6_target primary_reselect 1) [FAIL]
# Current active slave is null but not eth0
[1] https://netdev-3.bots.linux.dev/vmksft-bonding/results/464481/1-bond-options-sh/stdout
Fixes: 45bf79bc56c4 ("selftests: bonding: reduce garp_test/arp_validate test time")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.8-rc5
GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable
- tlb invalidation fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGszDSiw66+a=ttBr-hat+zrcBtfc_cZ4LQqXu89DJ0UeQ@mail.gmail.com
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.8-2024-02-15-2:
amdgpu:
- PSR fixes
- Suspend/resume fixes
- Link training fix
- Aspect ratio fix
- DCN 3.5 fixes
- VCN 4.x fix
- GFX 11 fix
- Misc display fixes
- Misc small fixes
amdkfd:
- Cache size reporting fix
- SIMD distribution fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240215192452.11805-1-alexander.deucher@amd.com
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix an out-of-bounds shift.
- Fix the display code thinking xe uses shmem
- Fix a warning about index out-of-bound
- Fix a clang-16 compilation warning
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zc4GpcrbFVqdK9Ws@fedora
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Fix for #10172: Blank screen on JSL Chromebooks. Stable fix to limit DP SST link rate to <=8.1Gbps.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zc37W27F5OvoeSkG@jlahtine-mobl.ger.corp.intel.com
|
|
The code block under the "!ds->user_mii_bus && ds->ops->phy_read" check
under dsa_switch_setup() populates ds->user_mii_bus. The use of
ds->user_mii_bus is inappropriate when the MDIO bus of the switch is
described on the device tree [1].
For this reason, use this code block only for switches [with MDIO bus]
probed on platform_data, and OF which the switch MDIO bus isn't described
on the device tree. Therefore, remove OF-based MDIO bus registration as
it's useless for these cases.
These subdrivers which control switches [with MDIO bus] probed on OF, will
lose the ability to register the MDIO bus OF-based:
drivers/net/dsa/b53/b53_common.c
drivers/net/dsa/lan9303-core.c
drivers/net/dsa/vitesse-vsc73xx-core.c
These subdrivers let the DSA core driver register the bus:
- ds->ops->phy_read() and ds->ops->phy_write() are present.
- ds->user_mii_bus is not populated.
The commit fe7324b93222 ("net: dsa: OF-ware slave_mii_bus") which brought
OF-based MDIO bus registration on the DSA core driver is reasonably recent
and, in this time frame, there have been no device trees in the Linux
repository that started describing the MDIO bus, or dt-bindings defining
the MDIO bus for the switches these subdrivers control. So I don't expect
any devices to be affected.
The logic we encourage is that all subdrivers should register the switch
MDIO bus on their own [2]. And, for subdrivers which control switches [with
MDIO bus] probed on OF, this logic must be followed to support all cases
properly:
No switch MDIO bus defined: Populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if "interrupt-controller" is defined at
the switch node. This case should only be covered for the switches which
their dt-bindings documentation didn't document the MDIO bus from the
start. This is to keep supporting the device trees that do not describe the
MDIO bus on the device tree but the MDIO bus is being used nonetheless.
Switch MDIO bus defined: Don't populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if ["interrupt-controller" is defined at
the switch node and "interrupts" is defined at the PHY nodes under the
switch MDIO bus node].
Switch MDIO bus defined but explicitly disabled: If the device tree says
status = "disabled" for the MDIO bus, we shouldn't need an MDIO bus at all.
Instead, just exit as early as possible and do not call any MDIO API.
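A hedged sketch of how a subdriver could implement the three cases above, using the generic OF MDIO API; this is not taken from any of the listed subdrivers:
#include <linux/of.h>
#include <linux/of_mdio.h>
#include <linux/phy.h>
#include <net/dsa.h>
/* Sketch: pick the registration path based on whether the device tree
 * describes the switch MDIO bus, mirroring the three cases above.
 */
static int sketch_register_mdiobus(struct dsa_switch *ds, struct mii_bus *bus)
{
        struct device_node *mdio;
        int err;

        mdio = of_get_child_by_name(ds->dev->of_node, "mdio");
        if (mdio && !of_device_is_available(mdio)) {
                /* Described but status = "disabled": no MDIO bus at all. */
                of_node_put(mdio);
                return 0;
        }

        if (mdio) {
                /* Described in DT: register OF-based, and do not populate
                 * ds->user_mii_bus.
                 */
                err = of_mdiobus_register(bus, mdio);
                of_node_put(mdio);
                return err;
        }

        /* Legacy case: no MDIO node in DT, keep the old behaviour. */
        ds->user_mii_bus = bus;
        return mdiobus_register(bus);
}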
After all subdrivers that control switches with MDIO buses are made to
register the MDIO buses on their own, we will be able to get rid of
dsa_switch_ops :: phy_read() and :: phy_write(), and the code block for
registering the MDIO bus on the DSA core driver.
Link: https://lore.kernel.org/netdev/20231213120656.x46fyad6ls7sqyzv@skbuf/ [1]
Link: https://lore.kernel.org/netdev/20240103184459.dcbh57wdnlox6w7d@skbuf/ [2]
Suggested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240213-for-netnext-dsa-mdio-bus-v2-1-0ff6f4823a9e@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A suspend/resume error fix for ivpu, a couple of scheduler fixes for
nouveau, a patch to support large page arrays in prime, a uninitialized
variable fix in crtc, a locking fix in rockchip/vop2 and a buddy
allocator error reporting fix.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b4ffqzigtfh6cgzdpwuk6jlrv3dnk4hu6etiizgvibysqgtl2p@42n2gdfdd5eu
|
|
The conversion to netfs in the 6.3 kernel caused a regression when
maximum write size is set by the server to an unexpected value which is
not a multiple of 4096 (similarly if the user overrides the maximum
write size by setting mount parm "wsize", but sets it to a value that
is not a multiple of 4096). When negotiated write size is not a
multiple of 4096 the netfs code can skip the end of the final
page when doing large sequential writes, causing data corruption.
This section of code is being rewritten/removed due to a large
netfs change, but until that point (ie for the 6.3 kernel until now)
we can not support non-standard maximum write sizes.
Add a warning if a user specifies a wsize on mount that is not
a multiple of 4096 (and round down), also add a change where we
round down the maximum write size if the server negotiates a value
that is not a multiple of 4096 (we also have to check to make sure that
we do not round it down to zero).
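A hedged sketch of the rounding described above; the variable handling in the actual cifs code differs, but the arithmetic is the same:
#include <linux/kernel.h>
/* Sketch: force wsize to a multiple of 4096, and never let it round
 * down to zero.
 */
static unsigned int sketch_round_wsize(unsigned int wsize)
{
        return max_t(unsigned int, 4096, round_down(wsize, 4096));
}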
Reported-by: "R. Diez" <rdiez-2006@rd10.de>
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Suggested-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Acked-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org # v6.3+
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Hou Tao says:
====================
Fix the read of vsyscall page through bpf
From: Hou Tao <houtao1@huawei.com>
Hi,
As reported by syzbot [1] and [2], when trying to read the vsyscall page
using bpf_probe_read_kernel() or bpf_probe_read(), an oops may happen.
Thomas Gleixner had proposed a test patch [3], but it seems that no
formal patch has been posted after about one month [4], so I post it instead
and add an Originally-by tag in patch #2.
Patch #1 makes is_vsyscall_vaddr() a common helper. Patch #2 fixes
the problem by disallowing vsyscall page read for
copy_from_kernel_nofault(). Patch #3 adds one test case to ensure the
read of vsyscall page through bpf is rejected. Please see individual
patches for more details.
Comments are always welcome.
[1]: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com/
[3]: https://lore.kernel.org/bpf/87r0jwquhv.ffs@tglx/
[4]: https://lore.kernel.org/bpf/e24b125c-8ff4-9031-6c53-67ff2e01f316@huaweicloud.com/
Change Log:
v3:
* rephrase commit message for patch #1 & #2 (Sohil)
* reword comments in copy_from_kernel_nofault_allowed() (Sohil)
* add Rvb tag for patch #1 and Acked-by tag for patch #3 (Sohil, Yonghong)
v2: https://lore.kernel.org/bpf/20240126115423.3943360-1-houtao@huaweicloud.com/
* move is_vsyscall_vaddr to asm/vsyscall.h instead (Sohil)
* elaborate on the reason for disallowing of vsyscall page read in
copy_from_kernel_nofault_allowed() (Sohil)
* update the commit message of patch #2 to more clearly explain how
the oops occurs. (Sohil)
* update the commit message of patch #3 to explain the expected return
values of various bpf helpers (Yonghong)
v1: https://lore.kernel.org/bpf/20240119073019.1528573-1-houtao@huaweicloud.com/
====================
Link: https://lore.kernel.org/r/20240202103935.3154011-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Under x86-64, when using bpf_probe_read_kernel{_str}() or
bpf_probe_read{_str}() to read the vsyscall page, the read may trigger an
oops, so add one test case to ensure that the problem is fixed. Besides the
four bpf helpers mentioned above, the read of the vsyscall page is also
tested using bpf_probe_read_user{_str}() and bpf_copy_from_user{_task}().
The test case passes the address of vsyscall page to these six helpers
and checks whether the returned values are expected:
1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
expected return value is -ERANGE as shown below:
bpf_probe_read_kernel_common
copy_from_kernel_nofault
// false, return -ERANGE
copy_from_kernel_nofault_allowed
2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
as shown below:
bpf_probe_read_user_common
copy_from_user_nofault
// false, return -EFAULT
__access_ok
3) For bpf_copy_from_user(), the expected return value is -EFAULT:
// return -EFAULT
bpf_copy_from_user
copy_from_user
_copy_from_user
// return false
access_ok
4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:
// return -EFAULT
bpf_copy_from_user_task
access_process_vm
// return 0
vma_lookup()
// return 0
expand_stack()
The occurrence of the oops depends on the availability of the CPU SMAP [1]
feature and there are three possible configurations of vsyscall page in
the boot cmd-line: vsyscall={xonly|none|emulate}, so there are a total
of six possible combinations. Under all these combinations, the test
case runs successfully.
[1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention
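A minimal BPF-side sketch of the kind of probe the test performs; this is not the actual selftest source, and the attach point and global variable are assumptions:
// SPDX-License-Identifier: GPL-2.0
/* Sketch: read the vsyscall page from a BPF program; the test expects
 * the helper to return -ERANGE instead of triggering an oops.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define VSYSCALL_ADDR 0xffffffffff600000ULL

long probe_ret;

SEC("tracepoint/syscalls/sys_enter_getpid")
int probe_vsyscall(void *ctx)
{
        char buf[8];

        probe_ret = bpf_probe_read_kernel(buf, sizeof(buf),
                                          (const void *)VSYSCALL_ADDR);
        return 0;
}

char _license[] SEC("license") = "GPL";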
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20240202103935.3154011-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When trying to use copy_from_kernel_nofault() to read vsyscall page
through a bpf program, the following oops was reported:
BUG: unable to handle page fault for address: ffffffffff600000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
......
Call Trace:
<TASK>
? copy_from_kernel_nofault+0x6f/0x110
bpf_probe_read_kernel+0x1d/0x50
bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
trace_call_bpf+0xc5/0x1c0
perf_call_bpf_enter.isra.0+0x69/0xb0
perf_syscall_enter+0x13e/0x200
syscall_trace_enter+0x188/0x1c0
do_syscall_64+0xb5/0xe0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
</TASK>
......
---[ end trace 0000000000000000 ]---
The oops is triggered when:
1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().
2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.
3) handle_page_fault() considers the vsyscall page address as a user
space address instead of a kernel space address. This results in the
fix-up set up by bpf not being applied, and page_fault_oops() is invoked
due to SMAP.
Since handle_page_fault() already treats the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().
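A hedged sketch of the check this describes; the real helper and the address check live in arch/x86 code, and the constant shown is the standard x86-64 VSYSCALL_ADDR:
#include <linux/mm.h>
#include <linux/types.h>
/* Sketch: refuse vsyscall-page reads in copy_from_kernel_nofault(),
 * matching the page fault handler's view that this address belongs to
 * user space.
 */
static bool is_vsyscall_vaddr_sketch(unsigned long vaddr)
{
        return (vaddr & PAGE_MASK) == 0xffffffffff600000UL; /* VSYSCALL_ADDR */
}

static bool nofault_allowed_sketch(const void *unsafe_src, size_t size)
{
        unsigned long vaddr = (unsigned long)unsafe_src;

        /* ... existing kernel-address checks elided ... */
        return !is_vsyscall_vaddr_sketch(vaddr);
}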
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Move is_vsyscall_vaddr() into asm/vsyscall.h to make it available for
copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c.
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20240202103935.3154011-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Stanislaw Gruszka says:
====================
thermal/netlink/intel_hfi: Enable HFI feature only when required
The patchset introduces new genetlink family bind/unbind callbacks
and thermal/netlink notifications, which allow drivers to send netlink
multicast events based on the presence of actual user-space consumers.
This functionality optimizes resource usage by allowing disabling
of features when not needed.
v1: https://lore.kernel.org/linux-pm/20240131120535.933424-1-stanislaw.gruszka@linux.intel.com//
v2: https://lore.kernel.org/linux-pm/20240206133605.1518373-1-stanislaw.gruszka@linux.intel.com/
v3: https://lore.kernel.org/linux-pm/20240209120625.1775017-1-stanislaw.gruszka@linux.intel.com/
====================
Link: https://lore.kernel.org/r/20240212161615.161935-1-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add genetlink family bind()/unbind() callbacks that are invoked when a
multicast group is added to or removed from a netlink client socket via
setsockopt() or the bind() syscall.
They can be used to track if consumers of netlink multicast messages
emerge or disappear. Thus, a client implementing callbacks, can now
send events only when there are active consumers, preventing unnecessary
work when none exist.
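A hedged sketch of how a family might use such callbacks; the bind/unbind member signatures shown here are an assumption based on this description, and the family itself is made up:
#include <linux/atomic.h>
#include <net/genetlink.h>
/* Sketch: count listeners of the family's multicast groups so events
 * are only generated while at least one consumer is bound.
 */
static atomic_t sketch_listeners = ATOMIC_INIT(0);

static int sketch_genl_bind(int mcgrp)
{
        atomic_inc(&sketch_listeners);          /* a consumer joined */
        return 0;
}

static void sketch_genl_unbind(int mcgrp)
{
        atomic_dec(&sketch_listeners);          /* a consumer left */
}

static struct genl_family sketch_family = {
        .name    = "sketch_events",
        .version = 1,
        .bind    = sketch_genl_bind,            /* assumed: int (*)(int mcgrp) */
        .unbind  = sketch_genl_unbind,          /* assumed: void (*)(int mcgrp) */
};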
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The debug.config file is really great to easily enable a bunch of
general debugging features on a CI-like setup. But it would be great to
also include core networking debugging config.
A few CIs validating features from the Net tree also enable a few other
debugging options on top of debug.config. A small selection is quite
generic for the whole net tree. They validate some assumptions in
different parts of the core net tree. As suggested by Jakub Kicinski in
[1], having them added to this debug.config file would help other CIs
using network features to find bugs in this area.
Note that the two REFCNT configs also select REF_TRACKER, which doesn't
seem to be an issue.
Link: https://lore.kernel.org/netdev/20240202093148.33bd2b14@kernel.org/T/ [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240212-kconfig-debug-enable-net-v1-1-fb026de8174c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Write error handling is racy and can sometimes lead to the error recovery
path wrongly changing the inode size of a sequential zone file to an
incorrect value which results in garbage data being readable at the end
of a file. There are 2 problems:
1) zonefs_file_dio_write() updates a zone file write pointer offset
after issuing a direct IO with iomap_dio_rw(). This update is done
only if the IO succeed for synchronous direct writes. However, for
asynchronous direct writes, the update is done without waiting for
the IO completion so that the next asynchronous IO can be
immediately issued. However, if an asynchronous IO completes with a
failure right before the i_truncate_mutex lock protecting the update,
the update may change the value of the inode write pointer offset
that was corrected by the error path (zonefs_io_error() function).
2) zonefs_io_error() is called when a read or write error occurs. This
function executes a report zone operation using the callback function
zonefs_io_error_cb(), which does all the error recovery handling
based on the current zone condition, write pointer position and
according to the mount options being used. However, depending on the
zoned device being used, a report zone callback may be executed in a
context that is different from the context of __zonefs_io_error(). As
a result, zonefs_io_error_cb() may be executed without the inode
truncate mutex lock held, which can lead to invalid error processing.
Fix both problems as follows:
- Problem 1: Perform the inode write pointer offset update before a
direct write is issued with iomap_dio_rw(). This is safe to do as
partial direct writes are not supported (IOMAP_DIO_PARTIAL is not
set) and any failed IO will trigger the execution of zonefs_io_error()
which will correct the inode write pointer offset to reflect the
current state of the one on the device.
- Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error()
and call this function directly from __zonefs_io_error() after
obtaining the zone information using blkdev_report_zones() with a
simple callback function that copies to a local stack variable the
struct blk_zone obtained from the device. This ensures that error
handling is performed holding the inode truncate mutex.
This change also simplifies error handling for conventional zone files
by bypassing the execution of report zones entirely. This is safe to
do because the condition of conventional zones cannot be read-only or
offline and conventional zone files are always fully mapped with a
constant file size.
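A hedged sketch of the report-zones plumbing described for problem 2; the function names are illustrative, not the actual zonefs code:
#include <linux/blkdev.h>
#include <linux/string.h>
/* Sketch: the report-zones callback only copies the zone descriptor to
 * a caller-provided variable; the error handling itself then runs in
 * the caller's context, with the inode truncate mutex held.
 */
static int sketch_copy_zone_cb(struct blk_zone *zone, unsigned int idx,
                               void *data)
{
        memcpy(data, zone, sizeof(struct blk_zone));
        return 0;
}

static int sketch_get_zone(struct block_device *bdev, sector_t sector,
                           struct blk_zone *zone)
{
        int ret;

        ret = blkdev_report_zones(bdev, sector, 1, sketch_copy_zone_cb, zone);
        if (ret != 1)
                return ret < 0 ? ret : -EIO;
        return 0;
}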
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/dev.c
9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()")
723de3ebef03 ("net: free altname using an RCU callback")
net/unix/garbage.c
11498715f266 ("af_unix: Remove io_uring code for GC.")
25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.")
drivers/net/ethernet/renesas/ravb_main.c
ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path")
c2da9408579d ("ravb: Add Rx checksum offload support for GbEth")
net/mptcp/protocol.c
bdd70eb68913 ("mptcp: drop the push_pending field")
28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
md_start_sync() will suspend the array if there are spares that can be
added or removed from conf, however, if reshape is still in progress,
this won't happen at all or data will be corrupted (remove_and_add_spares
won't be called from md_choose_sync_action for reshape), hence there is
no need to suspend the array if reshape is not done yet.
Meanwhile, there is a potential deadlock for raid456:
1) reshape is interrupted;
2) set one of the disk WantReplacement, and add a new disk to the array,
however, recovery won't start until the reshape is finished;
3) then issue an IO across the reshape position, this IO will wait for
reshape to make progress;
4) continue to reshape, then md_start_sync() found there is a spare disk
that can be added to conf, mddev_suspend() is called;
Step 4 and step 3 are waiting for each other, so a deadlock is triggered. Note
that this problem was found by code review and has not been reproduced yet.
Fix this problem by not suspending the array for an interrupted reshape;
this is safe because conf won't be changed until reshape is done.
Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-6-yukuai1@huaweicloud.com
|