summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-15ndisc: ndisc_send_redirect() cleanupEric Dumazet
ndisc_send_redirect() is always called under rcu_read_lock(). It can use dev_net_rcu() and avoid one redundant rcu_read_lock()/rcu_read_unlock() pair. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214140705.2105890-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-15net/sched: cls_api: fix error handling causing NULL dereferencePierre Riteau
tcf_exts_miss_cookie_base_alloc() calls xa_alloc_cyclic() which can return 1 if the allocation succeeded after wrapping. This was treated as an error, with value 1 returned to caller tcf_exts_init_ex() which sets exts->actions to NULL and returns 1 to caller fl_change(). fl_change() treats err == 1 as success, calling tcf_exts_validate_ex() which calls tcf_action_init() with exts->actions as argument, where it is dereferenced. Example trace: BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 114 PID: 16151 Comm: handler114 Kdump: loaded Not tainted 5.14.0-503.16.1.el9_5.x86_64 #1 RIP: 0010:tcf_action_init+0x1f8/0x2c0 Call Trace: tcf_action_init+0x1f8/0x2c0 tcf_exts_validate_ex+0x175/0x190 fl_change+0x537/0x1120 [cls_flower] Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Signed-off-by: Pierre Riteau <pierre@stackhpc.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250213223610.320278-1-pierre@stackhpc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-15Merge tag 'gpio-fixes-for-v6.14-rc3-take2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix interrupt handling issues in gpio-bcm-kona - add an ACPI quirk for Acer Nitro ANV14 fixing an issue with spurious wake up events - add missing return value checks to gpio-stmpe - fix a crash in error path in gpiochip_get_ngpios() * tag 'gpio-fixes-for-v6.14-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix crash on error in gpiochip_get_ngpios() gpio: stmpe: Check return value of stmpe_reg_read in stmpe_gpio_irq_sync_unlock gpiolib: acpi: Add a quirk for Acer Nitro ANV14 gpio: bcm-kona: Add missing newline to dev_err format string gpio: bcm-kona: Make sure GPIO bits are unlocked when requesting IRQ gpio: bcm-kona: Fix GPIO lock/unlock for banks above bank 0
2025-02-15io_uring: prevent opcode speculationPavel Begunkov
sqe->opcode is used for different tables, make sure we santitise it against speculations. Cc: stable@vger.kernel.org Fixes: d3656344fea03 ("io_uring: add lookup table for various opcode needs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/7eddbf31c8ca0a3947f8ed98271acc2b4349c016.1739568408.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-15kbuild: fix linux-headers package build when $(CC) cannot link userspaceMasahiro Yamada
Since commit 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package"), the linux-headers Debian package fails to build when $(CC) cannot build userspace applications, for example, when using toolchains installed by the 0day bot. The host programs in the linux-headers package should be rebuilt using the disto's cross-compiler, ${DEB_HOST_GNU_TYPE}-gcc instead of $(CC). Hence, the variable 'CC' must be expanded in this shell script instead of in the top-level Makefile. Commit f354fc88a72a ("kbuild: install-extmod-build: add missing quotation marks for CC variable") was not a correct fix because CC="ccache gcc" should be unrelated when rebuilding userspace tools. Fixes: 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package") Reported-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Closes: https://lore.kernel.org/linux-kbuild/CAK7LNARb3xO3ptBWOMpwKcyf3=zkfhMey5H2KnB1dOmUwM79dA@mail.gmail.com/T/#t Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-02-15tools: fix annoying "mkdir -p ..." logs when building tools in parallelMasahiro Yamada
When CONFIG_OBJTOOL=y or CONFIG_DEBUG_INFO_BTF=y, parallel builds show awkward "mkdir -p ..." logs. $ make -j16 [ snip ] mkdir -p /home/masahiro/ref/linux/tools/objtool && make O=/home/masahiro/ref/linux subdir=tools/objtool --no-print-directory -C objtool mkdir -p /home/masahiro/ref/linux/tools/bpf/resolve_btfids && make O=/home/masahiro/ref/linux subdir=tools/bpf/resolve_btfids --no-print-directory -C bpf/resolve_btfids Defining MAKEFLAGS=<value> on the command line wipes out command line switches from the resultant MAKEFLAGS definition, even though the command line switches are active. [1] MAKEFLAGS puts all single-letter options into the first word, and that word will be empty if no single-letter options were given. [2] However, this breaks if MAKEFLAGS=<value> is given on the command line. The tools/ and tools/% targets set MAKEFLAGS=<value> on the command line, which breaks the following code in tools/scripts/Makefile.include: short-opts := $(firstword -$(MAKEFLAGS)) If MAKEFLAGS really needs modification, it should be done through the environment variable, as follows: MAKEFLAGS=<value> $(MAKE) ... That said, I question whether modifying MAKEFLAGS is necessary here. The only flag we might want to exclude is --no-print-directory, as the tools build system changes the working directory. However, people might find the "Entering/Leaving directory" logs annoying. I simply removed the offending MAKEFLAGS=<value>. [1]: https://savannah.gnu.org/bugs/?62469 [2]: https://www.gnu.org/software/make/manual/make.html#Testing-Flags Fixes: ea01fa9f63ae ("tools: Connect to the kernel build system") Fixes: a50e43332756 ("perf tools: Honor parallel jobs") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Daniel Xu <dxu@dxuuu.xyz>
2025-02-15ALSA: hda/cirrus: Reduce codec resume timeVitaly Rodionov
This patch reduces the resume time by half and introduces an option to include a delay after a single write operation before continuing. Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://patch.msgid.link/20250214162354.2675652-2-vitalyr@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-15ALSA: hda/cirrus: Correct the full scale volume set logicVitaly Rodionov
This patch corrects the full-scale volume setting logic. On certain platforms, the full-scale volume bit is required. The current logic mistakenly sets this bit and incorrectly clears reserved bit 0, causing the headphone output to be muted. Fixes: 342b6b610ae2 ("ALSA: hda/cs8409: Fix Full Scale Volume setting for all variants") Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://patch.msgid.link/20250214210736.30814-1-vitalyr@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-14Merge tag 'alpha-fixes-v6.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha fixes from Matt Turner: "A few changes for alpha, including some important fixes for kernel stack alignment" * tag 'alpha-fixes-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: Use str_yes_no() helper in pci_dac_dma_supported() alpha: Replace one-element array with flexible array member alpha: align stack for page fault and user unaligned trap handlers alpha: make stack 16-byte aligned (most cases) alpha: replace hardcoded stack offsets with autogenerated ones
2025-02-14geneve: Fix use-after-free in geneve_find_dev().Kuniyuki Iwashima
syzkaller reported a use-after-free in geneve_find_dev() [0] without repro. geneve_configure() links struct geneve_dev.next to net_generic(net, geneve_net_id)->geneve_list. The net here could differ from dev_net(dev) if IFLA_NET_NS_PID, IFLA_NET_NS_FD, or IFLA_TARGET_NETNSID is set. When dev_net(dev) is dismantled, geneve_exit_batch_rtnl() finally calls unregister_netdevice_queue() for each dev in the netns, and later the dev is freed. However, its geneve_dev.next is still linked to the backend UDP socket netns. Then, use-after-free will occur when another geneve dev is created in the netns. Let's call geneve_dellink() instead in geneve_destroy_tunnels(). [0]: BUG: KASAN: slab-use-after-free in geneve_find_dev drivers/net/geneve.c:1295 [inline] BUG: KASAN: slab-use-after-free in geneve_configure+0x234/0x858 drivers/net/geneve.c:1343 Read of size 2 at addr ffff000054d6ee24 by task syz.1.4029/13441 CPU: 1 UID: 0 PID: 13441 Comm: syz.1.4029 Not tainted 6.13.0-g0ad9617c78ac #24 dc35ca22c79fb82e8e7bc5c9c9adafea898b1e3d Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x38/0x50 arch/arm64/kernel/stacktrace.c:466 (C) __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xbc/0x108 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x16c/0x6f0 mm/kasan/report.c:489 kasan_report+0xc0/0x120 mm/kasan/report.c:602 __asan_report_load2_noabort+0x20/0x30 mm/kasan/report_generic.c:379 geneve_find_dev drivers/net/geneve.c:1295 [inline] geneve_configure+0x234/0x858 drivers/net/geneve.c:1343 geneve_newlink+0xb8/0x128 drivers/net/geneve.c:1634 rtnl_newlink_create+0x23c/0x868 net/core/rtnetlink.c:3795 __rtnl_newlink net/core/rtnetlink.c:3906 [inline] rtnl_newlink+0x1054/0x1630 net/core/rtnetlink.c:4021 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2543 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6938 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:713 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x410/0x6f8 net/socket.c:2568 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2622 __sys_sendmsg net/socket.c:2654 [inline] __do_sys_sendmsg net/socket.c:2659 [inline] __se_sys_sendmsg net/socket.c:2657 [inline] __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2657 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600 Allocated by task 13247: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x30/0x68 mm/kasan/common.c:68 kasan_save_alloc_info+0x44/0x58 mm/kasan/generic.c:568 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x84/0xa0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __do_kmalloc_node mm/slub.c:4298 [inline] __kmalloc_node_noprof+0x2a0/0x560 mm/slub.c:4304 __kvmalloc_node_noprof+0x9c/0x230 mm/util.c:645 alloc_netdev_mqs+0xb8/0x11a0 net/core/dev.c:11470 rtnl_create_link+0x2b8/0xb50 net/core/rtnetlink.c:3604 rtnl_newlink_create+0x19c/0x868 net/core/rtnetlink.c:3780 __rtnl_newlink net/core/rtnetlink.c:3906 [inline] rtnl_newlink+0x1054/0x1630 net/core/rtnetlink.c:4021 rtnetlink_rcv_msg+0x61c/0x918 net/core/rtnetlink.c:6911 netlink_rcv_skb+0x1dc/0x398 net/netlink/af_netlink.c:2543 rtnetlink_rcv+0x34/0x50 net/core/rtnetlink.c:6938 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x618/0x838 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x5fc/0x8b0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:713 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x410/0x6f8 net/socket.c:2568 ___sys_sendmsg+0x178/0x1d8 net/socket.c:2622 __sys_sendmsg net/socket.c:2654 [inline] __do_sys_sendmsg net/socket.c:2659 [inline] __se_sys_sendmsg net/socket.c:2657 [inline] __arm64_sys_sendmsg+0x12c/0x1c8 net/socket.c:2657 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x90/0x278 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x13c/0x250 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x54/0x70 arch/arm64/kernel/syscall.c:151 el0_svc+0x4c/0xa8 arch/arm64/kernel/entry-common.c:744 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:762 el0t_64_sync+0x198/0x1a0 arch/arm64/kernel/entry.S:600 Freed by task 45: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x30/0x68 mm/kasan/common.c:68 kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x48/0x68 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2353 [inline] slab_free mm/slub.c:4613 [inline] kfree+0x140/0x420 mm/slub.c:4761 kvfree+0x4c/0x68 mm/util.c:688 netdev_release+0x94/0xc8 net/core/net-sysfs.c:2065 device_release+0x98/0x1c0 kobject_cleanup lib/kobject.c:689 [inline] kobject_release lib/kobject.c:720 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x2b0/0x438 lib/kobject.c:737 netdev_run_todo+0xe5c/0xfc8 net/core/dev.c:11185 rtnl_unlock+0x20/0x38 net/core/rtnetlink.c:151 cleanup_net+0x4fc/0x8c0 net/core/net_namespace.c:648 process_one_work+0x700/0x1398 kernel/workqueue.c:3236 process_scheduled_works kernel/workqueue.c:3317 [inline] worker_thread+0x8c4/0xe10 kernel/workqueue.c:3398 kthread+0x4bc/0x608 kernel/kthread.c:464 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:862 The buggy address belongs to the object at ffff000054d6e000 which belongs to the cache kmalloc-cg-4k of size 4096 The buggy address is located 3620 bytes inside of freed 4096-byte region [ffff000054d6e000, ffff000054d6f000) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x94d68 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff000016276181 flags: 0x3fffe0000000040(head|node=0|zone=0|lastcpupid=0x1ffff) page_type: f5(slab) raw: 03fffe0000000040 ffff0000c000f500 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000040004 00000001f5000000 ffff000016276181 head: 03fffe0000000040 ffff0000c000f500 dead000000000122 0000000000000000 head: 0000000000000000 0000000000040004 00000001f5000000 ffff000016276181 head: 03fffe0000000003 fffffdffc1535a01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff000054d6ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff000054d6ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff000054d6ee00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff000054d6ee80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff000054d6ef00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250213043354.91368-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14vsock/virtio: fix variables initialization during resumingJunnan Wu
When executing suspend to ram twice in a row, the `rx_buf_nr` and `rx_buf_max_nr` increase to three times vq->num_free. Then after virtqueue_get_buf and `rx_buf_nr` decreased in function virtio_transport_rx_work, the condition to fill rx buffer (rx_buf_nr < rx_buf_max_nr / 2) will never be met. It is because that `rx_buf_nr` and `rx_buf_max_nr` are initialized only in virtio_vsock_probe(), but they should be reset whenever virtqueues are recreated, like after a suspend/resume. Move the `rx_buf_nr` and `rx_buf_max_nr` initialization in virtio_vsock_vqs_init(), so we are sure that they are properly initialized, every time we initialize the virtqueues, either when we load the driver or after a suspend/resume. To prevent erroneous atomic load operations on the `queued_replies` in the virtio_transport_send_pkt_work() function which may disrupt the scheduling of vsock->rx_work when transmitting reply-required socket packets, this atomic variable must undergo synchronized initialization alongside the preceding two variables after a suspend/resume. Fixes: bd50c5dc182b ("vsock/virtio: add support for device suspend/resume") Link: https://lore.kernel.org/virtualization/20250207052033.2222629-1-junnan01.wu@samsung.com/ Co-developed-by: Ying Gao <ying01.gao@samsung.com> Signed-off-by: Ying Gao <ying01.gao@samsung.com> Signed-off-by: Junnan Wu <junnan01.wu@samsung.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250214012200.1883896-1-junnan01.wu@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14Merge branch 'bnxt_en-add-npar-1-2-and-tph-support'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Add NPAR 1.2 and TPH support The first patch adds NPAR 1.2 support. Patches 2 to 11 add TPH (TLP Processing Hints) support. These TPH driver patches are new revisions originally posted as part of the TPH PCI patch series. Additional driver refactoring has been done so that we can free and allocate RX completion ring and the TX rings if the channel is a combined channel. We also add napi_disable() and napi_enable() during queue_stop() and queue_start() respectively, and reset for error handling in queue_start(). v4: https://lore.kernel.org/20250208202916.1391614-1-michael.chan@broadcom.com v3: https://lore.kernel.org/20250204004609.1107078-1-michael.chan@broadcom.com v2: https://lore.kernel.org/20250116192343.34535-1-michael.chan@broadcom.com v1: https://lore.kernel.org/20250113063927.4017173-1-michael.chan@broadcom.com Discussion about adding napi_disable()/napi_enable(): https://lore.kernel.org/netdev/5336d624-8d8b-40a6-b732-b020e4a119a2@davidwei.uk/#t Previous driver series fixing rtnl_lock and empty release function: https://lore.kernel.org/netdev/20241115200412.1340286-1-wei.huang2@amd.com/ v5 of the PCI series using netdev_rx_queue_restart(): https://lore.kernel.org/netdev/20240916205103.3882081-5-wei.huang2@amd.com/ v1 of the PCI series using open/close: https://lore.kernel.org/netdev/20240509162741.1937586-9-wei.huang2@amd.com/ ==================== Link: https://patch.msgid.link/20250213011240.1640031-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Add TPH support in BNXT driverManoj Panicker
Add TPH support to the Broadcom BNXT device driver. This allows the driver to utilize TPH functions for retrieving and configuring Steering Tags when changing interrupt affinity. With compatible NIC firmware, network traffic will be tagged correctly with Steering Tags, resulting in significant memory bandwidth savings and other advantages as demonstrated by real network benchmarks on TPH-capable platforms. Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Extend queue stop/start for TX ringsSomnath Kotur
In order to use queue_stop/queue_start to support the new Steering Tags, we need to free the TX ring and TX completion ring if it is a combined channel with TX/RX sharing the same NAPI. Otherwise TX completions will not have the updated Steering Tag. If TPH is not enabled, we just stop the TX ring without freeing the TX/TX cmpl rings. With that we can now add napi_disable() and napi_enable() during queue_stop()/ queue_start(). This will guarantee that NAPI will stop processing the completion entries in case there are additional pending entries in the completion rings after queue_stop(). There could be some NQEs sitting unprocessed while NAPI is disabled thereby leaving the NQ unarmed. Explicitly re-arm the NQ after napi_enable() in queue start so that NAPI will resume properly. Error handling in bnxt_queue_start() requires a reset. If a TX ring cannot be allocated or initialized properly, it will cause TX timeout. The reset will also free any partially allocated rings. We don't expect to hit this error path because re-allocating previously reserved and allocated rings with the same parameters should never fail. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor TX ring free logicMichael Chan
Add a new bnxt_hwrm_tx_ring_free() function to handle freeing a HW transmit ring. The new function will also be used in the next patch to free the TX ring in queue_stop. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Reallocate RX completion ring for TPH supportSomnath Kotur
In order to program the correct Steering Tag during an IRQ affinity change, we need to free/re-allocate the RX completion ring during queue_restart. If TPH is enabled, call FW to free the Rx completion ring and clear the ring entries in queue_stop(). Re-allocate it in queue_start() if TPH is enabled. Note that TPH mode is not enabled in this patch and will be enabled later in the patch series. While modifying bnxt_queue_start(), remove the unnecessary zeroing of rxr->rx_next_cons. It gets overwritten by the clone in bnxt_queue_start(). Remove the rx_reset counter increment since restart is not reset. Add comment to clarify that the ring allocations in queue_start should never fail. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG ringsMichael Chan
Newer firmware can use the NQ ring ID associated with each RX/RX AGG ring to enable PCIe Steering Tags on P5_PLUS chips. When allocating RX/RX AGG rings, pass along NQ ring ID for the firmware to use. This information helps optimize DMA writes by directing them to the cache closer to the CPU consuming the data, potentially improving the processing speed. This change is backward-compatible with older firmware, which will simply disregard the information. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor RX/RX AGG ring parameters setup for P5_PLUSMichael Chan
There is some common code for setting up RX and RX AGG ring allocation parameters for P5_PLUS chips. Refactor the logic into a new function. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor bnxt_free_tx_rings() to free per TX ringSomnath Kotur
Modify bnxt_free_tx_rings() to free the skbs per TX ring. This will be useful later in the series. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor completion ring free routineSomnath Kotur
Add a wrapper routine to free L2 completion rings. This will be useful later in the series. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor TX ring allocation logicMichael Chan
Add a new bnxt_hwrm_tx_ring_alloc() function to handle allocating a transmit ring. This will be useful later in the series. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Refactor completion ring allocation logic for P5_PLUS chipsMichael Chan
Add a new bnxt_hwrm_cp_ring_alloc_p5() function to handle allocating one completion ring on P5_PLUS chips. This simplifies the existing code and will be useful later in the series. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14bnxt_en: Set NPAR 1.2 support when registering with firmwareMichael Chan
NPAR (Network interface card partitioning)[1] 1.2 adds a transparent VLAN tag for all packets between the NIC and the switch. Because of that, RX VLAN acceleration cannot be supported for any additional host configured VLANs. The driver has to acknowledge that it can support no RX VLAN acceleration and set the NPAR 1.2 supported flag when registering with the FW. Otherwise, the FW call will fail and the driver will abort on these NPAR 1.2 NICs with this error: bnxt_en 0000:26:00.0 (unnamed net_device) (uninitialized): hwrm req_type 0x1d seq id 0xb error 0x2 [1] https://techdocs.broadcom.com/us/en/storage-and-ethernet-connectivity/ethernet-nic-controllers/bcm957xxx/adapters/introduction/features/network-partitioning-npar.html Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net/mlx4_core: Avoid impossible mlx4_db_alloc() order valueKees Cook
GCC can see that the value range for "order" is capped, but this leads it to consider that it might be negative, leading to a false positive warning (with GCC 15 with -Warray-bounds -fdiagnostics-details): ../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=] 691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ~~~~~~~~~~~^~~ 'mlx4_alloc_db_from_pgdir': events 1-2 691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (2) out of array bounds here | (1) when the condition is evaluated to true In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53, from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42: ../include/linux/mlx4/device.h:664:33: note: while referencing 'bits' 664 | unsigned long *bits[2]; | ^~~~ Switch the argument to unsigned int, which removes the compiler needing to consider negative values. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: c45: improve handling of disabled EEE modes in generic ethtool ↵Heiner Kallweit
functions Currently disabled EEE modes are shown as supported in ethtool. Change this by filtering them out when populating data->supported in genphy_c45_ethtool_get_eee. Disabled EEE modes are silently filtered out by genphy_c45_write_eee_adv. This is planned to be removed, therefore ensure in genphy_c45_ethtool_set_eee that disabled EEE modes are removed from the user space provided EEE advertisement. For now keep the current behavior to do this silently. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/5187c86d-9a5a-482c-974f-cc103ce9738c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14ice: Fix signedness bug in ice_init_interrupt_scheme()Dan Carpenter
If pci_alloc_irq_vectors() can't allocate the minimum number of vectors then it returns -ENOSPC so there is no need to check for that in the caller. In fact, because pf->msix.min is an unsigned int, it means that any negative error codes are type promoted to high positive values and treated as success. So here, the "return -ENOMEM;" is unreachable code. Check for negatives instead. Now that we're only dealing with error codes, it's easier to propagate the error code from pci_alloc_irq_vectors() instead of hardcoding -ENOMEM. Fixes: 79d97b8cf9a8 ("ice: remove splitting MSI-X between features") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/b16e4f01-4c85-46e2-b602-fce529293559@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: remove phylink_pcs .neg_mode booleanRussell King (Oracle)
As all PCS are using the neg_mode parameter rather than the legacy an_mode, remove the ability to use the legacy an_mode. We remove the tests in the phylink code, unconditionally passing the PCS neg_mode parameter to PCS methods, and remove setting the flag from drivers. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tidPn-0040hd-2R@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14documentation: networking: Add NAPI configJoe Damato
Document the existence of persistent per-NAPI configuration space and the API that drivers can opt into. Update stale documentation which suggested that NAPI IDs cannot be queried from userspace. Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20250213191535.38792-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14Merge branch 'net-phy-clean-up-phy-h'Jakub Kicinski
Heiner Kallweit says: ==================== net: phy: clean up phy.h This series is a starting point to clean up phy.h and remove definitions which are phylib-internal. ==================== Link: https://patch.msgid.link/d14f8a69-dc21-4ff7-8401-574ffe2f4bc5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: remove helper phy_is_internalHeiner Kallweit
Helper phy_is_internal() is just used in two places phylib-internally. So let's remove it from the API. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/f3f35265-80a9-4ed7-ad78-ae22c21e288b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: stop exporting phy_queue_state_machineHeiner Kallweit
phy_queue_state_machine() isn't used outside phy.c, so stop exporting it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/16986d3d-7baf-4b02-a641-e2916d491264@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: stop exporting feature arrays which aren't used outside phylibHeiner Kallweit
Stop exporting feature arrays which aren't used outside phylib. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/01886672-4880-4ca8-b7b0-94d40f6e0ec5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: remove fixup-related definitions from phy.h which are not used ↵Heiner Kallweit
outside phylib Certain fixup-related definitions aren't used outside phy_device.c. So make them private and remove them from phy.h. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/ea6fde13-9183-4c7c-8434-6c0eb64fc72c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14Merge branch 'net-phy-realtek-improve-mmd-register-access-for-internal-phy-s'Jakub Kicinski
Heiner Kallweit says: ==================== net: phy: realtek: improve MMD register access for internal PHY's The integrated PHYs on chip versions from RTL8168g allow to address MDIO_MMD_VEND2 registers. All c22 standard registers are mapped to MDIO_MMD_VEND2 registers. So far the paging mechanism is used to address PHY registers. Add support for c45 ops to address MDIO_MMD_VEND2 registers directly, w/o the paging. ==================== Link: https://patch.msgid.link/c6a969ef-fd7f-48d6-8c48-4bc548831a8d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: realtek: switch from paged to MMD ops in rtl822x functionsHeiner Kallweit
The MDIO bus provided by r8169 for the internal PHY's now supports c45 ops for the MDIO_MMD_VEND2 device. So we can switch to standard MMD ops here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/81416f95-0fac-4225-87b4-828e3738b8ed@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14net: phy: realtek: improve mmd register access for internal PHY'sHeiner Kallweit
r8169 provides the MDIO bus for the internal PHY's. It has been extended with c45 access functions for addressing MDIO_MMD_VEND2 registers. So we can switch from paged access to directly addressing the MDIO_MMD_VEND2 registers. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/a5f2333c-dda9-48ad-9801-77049766e632@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14r8169: add PHY c45 ops for MDIO_MMD_VENDOR2 registersHeiner Kallweit
The integrated PHYs on chip versions from RTL8168g allow to address MDIO_MMD_VEND2 registers. All c22 standard registers are mapped to MDIO_MMD_VEND2 registers. So far the paging mechanism is used to address PHY registers. Add support for c45 ops to address MDIO_MMD_VEND2 registers directly, w/o the paging. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/d6f97eaa-0f13-468f-89cb-75a41087bc4a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14Merge tag 'pci-v6.14-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Update a BUILD_BUG_ON() usage that works on current compilers, but breaks compilation on gcc 5.3.1 (Alex Williamson) - Avoid use of FLR for Mediatek MT7922 WiFi; the device previously worked after a long timeout and fallback to SBR, but after a recent RRS change it doesn't work at all after FLR (Bjorn Helgaas) * tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Avoid FLR for Mediatek MT7922 WiFi PCI: Fix BUILD_BUG_ON usage for old gcc
2025-02-14Merge tag 'kvm-x86-fixes-6.14-rcN' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM fixes for 6.14 part 1 - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference. - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1. - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath.
2025-02-14x86/sev: Fix broken SNP support with KVM module built-inAshish Kalra
Fix issues with enabling SNP host support and effectively SNP support which is broken with respect to the KVM module being built-in. SNP host support is enabled in snp_rmptable_init() which is invoked as device_initcall(). SNP check on IOMMU is done during IOMMU PCI init (IOMMU_PCI_INIT stage). And for that reason snp_rmptable_init() is currently invoked via device_initcall() and cannot be invoked via subsys_initcall() as core IOMMU subsystem gets initialized via subsys_initcall(). Now, if kvm_amd module is built-in, it gets initialized before SNP host support is enabled in snp_rmptable_init() : [ 10.131811] kvm_amd: TSC scaling supported [ 10.136384] kvm_amd: Nested Virtualization enabled [ 10.141734] kvm_amd: Nested Paging enabled [ 10.146304] kvm_amd: LBR virtualization supported [ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509) [ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99) [ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99) [ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported [ 10.177052] kvm_amd: Virtual GIF supported ... ... [ 10.201648] kvm_amd: in svm_enable_virtualization_cpu And then svm_x86_ops->enable_virtualization_cpu() (svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as following: wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa); So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs. snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu() as following : ... [ 11.256138] kvm_amd: in svm_enable_virtualization_cpu ... [ 11.264918] SEV-SNP: in snp_rmptable_init This triggers a #GP exception in snp_rmptable_init() when snp_enable() is invoked to set SNP_EN in SYSCFG MSR: [ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30) ... [ 11.294404] Call Trace: [ 11.294482] <IRQ> [ 11.294513] ? show_stack_regs+0x26/0x30 [ 11.294522] ? ex_handler_msr+0x10f/0x180 [ 11.294529] ? search_extable+0x2b/0x40 [ 11.294538] ? fixup_exception+0x2dd/0x340 [ 11.294542] ? exc_general_protection+0x14f/0x440 [ 11.294550] ? asm_exc_general_protection+0x2b/0x30 [ 11.294557] ? __pfx_snp_enable+0x10/0x10 [ 11.294567] ? native_write_msr+0x8/0x30 [ 11.294570] ? __snp_enable+0x5d/0x70 [ 11.294575] snp_enable+0x19/0x20 [ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0 [ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20 [ 11.294589] __sysvec_call_function+0x20/0x90 [ 11.294596] sysvec_call_function+0x80/0xb0 [ 11.294601] </IRQ> [ 11.294603] <TASK> [ 11.294605] asm_sysvec_call_function+0x1f/0x30 ... [ 11.294631] arch_cpu_idle+0xd/0x20 [ 11.294633] default_idle_call+0x34/0xd0 [ 11.294636] do_idle+0x1f1/0x230 [ 11.294643] ? complete+0x71/0x80 [ 11.294649] cpu_startup_entry+0x30/0x40 [ 11.294652] start_secondary+0x12d/0x160 [ 11.294655] common_startup_64+0x13e/0x141 [ 11.294662] </TASK> This #GP exception is getting triggered due to the following errata for AMD family 19h Models 10h-1Fh Processors: Processor may generate spurious #GP(0) Exception on WRMSR instruction: Description: The Processor will generate a spurious #GP(0) Exception on a WRMSR instruction if the following conditions are all met: - the target of the WRMSR is a SYSCFG register. - the write changes the value of SYSCFG.SNPEn from 0 to 1. - One of the threads that share the physical core has a non-zero value in the VM_HSAVE_PA MSR. The document being referred to above: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf To summarize, with kvm_amd module being built-in, KVM/SVM initialization happens before host SNP is enabled and this SVM initialization sets VM_HSAVE_PA to non-zero, which then triggers a #GP when SYSCFG.SNPEn is being set and this will subsequently cause SNP_INIT(_EX) to fail with INVALID_CONFIG error as SYSCFG[SnpEn] is not set on all CPUs. Essentially SNP host enabling code should be invoked before KVM initialization, which is currently not the case when KVM is built-in. Add fix to call snp_rmptable_init() early from iommu_snp_enable() directly and not invoked via device_initcall() which enables SNP host support before KVM initialization with kvm_amd module built-in. Add additional handling for `iommu=off` or `amd_iommu=off` options. Note that IOMMUs need to be enabled for SNP initialization, therefore, if host SNP support is enabled but late IOMMU initialization fails then that will cause PSP driver's SNP_INIT to fail as IOMMU SNP sanity checks in SNP firmware will fail with invalid configuration error as below: [ 9.723114] ccp 0000:23:00.1: sev enabled [ 9.727602] ccp 0000:23:00.1: psp enabled [ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002) [ 9.739098] ccp 0000:a2:00.1: no command queues available [ 9.745167] ccp 0000:a2:00.1: psp enabled [ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3 [ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5 Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-14KVM: SVM: Ensure PSP module is initialized if KVM module is built-inSean Christopherson
The kernel's initcall infrastructure lacks the ability to express dependencies between initcalls, whereas the modules infrastructure automatically handles dependencies via symbol loading. Ensure the PSP SEV driver is initialized before proceeding in sev_hardware_setup() if KVM is built-in as the dependency isn't handled by the initcall infrastructure. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-ID: <f78ddb64087df27e7bcb1ae0ab53f55aa0804fab.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-14crypto: ccp: Add external API interface for PSP module initializationSean Christopherson
KVM is dependent on the PSP SEV driver and PSP SEV driver needs to be loaded before KVM module. In case of module loading any dependent modules are automatically loaded but in case of built-in modules there is no inherent mechanism available to specify dependencies between modules and ensure that any dependent modules are loaded implicitly. Add a new external API interface for PSP module initialization which allows PSP SEV driver to be loaded explicitly if KVM is built-in. Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Message-ID: <15279ca0cad56a07cf12834ec544310f85ff5edc.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-14Merge tag 'kvmarm-fixes-6.14-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #2 - Large set of fixes for vector handling, specially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours. - Fix use of kernel VAs at EL2 when emulating timers with nVHE. - Small set of pKVM improvements and cleanups.
2025-02-14Merge tag 'thermal-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix a regression caused by an inadvertent change of the THERMAL_GENL_ATTR_CPU_CAPABILITY value in one of the recent thermal commits (Zhang Rui) and drop a stale piece of documentation (Daniel Lezcano)" * tag 'thermal-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/cpufreq_cooling: Remove structure member documentation thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header
2025-02-14Merge tag 'mmc-v6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - mtk-sd: Fix register settings for hs400(es) mode - sdhci_am654: Revert patch for start-signal-voltage-switch * tag 'mmc-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: mtk-sd: Fix register settings for hs400(es) mode Revert "mmc: sdhci_am654: Add sdhci_am654_start_signal_voltage_switch"
2025-02-14Merge tag 'v6.14-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fix from Steve French: "SMB3 client multichannel fix" * tag 'v6.14-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: pick channels for individual subrequests
2025-02-14block/merge: remove unnecessary min() with UINT_MAXCaleb Sander Mateos
In bvec_split_segs(), max_bytes is an unsigned, so it must be less than or equal to UINT_MAX. Remove the unnecessary min(). Prior to commit 67927d220150 ("block/merge: count bytes instead of sectors"), the min() was with UINT_MAX >> 9, so it did have an effect. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250214193637.234702-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-14io-wq: backoff when retrying worker creationUday Shankar
When io_uring submission goes async for the first time on a given task, we'll try to create a worker thread to handle the submission. Creating this worker thread can fail due to various transient conditions, such as an outstanding signal in the forking thread, so we have retry logic with a limit of 3 retries. However, this retry logic appears to be too aggressive/fast - we've observed a thread blowing through the retry limit while having the same outstanding signal the whole time. Here's an excerpt of some tracing that demonstrates the issue: First, signal 26 is generated for the process. It ends up getting routed to thread 92942. 0) cbd-92284 /* signal_generate: sig=26 errno=0 code=-2 comm=psblkdASD pid=92934 grp=1 res=0 */ This causes create_io_thread in the signalled thread to fail with ERESTARTNOINTR, and thus a retry is queued. 13) task_th-92942 /* io_uring_queue_async_work: ring 000000007325c9ae, request 0000000080c96d8e, user_data 0x0, opcode URING_CMD, flags 0x8240001, normal queue, work 000000006e96dd3f */ 13) task_th-92942 io_wq_enqueue() { 13) task_th-92942 _raw_spin_lock(); 13) task_th-92942 io_wq_activate_free_worker(); 13) task_th-92942 _raw_spin_lock(); 13) task_th-92942 create_io_worker() { 13) task_th-92942 __kmalloc_cache_noprof(); 13) task_th-92942 __init_swait_queue_head(); 13) task_th-92942 kprobe_ftrace_handler() { 13) task_th-92942 get_kprobe(); 13) task_th-92942 aggr_pre_handler() { 13) task_th-92942 pre_handler_kretprobe(); 13) task_th-92942 /* create_enter: (create_io_thread+0x0/0x50) fn=0xffffffff8172c0e0 arg=0xffff888996bb69c0 node=-1 */ 13) task_th-92942 } /* aggr_pre_handler */ ... 13) task_th-92942 } /* copy_process */ 13) task_th-92942 } /* create_io_thread */ 13) task_th-92942 kretprobe_rethook_handler() { 13) task_th-92942 /* create_exit: (create_io_worker+0x8a/0x1a0 <- create_io_thread) arg1=0xfffffffffffffdff */ 13) task_th-92942 } /* kretprobe_rethook_handler */ 13) task_th-92942 queue_work_on() { ... The CPU is then handed to a kworker to process the queued retry: ------------------------------------------ 13) task_th-92942 => kworker-54154 ------------------------------------------ 13) kworker-54154 io_workqueue_create() { 13) kworker-54154 io_queue_worker_create() { 13) kworker-54154 task_work_add() { 13) kworker-54154 wake_up_state() { 13) kworker-54154 try_to_wake_up() { 13) kworker-54154 _raw_spin_lock_irqsave(); 13) kworker-54154 _raw_spin_unlock_irqrestore(); 13) kworker-54154 } /* try_to_wake_up */ 13) kworker-54154 } /* wake_up_state */ 13) kworker-54154 kick_process(); 13) kworker-54154 } /* task_work_add */ 13) kworker-54154 } /* io_queue_worker_create */ 13) kworker-54154 } /* io_workqueue_create */ And then we immediately switch back to the original task to try creating a worker again. This fails, because the original task still hasn't handled its signal. ----------------------------------------- 13) kworker-54154 => task_th-92942 ------------------------------------------ 13) task_th-92942 create_worker_cont() { 13) task_th-92942 kprobe_ftrace_handler() { 13) task_th-92942 get_kprobe(); 13) task_th-92942 aggr_pre_handler() { 13) task_th-92942 pre_handler_kretprobe(); 13) task_th-92942 /* create_enter: (create_io_thread+0x0/0x50) fn=0xffffffff8172c0e0 arg=0xffff888996bb69c0 node=-1 */ 13) task_th-92942 } /* aggr_pre_handler */ 13) task_th-92942 } /* kprobe_ftrace_handler */ 13) task_th-92942 create_io_thread() { 13) task_th-92942 copy_process() { 13) task_th-92942 task_active_pid_ns(); 13) task_th-92942 _raw_spin_lock_irq(); 13) task_th-92942 recalc_sigpending(); 13) task_th-92942 _raw_spin_lock_irq(); 13) task_th-92942 } /* copy_process */ 13) task_th-92942 } /* create_io_thread */ 13) task_th-92942 kretprobe_rethook_handler() { 13) task_th-92942 /* create_exit: (create_worker_cont+0x35/0x1b0 <- create_io_thread) arg1=0xfffffffffffffdff */ 13) task_th-92942 } /* kretprobe_rethook_handler */ 13) task_th-92942 io_worker_release(); 13) task_th-92942 queue_work_on() { 13) task_th-92942 clear_pending_if_disabled(); 13) task_th-92942 __queue_work() { 13) task_th-92942 } /* __queue_work */ 13) task_th-92942 } /* queue_work_on */ 13) task_th-92942 } /* create_worker_cont */ The pattern repeats another couple times until we blow through the retry counter, at which point we give up. All outstanding work is canceled, and the io_uring command which triggered all this is failed with ECANCELED: 13) task_th-92942 io_acct_cancel_pending_work() { ... 13) task_th-92942 /* io_uring_complete: ring 000000007325c9ae, req 0000000080c96d8e, user_data 0x0, result -125, cflags 0x0 extra1 0 extra2 0 */ Finally, the task gets around to processing its outstanding signal 26, but it's too late. 13) task_th-92942 /* signal_deliver: sig=26 errno=0 code=-2 sa_handler=59566a0 sa_flags=14000000 */ Try to address this issue by adding a small scaling delay when retrying worker creation. This should give the forking thread time to handle its signal in the above case. This isn't a particularly satisfying solution, as sufficiently paradoxical scheduling would still have us hitting the same issue, and I'm open to suggestions for something better. But this is likely to prevent this (already rare) issue from hitting in practice. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250208-wq_retry-v2-1-4f6f5041d303@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-14Merge tag 'efi-fixes-for-v6.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "Take the newly introduced EFI_MEMORY_HOT_PLUGGABLE memory attribute into account when placing the kernel image in memory at boot. Otherwise, the presence of the kernel image could prevent such a memory region from being unplugged at runtime if it was 'cold plugged', i.e., already plugged in at boot time (and exposed via the EFI memory map). This should ensure that the new EFI_MEMORY_HOT_PLUGGABLE memory attribute is used consistently by Linux before it ever turns up in production, ensuring that we can make meaningful use of it without running the risk of regressing existing users" * tag 'efi-fixes-for-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: Use BIT_ULL() constants for memory attributes efi: Avoid cold plugged memory for placing the kernel
2025-02-14Merge branch 'net-phylink-xpcs-stmmac-support-pcs-eee-configuration'Jakub Kicinski
Russell King says: ==================== net: phylink,xpcs,stmmac: support PCS EEE configuration This series adds support for phylink managed EEE at the PCS level, allowing xpcs_config_eee() to be removed. Sadly, we still end up with a XPCS specific function to configure the clock multiplier. ==================== Link: https://patch.msgid.link/Z6naiPpxfxGr1Ic6@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>