summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-06iommu/vt-d: Don't issue ATS Invalidation request when device is disconnectedEthan Zhao
For those endpoint devices connect to system via hotplug capable ports, users could request a hot reset to the device by flapping device's link through setting the slot's link control register, as pciehp_ist() DLLSC interrupt sequence response, pciehp will unload the device driver and then power it off. thus cause an IOMMU device-TLB invalidation (Intel VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence target device to be sent and deadly loop to retry that request after ITE fault triggered in interrupt context. That would cause following continuous hard lockup warning and system hang [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144 [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx [ 4223.822623] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023 [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490 [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 1 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39 [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093 [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005 [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340 [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000 [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200 [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004 [ 4223.822626] FS: 0000000000000000(0000) GS:ffffa237ae400000(0000) knlGS:0000000000000000 [ 4223.822627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0 [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 4223.822628] PKRU: 55555554 [ 4223.822628] Call Trace: [ 4223.822628] qi_flush_dev_iotlb+0xb1/0xd0 [ 4223.822628] __dmar_remove_one_dev_info+0x224/0x250 [ 4223.822629] dmar_remove_one_dev_info+0x3e/0x50 [ 4223.822629] intel_iommu_release_device+0x1f/0x30 [ 4223.822629] iommu_release_device+0x33/0x60 [ 4223.822629] iommu_bus_notifier+0x7f/0x90 [ 4223.822630] blocking_notifier_call_chain+0x60/0x90 [ 4223.822630] device_del+0x2e5/0x420 [ 4223.822630] pci_remove_bus_device+0x70/0x110 [ 4223.822630] pciehp_unconfigure_device+0x7c/0x130 [ 4223.822631] pciehp_disable_slot+0x6b/0x100 [ 4223.822631] pciehp_handle_presence_or_link_change+0xd8/0x320 [ 4223.822631] pciehp_ist+0x176/0x180 [ 4223.822631] ? irq_finalize_oneshot.part.50+0x110/0x110 [ 4223.822632] irq_thread_fn+0x19/0x50 [ 4223.822632] irq_thread+0x104/0x190 [ 4223.822632] ? irq_forced_thread_fn+0x90/0x90 [ 4223.822632] ? irq_thread_check_affinity+0xe0/0xe0 [ 4223.822633] kthread+0x114/0x130 [ 4223.822633] ? __kthread_cancel_work+0x40/0x40 [ 4223.822633] ret_from_fork+0x1f/0x30 [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx [ 4223.822634] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023 [ 4223.822634] Call Trace: [ 4223.822634] <NMI> [ 4223.822635] dump_stack+0x6d/0x88 [ 4223.822635] panic+0x101/0x2d0 [ 4223.822635] ? ret_from_fork+0x11/0x30 [ 4223.822635] nmi_panic.cold.14+0xc/0xc [ 4223.822636] watchdog_overflow_callback.cold.8+0x6d/0x81 [ 4223.822636] __perf_event_overflow+0x4f/0xf0 [ 4223.822636] handle_pmi_common+0x1ef/0x290 [ 4223.822636] ? __set_pte_vaddr+0x28/0x40 [ 4223.822637] ? flush_tlb_one_kernel+0xa/0x20 [ 4223.822637] ? __native_set_fixmap+0x24/0x30 [ 4223.822637] ? ghes_copy_tofrom_phys+0x70/0x100 [ 4223.822637] ? __ghes_peek_estatus.isra.16+0x49/0xa0 [ 4223.822637] intel_pmu_handle_irq+0xba/0x2b0 [ 4223.822638] perf_event_nmi_handler+0x24/0x40 [ 4223.822638] nmi_handle+0x4d/0xf0 [ 4223.822638] default_do_nmi+0x49/0x100 [ 4223.822638] exc_nmi+0x134/0x180 [ 4223.822639] end_repeat_nmi+0x16/0x67 [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490 [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39 [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093 [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005 [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340 [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000 [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200 [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004 [ 4223.822641] ? qi_submit_sync+0x2c0/0x490 [ 4223.822642] ? qi_submit_sync+0x2c0/0x490 [ 4223.822642] </NMI> [ 4223.822642] qi_flush_dev_iotlb+0xb1/0xd0 [ 4223.822642] __dmar_remove_one_dev_info+0x224/0x250 [ 4223.822643] dmar_remove_one_dev_info+0x3e/0x50 [ 4223.822643] intel_iommu_release_device+0x1f/0x30 [ 4223.822643] iommu_release_device+0x33/0x60 [ 4223.822643] iommu_bus_notifier+0x7f/0x90 [ 4223.822644] blocking_notifier_call_chain+0x60/0x90 [ 4223.822644] device_del+0x2e5/0x420 [ 4223.822644] pci_remove_bus_device+0x70/0x110 [ 4223.822644] pciehp_unconfigure_device+0x7c/0x130 [ 4223.822644] pciehp_disable_slot+0x6b/0x100 [ 4223.822645] pciehp_handle_presence_or_link_change+0xd8/0x320 [ 4223.822645] pciehp_ist+0x176/0x180 [ 4223.822645] ? irq_finalize_oneshot.part.50+0x110/0x110 [ 4223.822645] irq_thread_fn+0x19/0x50 [ 4223.822646] irq_thread+0x104/0x190 [ 4223.822646] ? irq_forced_thread_fn+0x90/0x90 [ 4223.822646] ? irq_thread_check_affinity+0xe0/0xe0 [ 4223.822646] kthread+0x114/0x130 [ 4223.822647] ? __kthread_cancel_work+0x40/0x40 [ 4223.822647] ret_from_fork+0x1f/0x30 [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Such issue could be triggered by all kinds of regular surprise removal hotplug operation. like: 1. pull EP(endpoint device) out directly. 2. turn off EP's power. 3. bring the link down. etc. this patch aims to work for regular safe removal and surprise removal unplug. these hot unplug handling process could be optimized for fix the ATS Invalidation hang issue by calling pci_dev_is_disconnected() in function devtlb_invalidation_with_pasid() to check target device state to avoid sending meaningless ATS Invalidation request to iommu when device is gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) For safe removal, device wouldn't be removed until the whole software handling process is done, it wouldn't trigger the hard lock up issue caused by too long ATS Invalidation timeout wait. In safe removal path, device state isn't set to pci_channel_io_perm_failure in pciehp_unconfigure_device() by checking 'presence' parameter, calling pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return false there, wouldn't break the function. For surprise removal, device state is set to pci_channel_io_perm_failure in pciehp_unconfigure_device(), means device is already gone (disconnected) call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return true to break the function not to send ATS Invalidation request to the disconnected device blindly, thus avoid to trigger further ITE fault, and ITE fault will block all invalidation request to be handled. furthermore retry the timeout request could trigger hard lockup. safe removal (present) & surprise removal (not present) pciehp_ist() pciehp_handle_presence_or_link_change() pciehp_disable_slot() remove_board() pciehp_unconfigure_device(presence) { if (!presence) pci_walk_bus(parent, pci_dev_set_disconnected, NULL); } this patch works for regular safe removal and surprise removal of ATS capable endpoint on PCIe switch downstream ports. Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface") Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Tested-by: Haorong Ye <yehaorong@bytedance.com> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Link: https://lore.kernel.org/r/20240301080727.3529832-3-haifeng.zhao@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-06PCI: Make pci_dev_is_disconnected() helper public for other driversEthan Zhao
Make pci_dev_is_disconnected() public so that it can be called from Intel VT-d driver to quickly fix/workaround the surprise removal unplug hang issue for those ATS capable devices on PCIe switch downstream hotplug capable ports. Beside pci_device_is_present() function, this one has no config space space access, so is light enough to optimize the normal pure surprise removal and safe removal flow. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Tested-by: Haorong Ye <yehaorong@bytedance.com> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com> Link: https://lore.kernel.org/r/20240301080727.3529832-2-haifeng.zhao@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-06Fix cpupower-frequency-info.1 man page typoJan Kratochvil
cpupower-frequency-info.1 man page type is incorrect for related-cpus. Fix it. utils/cpufreq-info.c {"related-cpus", no_argument, NULL, 'r'}, {"affected-cpus", no_argument, NULL, 'a'}, Fixed changelog before applying: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Jan Kratochvil <jan@jankratochvil.net> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-06Merge tag 'vfs-6.8-release.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Get rid of copy_mc flag in iov_iter which really only makes sense for the core dumping code so move it out of the generic iov iter code and make it coredump's problem. See the detailed commit description. - Revert fs/aio: Make io_cancel() generate completions again The initial fix here was predicated on the assumption that calling ki_cancel() didn't complete aio requests. However, that turned out to be wrong since the two drivers that actually make use of this set a cancellation function that performs the cancellation correctly. So revert this change. - Ensure that the test for IOCB_AIO_RW always happens before the read from ki_ctx. * tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iov_iter: get rid of 'copy_mc' flag fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion Revert "fs/aio: Make io_cancel() generate completions again"
2024-03-06Merge tag 'arm-fixes-6.8-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "These should be the final fixes for the soc tree for 6.8, as usual they mostly deal wtih dts files: - Qualcomm fixes for pcie4 on sc8280xp, a revert of msm8996 mpm support, sm6115 interconnect and sm8650 gpio. - Two fixes for Tegra234 ethernet - A Makefile fix to actually build the allwinner based orange pi zero 2w device tree - Fixes for clocks and reset on imx8mp and a DSI display regression on imx7. The non-DT fixes are: - Firmware fixes addressing a kernel panic in op-tee and a minor regression in microchip/riscv. - A defconfig change to bring back backlight support after a Kconfig change" * tag 'arm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: firmware: microchip: Fix over-requested allocation size tee: optee: Fix kernel panic caused by incorrect error handling Revert "arm64: dts: qcom: msm8996: Hook up MPM" arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed arm64: dts: imx8mp: Fix LDB clocks property arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM MAINTAINERS: Use a proper mailinglist for NXP i.MX development ARM: dts: imx7: remove DSI port endpoints arm64: dts: allwinner: h616: Add Orange Pi Zero 2W to Makefile ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE arm64: tegra: Fix Tegra234 MGBE power-domains arm64: tegra: Set the correct PHY mode for MGBE arm64: dts: qcom: sm6115: Fix missing interconnect-names arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio
2024-03-06Merge tag 'v6.8-p6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix potential use-after-frees in rk3288 and sun8i-ce" * tag 'v6.8-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: rk3288 - Fix use after free in unprepare crypto: sun8i-ce - Fix use after free in unprepare
2024-03-06bcache: move calculation of stripe_size and io_opt into bcache_device_initChristoph Hellwig
bcache currently calculates the stripe size for the non-cached_dev case directly in bcache_device_init, but for the cached_dev case it does it in the caller. Consolidate it in one places, which also enables setting the io_opt queue_limit before allocating the gendisk so that it can be passed in instead of changing the limit just after the allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20240226104826.283067-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06virtio_blk: Do not use disk_set_max_open/active_zones()Damien Le Moal
In virtblk_read_zoned_limits(), setting a zoned block device maximum number of open and active zones using the functions disk_set_max_open_zones() and disk_set_max_active_zones() is incorrect as setting the limits for the request queue is now done atomically when the gendisk is created (with blk_mq_alloc_disk()). The value set by the disk_set_max_open/active_zones() functions will be overwritten. Fix this by setting the maximum number of open and active zones directly in the queue_limits structure passed to virtblk_read_zoned_limits(). Fixes: 8b837256560c ("virtio_blk: pass queue_limits to blk_mq_alloc_disk") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240301192639.410183-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06aoe: fix the potential use-after-free problem in aoecmd_cfg_pktsChun-Yi Lee
This patch is against CVE-2023-6270. The description of cve is: A flaw was found in the ATA over Ethernet (AoE) driver in the Linux kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on `struct net_device`, and a use-after-free can be triggered by racing between the free on the struct and the access through the `skbtxq` global queue. This could lead to a denial of service condition or potential code execution. In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial code is finished. But the net_device ifp will still be used in later tx()->dev_queue_xmit() in kthread. Which means that the dev_put(ifp) should NOT be called in the success path of skb initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into use-after-free because the net_device is freed. This patch removed the dev_put(ifp) in the success path in aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx(). Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270 Fixes: 7562f876cd93 ("[NET]: Rework dev_base via list_head (v3)") Signed-off-by: Chun-Yi Lee <jlee@suse.com> Link: https://lore.kernel.org/r/20240305082048.25526-1-jlee@suse.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06block: move capacity validation to blkpg_do_ioctl()Li Lingfeng
Commit 6d4e80db4ebe ("block: add capacity validation in bdev_add_partition()") add check of partition's start and end sectors to prevent exceeding the size of the disk when adding partitions. However, there is still no check for resizing partitions now. Move the check to blkpg_do_ioctl() to cover resizing partitions. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305032132.548958-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06block: prevent division by zero in blk_rq_stat_sum()Roman Smirnov
The expression dst->nr_samples + src->nr_samples may have zero value on overflow. It is necessary to add a check to avoid division by zero. Found by Linux Verification Center (linuxtesting.org) with Svace. Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240305134509.23108-1-r.smirnov@omp.ru Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: atomically update queue limits in drbd_reconsider_queue_parametersChristoph Hellwig
Switch drbd_reconsider_queue_parameters to set up the queue parameters in an on-stack queue_limits structure and apply the atomically. Remove various helpers that have become so trivial that they can be folded into drbd_reconsider_queue_parameters. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: split out a drbd_discard_supported helperChristoph Hellwig
Add a helper to check if discard is supported for a given connection / backing device combination. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-7-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: don't set max_write_zeroes_sectors in decide_on_discard_supportChristoph Hellwig
fixup_write_zeroes always overrides the max_write_zeroes_sectors value a little further down the callchain, so don't bother to setup a limit in decide_on_discard_support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-6-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parametersChristoph Hellwig
drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters and there is no really clear boundary of responsibilities between the two. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-5-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: refactor the backing dev max_segments calculationChristoph Hellwig
Factor out a drbd_backing_dev_max_segments helper that checks the backing device limitation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-4-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: refactor drbd_reconsider_queue_parametersChristoph Hellwig
Split out a drbd_max_peer_bio_size helper for the peer I/O size, and condense the various checks to a nested min3(..., max())) instead of using a lot of local variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: pass the max_hw_sectors limit to blk_alloc_diskChristoph Hellwig
Pass a queue_limits structure with the max_hw_sectors limit to blk_alloc_disk instead of updating the limit on the allocated gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06sed-opal: Remove the ret variable from the functionLi kunyu
The ret variable in the function has not yet been effective and can be removed. Signed-off-by: Li kunyu <kunyu@nfschina.com> Link: https://lore.kernel.org/r/20240306101444.1244-1-kunyu@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06sed-opal: Remove unnecessary ‘0’ values from retLi kunyu
ret is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li kunyu <kunyu@nfschina.com> Link: https://lore.kernel.org/r/20240306100659.106521-1-kunyu@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06sed-opal: Remove unnecessary ‘0’ values from errLi zeming
err is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20240306100216.69340-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06sed-opal: Remove unnecessary ‘0’ values from errorLi zeming
error is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20240306095608.26839-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06block: make block_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the block_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305-class_cleanup-block-v1-1-130bb27b9c72@marliere.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06Merge tag 'md-6.9-20240305' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block Pull MD fixes from Song: "This set fixes two issues: 1. dmraid regression since 6.7 kernels. This issue was initially reported in [1]. This set of fix has been reviewed and tested by md and dm folks. 2. raid5 hang since 6.7 kernel, reported in [2]. We haven't got a better fix for this issue yet. This revert is a workaround. It has been applied to 6.7 stable kernels [3], and proved to be affective. We will look more into this issue for a better fix. [1] https://lore.kernel.org/linux-raid/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/ [2] https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/ [3] 87165c64fe1a in linux-6.7.y branch." * tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: dm-raid: fix lockdep waring in "pers->hot_add_disk" dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape dm-raid: add a new helper prepare_suspend() in md_personality md/dm-raid: don't call md_reap_sync_thread() directly dm-raid: really frozen sync_thread during suspend md: add a new helper reshape_interrupted() md: export helper md_is_rdwr() md: export helpers to stop sync_thread md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""
2024-03-06dasd: use the atomic queue limits APIChristoph Hellwig
Pass the constant limits directly to blk_mq_alloc_disk, set the nonrot flag there as well, and then use the commit API to change the transfer size and logical block size dependent values. This relies on the assumption that no I/O can be pending before the devices moves into the ready state and doesn't need extra freezing for changes to the queue limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240228133742.806274-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06dasd: move queue setup to common codeChristoph Hellwig
Most of the code in setup_blk_queue is shared between all disciplines. Move it to common code and leave a method to query the maximum number of transferable blocks, and a flag to indicate discard support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240228133742.806274-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06dasd: cleamup dasd_state_basic_to_readyChristoph Hellwig
Reflow dasd_state_basic_to_ready a bit to make it easier to modify. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240228133742.806274-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06block: Fix page refcounts for unaligned buffers in __bio_release_pages()Tony Battersby
Fix an incorrect number of pages being released for buffers that do not start at the beginning of a page. Fixes: 1b151e2435fc ("block: Remove special-casing of compound pages") Cc: stable@vger.kernel.org Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Tested-by: Greg Edwards <gedwards@ddn.com> Link: https://lore.kernel.org/r/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06Revert "drm/udl: Add ARGB8888 as a format"Douglas Anderson
This reverts commit 95bf25bb9ed5dedb7fb39f76489f7d6843ab0475. Apparently there was a previous discussion about emulation of formats and it was decided XRGB8888 was the only format to support for legacy userspace [1]. Remove ARGB8888. Userspace needs to be fixed to accept XRGB8888. [1] https://lore.kernel.org/r/60dc7697-d7a0-4bf4-a22e-32f1bbb792c2@suse.de Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240306063721.1.I4a32475190334e1fa4eef4700ecd2787a43c94b5@changeid
2024-03-06phy: qcom-qmp-combo: fix type-c switch registrationJohan Hovold
Due to a long-standing issue in driver core, drivers may not probe defer after having registered child devices to avoid triggering a probe deferral loop (see fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")). Move registration of the typec switch to after looking up clocks and other resources. Note that PHY creation can in theory also trigger a probe deferral when a 'phy' supply is used. This does not seem to affect the QMP PHY driver but the PHY subsystem should be reworked to address this (i.e. by separating initialisation and registration of the PHY). Fixes: 2851117f8f42 ("phy: qcom-qmp-combo: Introduce orientation switching") Cc: stable@vger.kernel.org # 6.5 Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240217150228.5788-7-johan+linaro@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-03-06phy: qcom-qmp-combo: fix drm bridge registrationJohan Hovold
Due to a long-standing issue in driver core, drivers may not probe defer after having registered child devices to avoid triggering a probe deferral loop (see fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")). This could potentially also trigger a bug in the DRM bridge implementation which does not expect bridges to go away even if device links may avoid triggering this (when enabled). Move registration of the DRM aux bridge to after looking up clocks and other resources. Note that PHY creation can in theory also trigger a probe deferral when a 'phy' supply is used. This does not seem to affect the QMP PHY driver but the PHY subsystem should be reworked to address this (i.e. by separating initialisation and registration of the PHY). Fixes: 35921910bbd0 ("phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE") Fixes: 1904c3f578dc ("phy: qcom-qmp-combo: Introduce drm_bridge") Cc: stable@vger.kernel.org # 6.5 Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240217150228.5788-6-johan+linaro@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-03-06nvme: clear caller pointer on identify failureKeith Busch
The memory allocated for the identification is freed on failure. Set it to NULL so the caller doesn't have a pointer to that freed address. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-06regulator: lp8788-buck: fix copy and paste bug in lp8788_dvs_gpio_request()Dan Carpenter
"gpio2" as intended here, not "gpio1". Fixes: 95daa868f22b ("regulator: lp8788-buck: Fully convert to GPIO descriptors") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://msgid.link/r/19f62cc2-bdcf-46f7-a5c5-971ef05e1ea7@moroto.mountain Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-06nvme: host: fix double-free of struct nvme_id_ns in ns_update_nuse()Shin'ichiro Kawasaki
When nvme_identify_ns() fails, it frees the pointer to the struct nvme_id_ns before it returns. However, ns_update_nuse() calls kfree() for the pointer even when nvme_identify_ns() fails. This results in KASAN double-free, which was observed with blktests nvme/045 with proposed patches [1] on the kernel v6.8-rc7. Fix the double-free by skipping kfree() when nvme_identify_ns() fails. Link: https://lore.kernel.org/linux-block/20240304161303.19681-1-dwagner@suse.de/ [1] Fixes: a1a825ab6a60 ("nvme: add csi, ms and nuse to sysfs") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-06timer/migration: Fix quick check reporting late expiryFrederic Weisbecker
When a CPU is the last active in the hierarchy and it tries to enter into idle, the quick check looking up the next event towards cpuidle heuristics may report a too late expiry, such as in the following scenario: [GRP1:0] migrator = NONE active = NONE nextevt = T0:0, T0:1 / \ [GRP0:0] [GRP0:1] migrator = NONE migrator = NONE active = NONE active = NONE nextevt = T0, T1 nextevt = T2 / \ / \ 0 1 2 3 idle idle idle idle 0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The expire order is T0 < T1 < T2. [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:0(i), T0:1 / \ [GRP0:0] [GRP0:1] migrator = CPU0 migrator = NONE active = CPU0 active = NONE nextevt = T0(i), T1 nextevt = T2 / \ / \ 0 1 2 3 active idle idle idle 1) CPU 0 becomes active. The (i) means a now ignored timer. [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:1 / \ [GRP0:0] [GRP0:1] migrator = CPU0 migrator = NONE active = CPU0 active = NONE nextevt = T1 nextevt = T2 / \ / \ 0 1 2 3 active idle idle idle 2) CPU 0 handles remote. No timer actually expired but ignored timers have been cleaned out and their sibling's timers haven't been propagated. As a result the top level's next event is T2 and not T1. 3) CPU 0 tries to enter idle without any global timer enqueued and calls tmigr_quick_check(). The expiry of T2 is returned instead of the expiry of T1. When the quick check returns an expiry that is too late, the cpuidle governor may pick up a C-state that is too deep. This may be result into undesired CPU wake up latency if the next timer is actually close enough. Fix this with assuming that expiries aren't sorted top-down while performing the quick check. Pick up instead the earliest encountered one while walking up the hierarchy. 7ee988770326 ("timers: Implement the hierarchical pull model") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org
2024-03-06drm/i915/panelreplay: Move out psr_init_dpcd() from init_connector()Animesh Manna
Move psr_init_dpcd() from init-connector to connector-detect function. The dpcd probe for checking panel replay capability for external dp connector is causing delay during boot which can be optimized by moving dpcd probe to connector specific detect(). v1: Initial version. v2: Add details in commit description. [Jani] Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10284 Signed-off-by: Animesh Manna <animesh.manna@intel.com> Fixes: cceeaa312d39 ("drm/i915/panelreplay: Enable panel replay dpcd initialization for DP") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240229043716.4065760-1-animesh.manna@intel.com (cherry picked from commit 1cca19bf296fae0636a637b48d195ac6b4d430c9) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-03-06x86/topology: Ignore non-present APIC IDs in a present packageThomas Gleixner
Borislav reported that one of his systems has a broken MADT table which advertises eight present APICs and 24 non-present APICs in the same package. The non-present ones are considered hot-pluggable by the topology evaluation code, which is obviously bogus as there is no way to hot-plug within the same package. As the topology evaluation code accounts for hot-pluggable CPUs in a package, the maximum number of cores per package is computed wrong, which in turn causes the uncore performance counter driver to access non-existing MSRs. It will probably confuse other entities which rely on the maximum number of cores and threads per package too. Cure this by ignoring hot-pluggable APIC IDs within a present package. In theory it would be reasonable to just do this unconditionally, but then there is this thing called reality^Wvirtualization which ruins everything. Virtualization is the only existing user of "physical" hotplug and the virtualization tools allow the above scenario. Whether that is actually in use or not is unknown. As it can be argued that the virtualization case is not affected by the issues which exposed the reported problem, allow the bogosity if the kernel determined that it is running in a VM for now. Fixes: 89b0f15f408f ("x86/cpu/topology: Get rid of cpuinfo::x86_max_cores") Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/87a5nbvccx.ffs@tglx
2024-03-06firewire: ohci: prevent leak of left-over IRQ on unbindEdmund Raile
Commit 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ") also removed the call to free_irq() in pci_remove(), leading to a leftover irq of devm_request_irq() at pci_disable_msi() in pci_remove() when unbinding the driver from the device remove_proc_entry: removing non-empty directory 'irq/136', leaking at least 'firewire_ohci' Call Trace: ? remove_proc_entry+0x19c/0x1c0 ? __warn+0x81/0x130 ? remove_proc_entry+0x19c/0x1c0 ? report_bug+0x171/0x1a0 ? console_unlock+0x78/0x120 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? remove_proc_entry+0x19c/0x1c0 unregister_irq_proc+0xf4/0x120 free_desc+0x3d/0xe0 ? kfree+0x29f/0x2f0 irq_free_descs+0x47/0x70 msi_domain_free_locked.part.0+0x19d/0x1d0 msi_domain_free_irqs_all_locked+0x81/0xc0 pci_free_msi_irqs+0x12/0x40 pci_disable_msi+0x4c/0x60 pci_remove+0x9d/0xc0 [firewire_ohci 01b483699bebf9cb07a3d69df0aa2bee71db1b26] pci_device_remove+0x37/0xa0 device_release_driver_internal+0x19f/0x200 unbind_store+0xa1/0xb0 remove irq with devm_free_irq() before pci_disable_msi() also remove it in fail_msi: of pci_probe() as this would lead to an identical leak Cc: stable@vger.kernel.org Fixes: 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ") Signed-off-by: Edmund Raile <edmund.raile@proton.me> Link: https://lore.kernel.org/r/20240229144723.13047-2-edmund.raile@proton.me Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-03-06drm/i915/dp: Fix connector DSC HW state readoutImre Deak
The DSC HW state of DP connectors is read out during driver loading and system resume in intel_modeset_update_connector_atomic_state(). This function is called for all connectors though and so the state of DSI connectors will also get updated incorrectly, triggering a WARN there wrt. the DSC decompression AUX device. Fix the above by moving the DSC state readout to a new DP connector specific sync_state() hook. This is anyway the logical place to update the connector object's state vs. the connector's atomic state. Fixes: b2608c6b3212 ("drm/i915/dp_mst: Enable MST DSC decompression for all streams") Reported-and-tested-by: Drew Davenport <ddavenport@chromium.org> Closes: https://lore.kernel.org/all/Zb0q8IDVXS0HxJyj@chromium.org Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240205132631.1588577-1-imre.deak@intel.com (cherry picked from commit a62e145981500996ea76af3d740ce0c0d74c5be0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-03-06drm/i915/selftests: Fix dependency of some timeouts on HZJanusz Krzysztofik
Third argument of i915_request_wait() accepts a timeout value in jiffies. Most users pass either a simple HZ based expression, or a result of msecs_to_jiffies(), or MAX_SCHEDULE_TIMEOUT, or a very small number not exceeding 4 if applicable as that value. However, there is one user -- intel_selftest_wait_for_rq() -- that passes a WAIT_FOR_RESET_TIME symbol, defined as a large constant value that most probably represents a desired timeout in ms. While that usage results in the intended value of timeout on usual x86_64 kernel configurations, it is not portable across different architectures and custom kernel configs. Rename the symbol to clearly indicate intended units and convert it to jiffies before use. Fixes: 3a4bfa091c46 ("drm/i915/selftest: Fix workarounds selftest for GuC submission") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Rahul Kumar Singh <rahul.kumar.singh@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240222113347.648945-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 6ee3f54b880c91ab2e244eb4ffd4bfed37832b25) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-03-06inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERTJuntong Deng
Currently getsockopt does not support IP_ROUTER_ALERT and IPV6_ROUTER_ALERT, and we are unable to get the values of these two socket options through getsockopt. This patch adds getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-06thermal: core: remove unnecessary check in trip_point_hyst_store()Dan Carpenter
This code was shuffled around a bit recently. We no longer need to check the value of "ret" because we know it's zero. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-06fanotify: Fix misspelling of "writable"Vicki Pfau
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the fanotify header. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240306020831.1404033-3-vi@endrift.com>
2024-03-06fsnotify: Fix misspelling of "writable"Vicki Pfau
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the fsnotify header. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240306020831.1404033-2-vi@endrift.com>
2024-03-06inotify: Fix misspelling of "writable"Vicki Pfau
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the inotify header. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240306020831.1404033-1-vi@endrift.com>
2024-03-06Merge branch 'ynl-small-recv'David S. Miller
Jakub Kicinski says: ==================== tools: ynl: add --dbg-small-recv for easier kernel testing When testing netlink dumps I usually hack some user space up to constrain its user space buffer size (iproute2, ethtool or ynl). Netlink will try to fill the messages up, so since these apps use large buffers by default, the dumps are rarely fragmented. I was hoping to figure out a way to create a selftest for dump testing, but so far I have no idea how to do that in a useful and generic way. Until someone does that, make manual dump testing easier with YNL. Create a special option for limiting the buffer size, so I don't have to make the same edits each time, and maybe others will benefit, too :) Example: $ ./cli.py [...] --dbg-small-recv >/dev/null Recv: read 3712 bytes, 29 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 Recv: read 3968 bytes, 31 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 Recv: read 532 bytes, 5 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 Now let's make the DONE not fit in the last message: $ ./cli.py [...] --dbg-small-recv 4499 >/dev/null Recv: read 3712 bytes, 29 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 Recv: read 4480 bytes, 35 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 Recv: read 20 bytes, 1 messages nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 A real test would also have to check the messages are complete and not duplicated. That part has to be done manually right now. Note that the first message is always conservatively sized by the kernel. Still, I think this is good enough to be useful. v2: - patch 2: - move the recv_size setting up - change the default to 0 so that cli.py doesn't have to worry what the "unset" value is v1: https://lore.kernel.org/all/20240301230542.116823-1-kuba@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-06tools: ynl: add --dbg-small-recv for easier kernel testingJakub Kicinski
Most "production" netlink clients use large buffers to make dump efficient, which means that handling of dump continuation in the kernel is not very well tested. Add an option for debugging / testing handling of dumps. It enables printing of extra netlink-level debug and lowers the recv() buffer size in one go. When used without any argument (--dbg-small-recv) it picks a very small default (4000), explicit size can be set, too (--dbg-small-recv 5000). Example: $ ./cli.py [...] --dbg-small-recv Recv: read 3712 bytes, 29 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 Recv: read 3968 bytes, 31 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 Recv: read 532 bytes, 5 messages nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 [...] nl_len = 128 (112) nl_flags = 0x0 nl_type = 19 nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 (the [...] are edits to shorten the commit message). Note that the first message of the dump is sized conservatively by the kernel. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-06tools: ynl: support debug printing messagesJakub Kicinski
For manual debug, allow printing the netlink level messages to stderr. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-06tools: ynl: allow setting recv() sizeJakub Kicinski
Make the size of the buffer we use for recv() configurable. The details of the buffer sizing in netlink are somewhat arcane, we could spend a lot of time polishing this API. Let's just leave some hopefully helpful comments for now. This is a for-developers-only feature, anyway. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-06tools: ynl: move the new line in NlMsg __repr__Jakub Kicinski
We add the new line even if message has no error or extack, which leads to print(nl_msg) ending with two new lines. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>