summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-21intel_idle: Add a "Long HLT" C1 state for the VM guest modeArjan van de Ven
intel_idle will, for the bare metal case, usually have one or more deep power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When a state with this flag is selected by the cpuidle framework, it will also flush the TLBs as part of entering this state. The benefit of doing this is that the kernel does not need to wake the cpu out of this deep power state just to flush the TLBs... for which the latency can be very high due to the exit latency of deep power states. In a VM guest currently, this benefit of avoiding the wakeup does not exist, while the problem (long exit latency) is even more severe. Linux will need to wake up a vCPU (causing the host to either come out of a deep C state, or the VMM to have to deschedule something else to schedule the vCPU) which can take a very long time.. adding a lot of latency to tlb flush operations (including munmap and others). To solve this, add a "Long HLT" C state to the state table for the VM guest case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set. The result of that is that for long idle periods (where the VMM is likely to do things that cause large latency) the cpuidle framework will flush the TLBs (and avoid the wakeups), while for short/quick idle durations, the existing behavior is retained. Now, there is still only "hlt" available in the guest, but for long idle, the host can go to a deeper state (say C6). There is a reasonable debate one can have to what to set for the exit_latency and break even point for this "Long HLT" state. The good news is that intel_idle has these values available for the underlying CPU (even when mwait is not exposed). The solution thus is to just use the latency and break even of the deepest state from the bare metal CPU. This is under the assumption that this is a pretty reasonable estimate of what the VMM would do to cause latency. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21cpufreq: intel_pstate: Fix energy_performance_preference for passiveTero Kristo
If the intel_pstate driver is set to passive mode, then writing the same value to the energy_performance_preference sysfs twice will fail. This is caused by the wrong return value used (index of the matched energy_perf_string), instead of the length of the passed in parameter. Fix by forcing the internal return value to zero when the same preference is passed in by user. This same issue is not present when active mode is used for the driver. Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled") Reported-by: Niklas Neronin <niklas.neronin@intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21Merge tag 'spi-fix-v6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "One last fix for SPI, just a simple fix for incorrect handling of probe deferral for DMA in the Qualcomm GENI driver" * tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
2023-06-21Merge tag 'regulator-fix-v6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "One simple fix for v6.4, some incorrectly specified bitfield masks in the PCA9450 driver" * tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
2023-06-21Merge tag 'regmap-fix-v6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "One more fix for v6.4 The earlier fix to take account of the register data size when limiting raw register writes exposed the fact that the Intel AVMM bus was incorrectly specifying too low a limit on the maximum data transfer, it is only capable of transmitting one register so had set a transfer size limit that couldn't fit both the value and the the register address into a single message" * tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: spi-avmm: Fix regmap_bus max_raw_write
2023-06-21cpufreq: amd-pstate: Add a kernel config option to set default modeMario Limonciello
Users are having more success with amd-pstate since the introduction of EPP and Guided modes. To expose the driver to more users by default introduce a kernel configuration option for setting the default mode. Users can use an integer to map out which default mode they want to use in lieu of a kernel command line option. This will default to EPP, but only if: 1) The CPU supports an MSR. 2) The system profile is identified 3) The system profile is identified as a non-server by the FADT. Link: https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/merge_requests/121 Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Co-developed-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21cpufreq: amd-pstate: Set a fallback policy based on preferred_profileMario Limonciello
If a user's configuration doesn't explicitly specify the cpufreq scaling governor then the code currently explicitly falls back to 'powersave'. This default is fine for notebooks and desktops, but servers and undefined machines should default to 'performance'. Look at the 'preferred_profile' field from the FADT to set this policy accordingly. Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fixed-acpi-description-table-fadt Acked-by: Huang Rui <ray.huang@amd.com> Suggested-by: Wyes Karny <Wyes.Karny@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21ACPI: CPPC: Add definition for undefined FADT preferred PM profile valueMario Limonciello
In the event a new preferred PM profile value is introduced it's best for code to be able to defensively guard against it so that the wrong settings don't get applied on a new system that uses this profile but ancient kernels. Acked-by: Huang Rui <ray.huang@amd.com> Suggested-by: Gautham Ranjal Shenoy <gautham.shenoy@amd.com> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fixed-acpi-description-table-fadt Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21docs: consolidate storage interfacesCosta Shulyupin
to make the page more organized as requested Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230618062937.481280-1-costa.shul@redhat.com
2023-06-21Documentation: update git configuration for Link: tagJohannes Berg
The latest version of git (2.41.0) changed the spelling of Message-Id to Message-ID. Adjust the perl script here to accept both spellings. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230619115533.981f6abaca01.I1960c39b1d61e8514afcef4806a450a209133187@changeid
2023-06-21arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safeMark Rutland
When patching kernel alternatives, we need to be careful not to execute kernel code which is itself subject to patching. In general, if code is executed after the instructions in memory have been patched but prior to the cache maintenance and barriers completing, it could lead to UNPREDICTABLE results. As our regular cache maintenance routines are patched with alternatives, we have a clean_dcache_range_nopatch() function which is *intended* to avoid patchable code and therefore supposed to be safe in the middle of patching alternatives. Unfortunately, it's not marked as 'noinstr', and so can be instrumented with patchable code. Additionally, it calls read_sanitised_ftr_reg() (which may be instrumented with patchable code) to find the sanitized value of CTR_EL0.DminLine, and is therefore not safe to call during patching. Luckily, since commit: 675b0563d6b26aa9 ("arm64: cpufeature: expose arm64_ftr_reg struct for CTR_EL0") ... we can read the sanitised CTR_EL0 value directly, and avoid the call to read_sanitised_ftr_reg(). This patch marks clean_dcache_range_nopatch() as noinstr, and has it read the sanitized CTR_EL0 value directly, avoiding the issues above. As a bonus, this is also an optimization. As read_sanitised_ftr_reg() performs a binary search to find the CTR_EL0 value, reading the value directly avoids this binary search per applied alternative, avoiding some unnecessary work. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230616103150.1238132-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-21blk-mq: don't insert passthrough request into sw queueMing Lei
In case of real io scheduler, q->elevator is set, so blk_mq_run_hw_queue() may just check if scheduler queue has request to dispatch, see __blk_mq_sched_dispatch_requests(). Then IO hang may be caused because all passthorugh requests may stay in sw queue. And any passthrough request should have been inserted to hctx->dispatch always. Reported-by: Guangwu Zhang <guazhang@redhat.com> Fixes: d97217e7f024 ("blk-mq: don't queue plugged passthrough requests into scheduler") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230621132208.1142318-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21bsg: make bsg_class a static const structureIvan Orlov
Now that the driver core allows for struct class to be in read-only memory, move the bsg_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-scsi@vger.kernel.org Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230620180129.645646-8-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21ublk: make ublk_chr_class a static const structureIvan Orlov
Now that the driver core allows for struct class to be in read-only memory, move the ublk_chr_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230620180129.645646-7-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21aoe: make aoe_class a static const structureIvan Orlov
Now that the driver core allows for struct class to be in read-only memory, move the aoe_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Justin Sanders <justin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230620180129.645646-6-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21block/rnbd: make all 'class' structures constIvan Orlov
Now that the driver core allows for struct class to be in read-only memory, making all 'class' structures to be declared at build time placing them into read-only memory, instead of having to be dynamically allocated at load time. Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230620180129.645646-5-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21block: fix the exclusive open mask in disk_scan_partitionsChristoph Hellwig
FMODE_EXEC has nothing to do with exclusive opens, and even is of the wrong type. We need to check for BLK_OPEN_EXCL here. Fixes: 985958b8584c ("block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230621124914.185992-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdrJens Axboe
Rather than assign the user pointer to msghdr->msg_control, assign it to msghdr->msg_control_user to make sparse happy. They are in a union so the end result is the same, but let's avoid new sparse warnings and squash this one. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/ Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21io_uring/net: disable partial retries for recvmsg with cmsgJens Axboe
We cannot sanely handle partial retries for recvmsg if we have cmsg attached. If we don't, then we'd just be overwriting the initial cmsg header on retries. Alternatively we could increment and handle this appropriately, but it doesn't seem worth the complication. Move the MSG_WAITALL check into the non-multishot case while at it, since MSG_WAITALL is explicitly disabled for multishot anyway. Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/ Cc: stable@vger.kernel.org # 5.10+ Reported-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21io_uring/net: clear msg_controllen on partial sendmsg retryJens Axboe
If we have cmsg attached AND we transferred partial data at least, clear msg_controllen on retry so we don't attempt to send that again. Cc: stable@vger.kernel.org # 5.10+ Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries") Reported-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-21Documentation/arm64: Add ptdump documentationChaitanya S Prakash
ptdump is a debugfs interface used to dump the kernel page tables. It provides a comprehensive overview about the kernel's virtual memory layout, page table entries and associated page attributes. A document detailing how to enable ptdump in the kernel and analyse its output has been added. Changes in V2: - Corrected command to cat /sys/kernel/debug/kernel_page_tables Changes in V1: https://lore.kernel.org/all/20230613064845.1882177-1-chaitanyas.prakash@arm.com/ Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> CC: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Link: https://lore.kernel.org/r/20230619083802.76092-1-chaitanyas.prakash@arm.com [catalin.marinas@arm.com: various minor fixups; sorted index.rst in alphabetical order] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-21Merge tag 'asoc-fix-v6.4-rc7' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fix for v6.4 A fix for a typoed iterator in the Intel Soundwire driver, fairly simple on inspection though not reviewed by Intel.
2023-06-21Merge branch irq/misc-6.5 into irq/irqchip-nextMarc Zyngier
* irq/misc-6.5: : . : Misc cleanups: : : - Add a number of missing prototypes : - Mark global symbol as static where needed : - Drop some now useless non-DT code paths : - Add a missing interrupt mapping to the STM32 irqchip : - Silence another STM32 warning when building with W=1 : - Fix the jcore-aic driver that actually never worked... : . Revert "irqchip/mxs: Include linux/irqchip/mxs.h" irqchip/jcore-aic: Fix missing allocation of IRQ descriptors irqchip/stm32-exti: Fix warning on initialized field overwritten irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype irqchip/mxs: Include linux/irqchip/mxs.h irqchip/clps711x: Remove unused clps711x_intc_init() function irqchip/mmp: Remove non-DT codepath irqchip/ftintc010: Mark all function static irqdomain: Include internals.h for function prototypes Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-06-21Revert "irqchip/mxs: Include linux/irqchip/mxs.h"Marc Zyngier
This reverts commit 5b7e5676209120814dbb9fec8bc3769f0f7a7958. Although including linux/irqchip/mxs.h is technically correct, this clashes with the parallel removal of this include file with 32bit ARM modernizing the low level irq handling as part of 5bb578a0c1b8 ("ARM: 9298/1: Drop custom mdesc->handle_irq()"). As such, this patch is not only unnecessary, it also breaks compilation in -next. Revert it. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Shawn Guo <shawnguo@kernel.org>
2023-06-21arm64: hibernate: remove WARN_ON in save_processor_stateSong Shuai
During hibernation or restoration, freeze_secondary_cpus checks num_online_cpus via BUG_ON, and the subsequent save_processor_state also does the checking with WARN_ON. In the case of CONFIG_PM_SLEEP_SMP=n, freeze_secondary_cpus is not defined, but the sole possible condition to disable CONFIG_PM_SLEEP_SMP is !SMP where num_online_cpus is always 1. We also don't have to check it in save_processor_state. So remove the unnecessary checking in save_processor_state. Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230609075049.2651723-3-songshuaishuai@tinylab.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-21ALSA: hda/realtek: Add quirk for ASUS ROG GV601VLuke D. Jones
Adds the required quirk to enable the Cirrus amp and correct pins on the ASUS ROG GV601V series. While this works if the related _DSD properties are made available, these aren't included in the ACPI of these laptops (yet). Signed-off-by: Luke D. Jones <luke@ljones.dev> Link: https://lore.kernel.org/r/20230621085715.5382-1-luke@ljones.dev Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-21kselftest/arm64: Log signal code and address for unexpected signalsMark Brown
If we get an unexpected signal during a signal test log a bit more data to aid diagnostics. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230620-arm64-selftest-log-wrong-signal-v1-1-3fe29bdaaf38@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-21bpf: Force kprobe multi expected_attach_type for kprobe_multi linkJiri Olsa
We currently allow to create perf link for program with expected_attach_type == BPF_TRACE_KPROBE_MULTI. This will cause crash when we call helpers like get_attach_cookie or get_func_ip in such program, because it will call the kprobe_multi's version (current->bpf_ctx context setup) of those helpers while it expects perf_link's current->bpf_ctx context setup. Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type only for programs attaching through kprobe_multi link. Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
2023-06-21bpf/btf: Accept function names that contain dotsFlorent Revest
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn, pahole creates BTF_KIND_FUNC entries for these and this makes the BTF metadata validation fail because they contain a dot. In a dramatic turn of event, this BTF verification failure can cause the netfilter_bpf initialization to fail, causing netfilter_core to free the netfilter_helper hashmap and netfilter_ftp to trigger a use-after-free. The risk of u-a-f in netfilter will be addressed separately but the existence of "asan.module_ctor" debug info under some build conditions sounds like a good enough reason to accept functions that contain dots in BTF. Although using only LLVM=1 is the recommended way to compile clang-based kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still try to support that combination according to Nick. To clarify: - > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended, but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue - <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in which case GNU as will be used Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Yonghong Song <yhs@meta.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org
2023-06-21Revert "virtio-blk: support completion batching for the IRQ path"Michael S. Tsirkin
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13. This change appears to have broken things... We now see applications hanging during disk accesses. e.g. multi-port virtio-blk device running in h/w (FPGA) Host running a simple 'fio' test. [global] thread=1 direct=1 ioengine=libaio norandommap=1 group_reporting=1 bs=4K rw=read iodepth=128 runtime=1 numjobs=4 time_based [job0] filename=/dev/vda [job1] filename=/dev/vdb [job2] filename=/dev/vdc ... [job15] filename=/dev/vdp i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads This is repeatedly run in a loop. After a few, normally <10 seconds, fio hangs. With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging. Last message: fio-3.19 Starting 8 threads Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s] I think this means at the end of the run 1 queue was left incomplete. 'diskstats' (run while fio is hung) shows no outstanding transactions. e.g. $ cat /proc/diskstats ... 252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0 252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0 ... Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called. e.g. PF= 0 vq=0 1 2 3 [a]request_count - 839416590 813148916 105586179 84988123 [b]completion1_count - 839416590 813148916 105586179 84988123 [c]completion2_count - 0 0 0 0 PF= 1 vq=0 1 2 3 [a]request_count - 823335887 812516140 104582672 75856549 [b]completion1_count - 823335887 812516140 104582672 75856549 [c]completion2_count - 0 0 0 0 i.e. the issue is after the virtio-blk driver. This change was introduced in kernel 6.3.0. I am seeing this using 6.3.3. If I run with an earlier kernel (5.15), it does not occur. If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail. e.g. kernel 5.15 - this is OK virtio_blk.c,virtblk_done() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { blk_mq_complete_request(req); } kernel 6.3.3 - this fails virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } } If I do, kernel 6.3.3 - this is OK virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { virtblk_request_done(req); //force this here... if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } } Perhaps you might like to fix/test/revert this change... Martin Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/ Cc: Suwan Kim <suwan.kim027@gmail.com> Tested-by: edliaw@google.com Reported-by: "Roberts, Martin" <martin.roberts@intel.com> Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-21readdir: Replace one-element arrays with flexible-array membersGustavo A. R. Silva
One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element arrays with flexible-array members in multiple structures. Address the following -Wstringop-overflow warnings seen when built m68k architecture with m5307c3_defconfig configuration: In function '__put_user_fn', inlined from 'fillonedir' at fs/readdir.c:170:2: include/asm-generic/uaccess.h:49:35: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 49 | *(u8 __force *)to = *(u8 *)from; | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ fs/readdir.c: In function 'fillonedir': fs/readdir.c:134:25: note: at offset 1 into destination object 'd_name' of size 1 134 | char d_name[1]; | ^~~~~~ In function '__put_user_fn', inlined from 'filldir' at fs/readdir.c:257:2: include/asm-generic/uaccess.h:49:35: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 49 | *(u8 __force *)to = *(u8 *)from; | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ fs/readdir.c: In function 'filldir': fs/readdir.c:211:25: note: at offset 1 into destination object 'd_name' of size 1 211 | char d_name[1]; | ^~~~~~ This helps with the ongoing efforts to globally enable -Wstringop-overflow. This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/312 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Message-Id: <ZJHiPJkNKwxkKz1c@work> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-20fsverity: improve documentation for builtin signature supportEric Biggers
fsverity builtin signatures (CONFIG_FS_VERITY_BUILTIN_SIGNATURES) aren't the only way to do signatures with fsverity, and they have some major limitations. Yet, more users have tried to use them, e.g. recently by https://github.com/ostreedev/ostree/pull/2640. In most cases this seems to be because users aren't sufficiently familiar with the limitations of this feature and what the alternatives are. Therefore, make some updates to the documentation to try to clarify the properties of this feature and nudge users in the right direction. Note that the Integrity Policy Enforcement (IPE) LSM, which is not yet upstream, is planned to use the builtin signatures. (This differs from IMA, which uses its own signature mechanism.) For that reason, my earlier patch "fsverity: mark builtin signatures as deprecated" (https://lore.kernel.org/r/20221208033548.122704-1-ebiggers@kernel.org), which marked builtin signatures as "deprecated", was controversial. This patch therefore stops short of marking the feature as deprecated. I've also revised the language to focus on better explaining the feature and what its alternatives are. Link: https://lore.kernel.org/r/20230620041937.5809-1-ebiggers@kernel.org Reviewed-by: Colin Walters <walters@verbum.org> Reviewed-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-20Revert "net: phy: dp83867: perform soft reset and retain established link"Francesco Dolcini
This reverts commit da9ef50f545f86ffe6ff786174d26500c4db737a. This fixes a regression in which the link would come up, but no communication was possible. The reverted commit was also removing a comment about DP83867_PHYCR_FORCE_LINK_GOOD, this is not added back in this commits since it seems that this is unrelated to the original code change. Closes: https://lore.kernel.org/all/ZGuDJos8D7N0J6Z2@francesco-nb.int.toradex.com/ Fixes: da9ef50f545f ("net: phy: dp83867: perform soft reset and retain established link") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Praneeth Bajjuri <praneeth@ti.com> Link: https://lore.kernel.org/r/20230619154435.355485-1-francesco@dolcini.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20net: mdio: fix the wrong parametersJiawen Wu
PHY address and device address are passed in the wrong order. Cc: stable@vger.kernel.org Fixes: 4e4aafcddbbf ("net: mdio: Add dedicated C45 API to MDIO bus drivers") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230619094948.84452-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "19 hotfixes. 8 of these are cc:stable. This includes a wholesale reversion of the post-6.4 series 'make slab shrink lockless'. After input from Dave Chinner it has been decided that we should go a different way [1]" Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1] * tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: selftests/mm: fix cross compilation with LLVM mailmap: add entries for Ben Dooks nilfs2: prevent general protection fault in nilfs_clear_dirty_page() Revert "mm: vmscan: make global slab shrink lockless" Revert "mm: vmscan: make memcg slab shrink lockless" Revert "mm: vmscan: add shrinker_srcu_generation" Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred" Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()" Revert "mm: shrinkers: convert shrinker_rwsem to mutex" nilfs2: fix buffer corruption due to concurrent device reads scripts/gdb: fix SB_* constants parsing scripts: fix the gfp flags header path in gfp-translate udmabuf: revert 'Add support for mapping hugepages (v4)' mm/khugepaged: fix iteration in collapse_file memfd: check for non-NULL file_seals in memfd_create() syscall mm/vmalloc: do not output a spurious warning when huge vmalloc() fails mm/mprotect: fix do_mprotect_pkey() limit check writeback: fix dereferencing NULL mapping->host on writeback_page_template
2023-06-20Merge tag 'acpi-6.4-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix a kernel crash during early resume from ACPI S3 that has been present since the 5.15 cycle when might_sleep() was added to down_timeout(), which in some configurations of the kernel caused an implicit preemption point to trigger at a wrong time" * tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
2023-06-20Merge tag 'thermal-6.4-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Fix a regression introduced during the 6.3 cycle causing intel_soc_dts_iosf to report incorrect temperature values due to a coding mistake (Hans de Goede)" * tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
2023-06-20Merge tag 'trace-v6.4-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix MAINTAINERS file to point to proper mailing list for rtla and rv The mailing list pointed to linux-trace-devel instead of linux-trace-kernel. The former is for the tracing libraries and the latter is for anything in the Linux kernel tree. The wrong mailing list was used because linux-trace-kernel did not exist when rtla and rv were created. - User events: - Fix matching of dynamic events to their user events When user writes to dynamic_events file, a lookup of the registered dynamic events is made, but there were some cases that a match could be incorrectly made. - Add auto cleanup of user events Have the user events automatically get removed when the last reference (file descriptor) is closed. This was asked for to prevent leaks of user events hanging around needing admins to clean them up. - Add persistent logic (but not let user space use it yet) In some cases, having a persistent user event (one that does not get cleaned up automatically) is useful. But there's still debates about how to expose this to user space. The infrastructure is added, but the API is not. - Update the selftests Update the user event selftests to reflect the above changes" * tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/user_events: Document auto-cleanup and remove dyn_event refs selftests/user_events: Adapt dyn_test to non-persist events selftests/user_events: Ensure auto cleanup works as expected tracing/user_events: Add auto cleanup and future persist flag tracing/user_events: Track refcount consistently via put/get tracing/user_events: Store register flags on events tracing/user_events: Remove user_ns walk for groups selftests/user_events: Add perf self-test for empty arguments events selftests/user_events: Clear the events after perf self-test selftests/user_events: Add ftrace self-test for empty arguments events tracing/user_events: Fix the incorrect trace record for empty arguments events tracing: Modify print_fields() for fields output order tracing/user_events: Handle matching arguments that is null from dyn_events tracing/user_events: Prevent same name but different args event tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
2023-06-20Merge tag 'for-6.4-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more regression fix for an assertion failure that uncovered a nasty problem with stripe calculations. This is caused by a u32 overflow when there are enough devices. The fstests require 6 so this hasn't been caught, I was able to hit it with 8. The fix is minimal and only adds u64 casts, we'll clean that up later. I did various additional tests to be sure" * tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix u32 overflows when left shifting stripe_nr
2023-06-20netfilter: nf_tables: Fix for deleting base chains with payloadPhil Sutter
When deleting a base chain, iptables-nft simply submits the whole chain to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code added by fixed commit then turned this into a chain update, destroying the hook but not the chain itself. Detect the situation by checking if the chain type is either netdev or inet/ingress. Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nfnetlink_osf: fix module autoloadPablo Neira Ayuso
Move the alias from xt_osf to nfnetlink_osf. Fixes: f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: drop module reference after updating chainPablo Neira Ayuso
Otherwise the module reference counter is leaked. Fixes b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: disallow timeout for anonymous setsPablo Neira Ayuso
Never used from userspace, disallow these parameters. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: disallow updates of anonymous setsPablo Neira Ayuso
Disallow updates of set timeout and garbage collection parameters for anonymous sets. Fixes: 123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: reject unbound chain set before commit phasePablo Neira Ayuso
Use binding list to track set transaction and to check for unbound chains before entering the commit phase. Bail out if chain binding remain unused before entering the commit step. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: reject unbound anonymous set before commit phasePablo Neira Ayuso
Add a new list to track set transaction and to check for unbound anonymous sets before entering the commit phase. Bail out at the end of the transaction handling if an anonymous set remains unbound. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: disallow element updates of bound anonymous setsPablo Neira Ayuso
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API allows to create anonymous sets without NFT_SET_CONSTANT, it makes no sense to allow to add and to delete elements for bound anonymous sets. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: fix underflow in object reference counterPablo Neira Ayuso
Since ("netfilter: nf_tables: drop map element references from preparation phase"), integration with commit protocol is better, therefore drop the workaround that b91d90368837 ("netfilter: nf_tables: fix leaking object reference count") provides. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nft_set_pipapo: .walk does not deal with generationsPablo Neira Ayuso
The .walk callback iterates over the current active set, but it might be useful to iterate over the next generation set. Use the generation mask to determine what set view (either current or next generation) is use for the walk iteration. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: drop map element references from preparation phasePablo Neira Ayuso
set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path. Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets. Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>