summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-06block: better split mq vs non-mq code in add_disk_fwnodeChristoph Hellwig
Add a big conditional for blk-mq vs not mq at the beginning of add_disk_fwnode so that elevator_init_mq is only called for blk-mq disks, and add checks that the right methods or set or not set based on the queue type. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250106083531.799976-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-06block: add a dma mapping iteratorChristoph Hellwig
blk_rq_map_sg is maze of nested loops. Untangle it by creating an iterator that returns [paddr,len] tuples for DMA mapping, and then implement the DMA logic on top of this. This not only removes code at the source level, but also generates nicer binary code: $ size block/blk-merge.o.* text data bss dec hex filename 10001 432 0 10433 28c1 block/blk-merge.o.new 10317 468 0 10785 2a21 block/blk-merge.o.old Last but not least it will be used as a building block for a new DMA mapping helper that doesn't rely on struct scatterlist. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250106081609.798289-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-06block: use page_to_phys in bvec_physChristoph Hellwig
Use page_to_phys instead of open coding it now that it is available in an architecture independent way. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250106081437.798213-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-04block: remove blk_rq_bio_prepChristoph Hellwig
There is not real point in a helper just to assign three values to four fields, especially when the surrounding code is working on the neighbor fields directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20250103073417.459715-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-04block: remove bio_add_pc_pageChristoph Hellwig
Lift bio_split_rw_at into blk_rq_append_bio so that it validates the hardware limits. With this all passthrough callers can simply add bio_add_page to build the bio and delay checking for exceeding of limits to this point instead of doing it for each page. While this looks like adding a new expensive loop over all bio_vecs, blk_rq_append_bio is already doing that just to counter the number of segments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20250103073417.459715-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-03ps3disk: Do not use dev->bounce_size before it is setGeert Uytterhoeven
dev->bounce_size is only initialized after it is used to set the queue limits. Fix this by using BOUNCE_SIZE instead. Fixes: a7f18b74dbe17162 ("ps3disk: pass queue_limits to blk_mq_alloc_disk") Reported-by: Philipp Hortmann <philipp.g.hortmann@gmail.com> Closes: https://lore.kernel.org/39256db9-3d73-4e86-a49b-300dfd670212@gmail.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/06988f959ea6885b8bd7fb3b9059dd54bc6bbad7.1735894216.git.geert+renesas@glider.be Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-03block: retry call probe after request_module in blk_request_moduleYang Erkun
Set kernel config: CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_LOOP_MIN_COUNT=0 Do latter: mknod loop0 b 7 0 exec 4<> loop0 Before commit e418de3abcda ("block: switch gendisk lookup to a simple xarray"), lookup_gendisk will first use base_probe to load module loop, and then the retry will call loop_probe to prepare the loop disk. Finally open for this disk will success. However, after this commit, we lose the retry logic, and open will fail with ENXIO. Block device autoloading is deprecated and will be removed soon, but maybe we should keep open success until we really remove it. So, give a retry to fix it. Fixes: e418de3abcda ("block: switch gendisk lookup to a simple xarray") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241209110435.3670985-1-yangerkun@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-02kyber: constify sysfs attributesThomas Weißschuh
The elevator core now allows instances of 'struct elv_fs_entry' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250102-sysfs-const-attr-elevator-v1-4-9837d2058c60@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-02block, bfq: constify sysfs attributesThomas Weißschuh
The elevator core now allows instances of 'struct elv_fs_entry' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250102-sysfs-const-attr-elevator-v1-3-9837d2058c60@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-02block: mq-deadline: Constify sysfs attributesThomas Weißschuh
The elevator core now allows instances of 'struct elv_fs_entry' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250102-sysfs-const-attr-elevator-v1-2-9837d2058c60@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-02elevator: Enable const sysfs attributesThomas Weißschuh
The elevator core does not need to modify the sysfs attributes added by the elevators. Reflect this in the types, so the attributes can be moved into read-only memory. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250102-sysfs-const-attr-elevator-v1-1-9837d2058c60@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-zoned: Split queue_zone_wplugs_show()Bart Van Assche
Reduce the indentation level of the code in queue_zone_wplugs_show() by moving the body of the loop in that function into a new function. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241217210310.645966-5-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-zoned: Improve the queue reference count strategy documentationBart Van Assche
For the blk_queue_exit() calls, document where the corresponding code can be found that increases q->q_usage_counter. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241217210310.645966-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-zoned: Document locking assumptionsBart Van Assche
Document which functions expect that their callers must hold a lock. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241217210310.645966-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-zoned: Minimize #include directivesBart Van Assche
Only include those header files that are necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241217210310.645966-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23rust: block: fix use of BLK_MQ_F_SHOULD_MERGEAndreas Hindborg
BLK_MQ_F_SHOULD_MERGE has was removed [1] and is now in effect by default. So remove the flag from tag sets of Rust block device drivers. Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de [1] Fixes: 9377b95cda73 ("block: remove BLK_MQ_F_SHOULD_MERGE") Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> Link: https://lore.kernel.org/r/20241220-merge-flag-fix-v1-1-41b7778dac06@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: remove BLK_MQ_F_SHOULD_MERGEChristoph Hellwig
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely process passthrough commands (bsg-lib, ufs tmf, various nvme admin queues) and thus don't even check the flag. Remove it to simplify the driver interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-mq: remove unused queue mapping helpersDaniel Wagner
There are no users left of the pci and virtio queue mapping helpers. Thus remove them. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-8-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23virtio: blk/scsi: replace blk_mq_virtio_map_queues with blk_mq_map_hw_queuesDaniel Wagner
Replace all users of blk_mq_virtio_map_queues with the more generic blk_mq_map_hw_queues. This in preparation to retire blk_mq_virtio_map_queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-7-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23nvme: replace blk_mq_pci_map_queues with blk_mq_map_hw_queuesDaniel Wagner
Replace all users of blk_mq_pci_map_queues with the more generic blk_mq_map_hw_queues. This in preparation to retire blk_mq_pci_map_queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-6-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23scsi: replace blk_mq_pci_map_queues with blk_mq_map_hw_queuesDaniel Wagner
Replace all users of blk_mq_pci_map_queues with the more generic blk_mq_map_hw_queues. This in preparation to retire blk_mq_pci_map_queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-5-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-mq: introduce blk_mq_map_hw_queuesDaniel Wagner
blk_mq_pci_map_queues and blk_mq_virtio_map_queues will create a CPU to hardware queue mapping based on affinity information. These two function share common code and only differ on how the affinity information is retrieved. Also, those functions are located in the block subsystem where it doesn't really fit in. They are virtio and pci subsystem specific. Thus introduce provide a generic mapping function which uses the irq_get_affinity callback from bus_type. Originally idea from Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-4-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23virtio: hookup irq_get_affinity callbackDaniel Wagner
struct bus_type has a new callback for retrieving the IRQ affinity for a device. Hook this callback up for virtio based devices. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-3-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23PCI: hookup irq_get_affinity callbackDaniel Wagner
struct bus_type has a new callback for retrieving the IRQ affinity for a device. Hook this callback up for PCI based devices. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-2-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23driver core: bus: add irq_get_affinity callback to bus_typeDaniel Wagner
Introducing a callback in struct bus_type so that a subsystem can hook up the getters directly. This approach avoids exposing random getters in any subsystems APIs. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-1-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23null_blk: Remove accesses to page->indexMatthew Wilcox (Oracle)
Use page->private to store the index instead of page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241216160849.31739-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: rnull: Initialize the module in placeBenoît du Garreau
Using `InPlaceModule` avoids an allocation and an indirection. Signed-off-by: Benoît du Garreau <benoit@dugarreau.fr> Acked-by: Andreas Hindborg <a.hindborg@kernel.org> Link: https://lore.kernel.org/r/20241204-rnull_in_place-v1-1-efe3eafac9fb@dugarreau.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blktrace: remove redundant return at end of functionColin Ian King
A recent change added return 0 before an existing return statement at the end of function blk_trace_setup. The final return is now redundant, so remove it. Fixes: 64d124798244 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20241204150450.399005-1-colin.i.king@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: Delete bio_set_prio()John Garry
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_set_prio() does nothing but set bio->bi_ioprio. All other places just set bio->bi_ioprio directly, so replace bio_set_prio() remaining callsites with setting bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: Delete bio_prio()John Garry
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_prio() does nothing but return the value in bio->bi_ioprio. Most other places just read bio->bi_ioprio directly, so replace bi_ioprio() callsites with reading bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blktrace: move copy_[to|from]_user() out of ->debugfs_lockMing Lei
Move copy_[to|from]_user() out of ->debugfs_lock and cut the dependency between mm->mmap_lock and q->debugfs_lock, then we avoids lots of lockdep false positive warning. Obviously ->debug_lock isn't needed for copy_[to|from]_user(). The only behavior change is to call blk_trace_remove() in case of setup failure handling by re-grabbing ->debugfs_lock, and this way is just fine since we do cover concurrent setup() & remove(). Reported-by: syzbot+91585b36b538053343e4@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-block/67450fd4.050a0220.1286eb.0007.GAE@google.com/ Closes: https://lore.kernel.org/linux-block/6742e584.050a0220.1cc393.0038.GAE@google.com/ Closes: https://lore.kernel.org/linux-block/6742a600.050a0220.1cc393.002e.GAE@google.com/ Closes: https://lore.kernel.org/linux-block/67420102.050a0220.1cc393.0019.GAE@google.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241128125029.4152292-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blktrace: don't centralize grabbing q->debugfs_mutex in blk_trace_ioctlMing Lei
Call each handler directly and the handler do grab q->debugfs_mutex, prepare for killing dependency between ->debug_mutex and ->mmap_lock. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241128125029.4152292-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23null_blk: Add rotational feature supportDamien Le Moal
To facilitate testing of kernel functions related to the rotational feature (BLK_FEAT_ROTATIONAL) of a block device (e.g. NVMe rotational bit support), add the rotational boolean configfs attribute and module parameter to the null_blk driver. If set, a null block device will report being a rotational device through it queue limits features with the BLK_FEAT_ROTATIONAL flag. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20241126000956.95983-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: track queue dying state automatically for modeling queue freeze lockdepMing Lei
Now we only verify the outmost freeze & unfreeze in current context in case that !q->mq_freeze_depth, so it is reliable to save queue lying state when we want to lock the freeze queue since the state is one per-task variable now. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: don't verify queue freeze manually in elevator_init_mq()Ming Lei
Now blk_freeze_queue_start() can track disk state automatically, and it isn't necessary to verify queue freeze manually in elevator_init_mq() any more. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: track disk DEAD state automatically for modeling queue freeze lockdepMing Lei
Now we only verify the outmost freeze & unfreeze in current context in case that !q->mq_freeze_depth, so it is reliable to save disk DEAD state when we want to lock the freeze queue since the state is one per-task variable now. Doing this way can kill lots of false positive when freeze queue is called before adding disk[1]. [1] https://lore.kernel.org/linux-block/6741f6b2.050a0220.1cc393.0017.GAE@google.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: remove unnecessary check in blk_unfreeze_check_owner()Ming Lei
The following check of 'q->mq_freeze_owner != current' covers the previous one, so remove the unnecessary check. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-22Linux 6.13-rc4v6.13-rc4Linus Torvalds
2024-12-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM x86 fixes from Paolo Bonzini: - Disable AVIC on SNP-enabled systems that don't allow writes to the virtual APIC page, as such hosts will hit unexpected RMP #PFs in the host when running VMs of any flavor. - Fix a WARN in the hypercall completion path due to KVM trying to determine if a guest with protected register state is in 64-bit mode (KVM's ABI is to assume such guests only make hypercalls in 64-bit mode). - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix a regression with Windows guests, and because KVM's read-only behavior appears to be entirely made up. - Treat TDP MMU faults as spurious if the faulting access is allowed given the existing SPTE. This fixes a benign WARN (other than the WARN itself) due to unexpectedly replacing a writable SPTE with a read-only SPTE. - Emit a warning when KVM is configured with ignore_msrs=1 and also to hide the MSRs that the guest is looking for from the kernel logs. ignore_msrs can trick guests into assuming that certain processor features are present, and this in turn leads to bogus bug reports. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: let it be known that ignore_msrs is a bad idea KVM: VMX: don't include '<linux/find.h>' directly KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bits KVM: x86: Play nice with protected guests in complete_hypercall_exit() KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed feature
2024-12-22Merge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 6.13: - Disable AVIC on SNP-enabled systems that don't allow writes to the virtual APIC page, as such hosts will hit unexpected RMP #PFs in the host when running VMs of any flavor. - Fix a WARN in the hypercall completion path due to KVM trying to determine if a guest with protected register state is in 64-bit mode (KVM's ABI is to assume such guests only make hypercalls in 64-bit mode). - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix a regression with Windows guests, and because KVM's read-only behavior appears to be entirely made up. - Treat TDP MMU faults as spurious if the faulting access is allowed given the existing SPTE. This fixes a benign WARN (other than the WARN itself) due to unexpectedly replacing a writable SPTE with a read-only SPTE.
2024-12-22KVM: x86: let it be known that ignore_msrs is a bad ideaPaolo Bonzini
When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has no clue that that the guest is being lied to. This may cause bug reports such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and being lied about the existence of MSR_CU_DEF_ERR caused the guest to assume other things about the local APIC which were not true: Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly. Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40) Sep 14 12:02:53 kernel: Call Trace: ... Sep 14 12:02:53 kernel: native_apic_msr_read+0x20/0x30 Sep 14 12:02:53 kernel: setup_APIC_eilvt+0x47/0x110 Sep 14 12:02:53 kernel: mce_amd_feature_init+0x485/0x4e0 ... Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu Without reported_ignored_msrs=0 at least the host kernel log will contain enough information to avoid going on a wild goose chase. But if reports about individual MSR accesses are being silenced too, at least complain loudly the first time a VM is started. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: VMX: don't include '<linux/find.h>' directlyWolfram Sang
The header clearly states that it does not want to be included directly, only via '<linux/bitmap.h>'. Replace the include accordingly. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-ID: <20241217070539.2433-2-wsa+renesas@sang-engineering.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22Merge tag 'devicetree-fixes-for-6.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Disable #address-cells/#size-cells warning on coreboot (Chromebooks) platforms - Add missing root #address-cells/#size-cells in default empty DT - Fix uninitialized variable in of_irq_parse_one() - Fix interrupt-map cell length check in of_irq_parse_imap_parent() - Fix refcount handling in __of_get_dma_parent() - Fix error path in of_parse_phandle_with_args_map() - Fix dma-ranges handling with flags cells - Drop explicit fw_devlink handling of 'interrupt-parent' - Fix "compression" typo in fixed-partitions binding - Unify "fsl,liodn" property type definitions * tag 'devicetree-fixes-for-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: Add coreboot firmware to excluded default cells list of/irq: Fix using uninitialized variable @addr_len in API of_irq_parse_one() of/irq: Fix interrupt-map cell length check in of_irq_parse_imap_parent() of: Fix refcount leakage for OF node returned by __of_get_dma_parent() of: Fix error path in of_parse_phandle_with_args_map() dt-bindings: mtd: fixed-partitions: Fix "compression" typo of: Add #address-cells/#size-cells in the device-tree root empty node dt-bindings: Unify "fsl,liodn" type definitions of: address: Preserve the flags portion on 1:1 dma-ranges mapping of/unittest: Add empty dma-ranges address translation tests of: property: fw_devlink: Do not use interrupt-parent directly
2024-12-21Merge tag 'soc-fixes-6.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "Two more small fixes, correcting the cacheline size on Raspberry Pi 5 and fixing a logic mistake in the microchip mpfs firmware driver" * tag 'soc-fixes-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: broadcom: Fix L2 linesize for Raspberry Pi 5 firmware: microchip: fix UL_IAP lock check in mpfs_auto_update_state()
2024-12-21Merge tag 'mm-hotfixes-stable-2024-12-21-12-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "25 hotfixes. 16 are cc:stable. 19 are MM and 6 are non-MM. The usual bunch of singletons and doubletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits) mm: huge_memory: handle strsep not finding delimiter alloc_tag: fix set_codetag_empty() when !CONFIG_MEM_ALLOC_PROFILING_DEBUG alloc_tag: fix module allocation tags populated area calculation mm/codetag: clear tags before swap mm/vmstat: fix a W=1 clang compiler warning mm: convert partially_mapped set/clear operations to be atomic nilfs2: fix buffer head leaks in calls to truncate_inode_pages() vmalloc: fix accounting with i915 mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy() fork: avoid inappropriate uprobe access to invalid mm nilfs2: prevent use of deleted inode zram: fix uninitialized ZRAM not releasing backing device zram: refuse to use zero sized block device as backing device mm: use clear_user_(high)page() for arch with special user folio handling mm: introduce cpu_icache_is_aliasing() across all architectures mm: add RCU annotation to pte_offset_map(_lock) mm: correctly reference merged VMA mm: use aligned address in copy_user_gigantic_page() mm: use aligned address in clear_gigantic_page() mm: shmem: fix ShmemHugePages at swapout ...
2024-12-21staging: gpib: Fix allyesconfig build failuresSteven Rostedt
My tests run an allyesconfig build and it failed with the following errors: LD [M] samples/kfifo/dma-example.ko ld.lld: error: undefined symbol: nec7210_board_reset ld.lld: error: undefined symbol: nec7210_read ld.lld: error: undefined symbol: nec7210_write It appears that some modules call the function nec7210_board_reset() that is defined in nec7210.c. In an allyesconfig build, these other modules are built in. But the file that holds nec7210_board_reset() has: obj-m += nec7210.o Where that "-m" means it only gets built as a module. With the other modules built in, they have no access to nec7210_board_reset() and the build fails. This isn't the only function. After fixing that one, I hit another: ld.lld: error: undefined symbol: push_gpib_event ld.lld: error: undefined symbol: gpib_match_device_path Where push_gpib_event() was also used outside of the file it was defined in, and that file too only was built as a module. Since the directory that nec7210.c is only traversed when CONFIG_GPIB_NEC7210 is set, and the directory with gpib_common.c is only traversed when CONFIG_GPIB_COMMON is set, use those configs as the option to build those modules. When it is an allyesconfig, then they will both be built in and their functions will be available to the other modules that are also built in. Fixes: 3ba84ac69b53e ("staging: gpib: Add nec7210 GPIB chip driver") Fixes: 9dde4559e9395 ("staging: gpib: Add GPIB common core driver") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-21Merge tag 'kbuild-fixes-v6.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Remove stale code in usr/include/headers_check.pl - Fix issues in the user-mode-linux Debian package - Fix false-positive "export twice" errors in modpost * tag 'kbuild-fixes-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: distinguish same module paths from different dump files kbuild: deb-pkg: Do not install maint scripts for arch 'um' kbuild: deb-pkg: add debarch for ARCH=um kbuild: Drop support for include/asm-<arch> in headers_check.pl
2024-12-21Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix inlining of bpf_get_smp_processor_id helper for !CONFIG_SMP systems (Andrea Righi) - Fix BPF USDT selftests helper code to use asm constraint "m" for LoongArch (Tiezhu Yang) - Fix BPF selftest compilation error in get_uprobe_offset when PROCMAP_QUERY is not defined (Jerome Marchand) - Fix BPF bpf_skb_change_tail helper when used in context of BPF sockmap to handle negative skb header offsets (Cong Wang) - Several fixes to BPF sockmap code, among others, in the area of socket buffer accounting (Levi Zim, Zijian Zhang, Cong Wang) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test bpf_skb_change_tail() in TC ingress selftests/bpf: Introduce socket_helpers.h for TC tests selftests/bpf: Add a BPF selftest for bpf_skb_change_tail() bpf: Check negative offsets in __bpf_skb_min_len() tcp_bpf: Fix copied value in tcp_bpf_sendmsg skmsg: Return copied bytes in sk_msg_memcopy_from_iter tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress() selftests/bpf: Fix compilation error in get_uprobe_offset() selftests/bpf: Use asm constraint "m" for LoongArch bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP
2024-12-21Merge tag 'media/v6.13-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - fix a clang build issue with mediatec vcodec - add missing variable initialization to dib3000mb write function * tag 'media/v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: mediatek: vcodec: mark vdec_vp9_slice_map_counts_eob_coef noinline media: dvb-frontends: dib3000mb: fix uninit-value in dib3000_write_reg
2024-12-21Merge tag 'pci-v6.13-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Krzysztof Wilczyński: "Two small patches that are important for fixing boot time hang on Intel JHL7540 'Titan Ridge' platforms equipped with a Thunderbolt controller. The boot time issue manifests itself when a PCI Express bandwidth control is unnecessarily enabled on the Thunderbolt controller downstream ports, which only supports a link speed of 2.5 GT/s in accordance with USB4 v2 specification (p. 671, sec. 11.2.1, "PCIe Physical Layer Logical Sub-block"). As such, there is no need to enable bandwidth control on such downstream port links, which also works around the issue. Both patches were tested by the original reporter on the hardware on which the failure origin golly manifested itself. Both fixes were proven to resolve the reported boot hang issue, and both patches have been in linux-next this week with no reported problems" * tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/bwctrl: Enable only if more than one speed is supported PCI: Honor Max Link Speed when determining supported speeds