summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-09bcachefs: Dropped superblock write is no longer a fatal errorKent Overstreet
Just emit a warning if errors=continue or fix_safe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_trans_node_drop()Kent Overstreet
Factor out a small common helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_trans_unlock_write()Kent Overstreet
New helper for dropping all write locks; which is distinct from the helper the transaction commit path uses, which is faster and only touches updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: btree_node_unlock() can now drop write locksKent Overstreet
Prep work for reworking btree node locking during interior btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: six locks: write locks can now be held recursivelyKent Overstreet
This is needed for the interior update locking rework, where we'll be holding node write locks for the duration of the update - which is needed for synchronizing with online check_allocations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_fs_btree_gc_init()Kent Overstreet
Now returns errors, prep work for check_allocations_done_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Assert that btree write buffer only touches the right btreesKent Overstreet
More asserts, more better. Also, clean up the per-btree flags a bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_inum_path() now crosses subvolumes correctlyKent Overstreet
The dirent that points to a subvolume root is in the parent subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_inum_path() no longer returns an error for disconnected inumsKent Overstreet
bch2_inum_path() should work even if the filesystem is corrupted - we don't want it to cause fsck to fail. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: btree_path_very_locks(): verify lock seqKent Overstreet
If the btree_path's lock seq is wrong, the next bch2_trans_relock() operation is guaranteed to fail and we take an unnecessary transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: fix bch2_btree_key_cache_drop()Kent Overstreet
When evicting, we shouldn't leave a pointer to the key cache entry lying around - that screws up btree path asserts we're adding. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_btree_node_write_trans()Kent Overstreet
Avoiding screwing up path->lock_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Fixes for snapshot_tree.master_subvolKent Overstreet
Ensure that snapshot_tree.master_subvol is cleared when we delete the master subvolume in a tree of snapshots, and allow for snapshot trees that don't have a master subvolume in fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Don't rely on snapshot_tree.master_subvol for reattachingKent Overstreet
Previously, fsck used the snapshot tree's master subvol for finding the root inode number - but the master subvol might have been deleting, and setting a new one should be a user operation; meaning we can't rely on it existing. Fortunately, for finding the root inode number in a tree of snapshots, finding any associated subvolume works. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bch2_kvmalloc()Kent Overstreet
Add a version of kvmalloc() that doesn't have the INT_MAX limit; large filesystems do hit this. We'll want to get rid of the in-memory bucket gens array, but we're not there quite yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Fix assert for online fsckKent Overstreet
We can't check if we're racing with fsck ending until mark_lock is held. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Handle -BCH_ERR_need_mark_replicas in gcKent Overstreet
Locking considerations (possibly no longer relevant?) mean that when an accounting update needs a new superblock replicas entry to be created, it's deferred to the transaction commit error path. But accounting updates for gc/fcsk aren't done from the transaction commit path - so we need to handle -BCH_ERR_btree_insert_need_mark_replicas locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Write lock btree node in key cache fillsKent Overstreet
this addresses a key cache coherency bug Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: kill __bch2_btree_iter_flags()Kent Overstreet
bch2_btree_iter_flags() now takes a level parameter; this fixes a bug where using a node iterator on a leaf wouldn't set BTREE_ITER_with_key_cache, leading to fun cache coherency bugs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Drop redundant "read error" call from btree_gcKent Overstreet
The btree node read error path already calls topology error, so this is entirely redundant, and we're not specific enough about our error codes - this was triggering for bucket_ref_update() errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Drop racy warningKent Overstreet
Checking for writing past i_size after unlocking the folio and clearing the dirty bit is racy, and we already check it at the start. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: better check_bp_exists() error messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: add counter_flags for countersHongbo Li
In bcachefs, io_read and io_write counter record the amount of data which has been read and written. They increase in unit of sector, so to display correctly, they need to be shifted to the left by the size of a sector. Other counters like io_move, move_extent_{read, write, finish} also have this problem. In order to support different unit, we add extra column to mark the counter type by using TYPE_COUNTER and TYPE_SECTORS in BCH_PERSISTENT_COUNTERS(). Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bcachefs_metadata_version_autofix_errorsKent Overstreet
It's time to make self healing the default: change the error action for old filesystems to fix_safe, matching the default for current filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: bcachefs_metadata_version_persistent_inode_cursorsKent Overstreet
Persistent cursors for inode allocation. A free inodes btree would add substantial overhead to inode allocation and freeing - a "next num to allocate" cursor is always going to be faster. We just need it to be persistent, to avoid scanning the inodes btree from the start on startup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-10Merge tag 'v6.13-rc6' into drm-nextDave Airlie
This backmerges Linux 6.13-rc6 this is need for the newer pulls. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-01-09Merge tag '6.13-rc6-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Four ksmbd server fixes, most also for stable: - fix for reporting special file type more accurately when POSIX extensions negotiated - minor cleanup - fix possible incorrect creation path when dirname is not present. In some cases, Windows apps create files without checking if they exist. - fix potential NULL pointer dereference sending interim response" * tag '6.13-rc6-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Implement new SMB3 POSIX type ksmbd: fix unexpectedly changed path in ksmbd_vfs_kern_path_locked ksmbd: Remove unneeded if check in ksmbd_rdma_capable_netdev() ksmbd: fix a missing return value check bug
2025-01-09net: phy: micrel: use helper phy_disable_eeeHeiner Kallweit
Use helper phy_disable_eee() instead of setting phylib-internal bitmap eee_broken_modes directly. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/5e19eebe-121e-4a41-b36d-a35631279dd8@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09Merge branch 'netconsole-selftest-for-userdata-overflow'Jakub Kicinski
Breno Leitao says: ==================== netconsole: selftest for userdata overflow Implement comprehensive testing for netconsole userdata entry handling, demonstrating correct behavior when creating maximum entries and preventing unauthorized overflow. Refactor existing test infrastructure to support modular, reusable helper functions that validate strict entry limit enforcement. Also, add a warning if update_userdata() sees more than MAX_USERDATA_ITEMS entries. This shouldn't happen and it is a bug that shouldn't be silently ignored. v2: https://lore.kernel.org/20250103-netcons_overflow_test-v2-0-a49f9be64c21@debian.org v1: https://lore.kernel.org/20241204-netcons_overflow_test-v1-0-a85a8d0ace21@debian.org ==================== Link: https://patch.msgid.link/20250108-netcons_overflow_test-v3-0-3d85eb091bec@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09netconsole: selftest: verify userdata entry limitBreno Leitao
Add a new selftest for netconsole that tests the userdata entry limit functionality. The test performs two key verifications: 1. Create MAX_USERDATA_ITEMS (16) userdata entries successfully 2. Confirm that attempting to create an additional userdata entry fails The selftest script uses the netcons library and checks the behavior by attempting to create entries beyond the maximum allowed limit. Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250108-netcons_overflow_test-v3-4-3d85eb091bec@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09netconsole: selftest: Delete all userdata keysBreno Leitao
Modify the cleanup function to remove all userdata keys created during the test, instead of just deleting a single predefined key. This ensures a more thorough cleanup of temporary resources. Move the KEY_PATH variable definition inside the set_user_data function to reduce global variables and improve encapsulation. The KEY_PATH variable is now dynamically created when setting user data. This change has no effect on the current test, while improving an upcoming test that would create several userdata entries. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250108-netcons_overflow_test-v3-3-3d85eb091bec@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09netconsole: selftest: Split the helpers from the selftestBreno Leitao
Split helper functions from the netconsole basic test into a separate library file to enable reuse across different netconsole tests. This change only moves the existing helper functions to lib/sh/lib_netcons.sh while preserving the same test functionality. The helpers provide common functions for: - Setting up network namespaces and interfaces - Managing netconsole dynamic targets - Setting user data - Handling test dependencies - Cleanup operations Do not make any change in the code, other than the mechanical separation. Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250108-netcons_overflow_test-v3-2-3d85eb091bec@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09netconsole: Warn if MAX_USERDATA_ITEMS limit is exceededBreno Leitao
netconsole configfs helpers doesn't allow the creation of more than MAX_USERDATA_ITEMS items. Add a warning when netconsole userdata update function attempts sees more than MAX_USERDATA_ITEMS entries. Replace silent ignore mechanism with WARN_ON_ONCE() to highlight potential misuse during development and debugging. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250108-netcons_overflow_test-v3-1-3d85eb091bec@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09ipv4: route: fix drop reason being overridden in ip_route_input_slowAntoine Tenart
When jumping to 'martian_destination' a drop reason is always set but that label falls-through the 'e_nobufs' one, overriding the value. The behavior was introduced by the mentioned commit. The logic went from, goto martian_destination; ... martian_destination: ... e_inval: err = -EINVAL; goto out; e_nobufs: err = -ENOBUFS; goto out; to, reason = ...; goto martian_destination; ... martian_destination: ... e_nobufs: reason = SKB_DROP_REASON_NOMEM; goto out; A 'goto out' is clearly missing now after 'martian_destination' to avoid overriding the drop reason. Fixes: 5b92112acd8e ("net: ip: make ip_route_input_slow() return drop reasons") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Cc: Menglong Dong <menglong8.dong@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250108165725.404564-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()Sudheer Kumar Doredla
CPSW ALE has 75-bit ALE entries stored across three 32-bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions support ALE field entries spanning up to two words at the most. The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as expected when ALE field spanned across word1 and word2, but fails when ALE field spanned across word2 and word3. For example, while reading the ALE field spanned across word2 and word3 (i.e. bits 62 to 64), the word3 data shifted to an incorrect position due to the index becoming zero while flipping. The same issue occurred when setting an ALE entry. This issue has not been seen in practice but will be an issue in the future if the driver supports accessing ALE fields spanning word2 and word3 Fix the methods to handle getting/setting fields spanning up to two words. Fixes: b685f1a58956 ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()") Signed-off-by: Sudheer Kumar Doredla <s-doredla@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://patch.msgid.link/20250108172433.311694-1-s-doredla@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS requestXiang Zhang
The ISCSI_UEVENT_GET_HOST_STATS request is already handled in iscsi_get_host_stats(). This fix ensures that redundant responses are skipped in iscsi_if_rx(). - On success: send reply and stats from iscsi_get_host_stats() within if_recv_msg(). - On error: fall through. Signed-off-by: Xiang Zhang <hawkxiang.cpp@gmail.com> Link: https://lore.kernel.org/r/20250107022432.65390-1-hawkxiang.cpp@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-01-09scsi: core: Fix command pass through retry regressionMike Christie
scsi_check_passthrough() is always called, but it doesn't check for if a command completed successfully. As a result, if a command was successful and the caller used SCMD_FAILURE_RESULT_ANY to indicate what failures it wanted to retry, we will end up retrying the command. This will cause delays during device discovery because of the command being sent multiple times. For some USB devices it can also cause the wrong device size to be used. This patch adds a check for if the command was successful. If it is we return immediately instead of trying to match a failure. Fixes: 994724e6b3f0 ("scsi: core: Allow passthrough to request midlayer retries") Reported-by: Kris Karas <bugs-a21@moonlit-rail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219652 Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20250107010220.7215-1-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-01-10Merge tag 'drm-etnaviv-next-2025-01-08' of ↵Dave Airlie
https://git.pengutronix.de/git/lst/linux into drm-next - cleanups - add fdinfo memory support - add explicit reset handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/41c1e476c6014010247d164ac8d21bd6f922cce1.camel@pengutronix.de
2025-01-10hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()Roman Kisel
The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2], disallows overlapping of the input and output hypercall areas, and hv_vtl_apicid_to_vp_id() overlaps them. Use the output hypercall page of the current vCPU for the hypercall. [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/lkml/SN6PR02MB4157B98CD34781CC87A9D921D40D2@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Link: https://lore.kernel.org/r/20250108222138.1623703-6-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250108222138.1623703-6-romank@linux.microsoft.com>
2025-01-10hyperv: Do not overlap the hvcall IO areas in get_vtl()Roman Kisel
The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2], disallows overlapping of the input and output hypercall areas, and get_vtl(void) does overlap them. Use the output hypercall page of the current vCPU for the hypercall. [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message") Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/20250108222138.1623703-5-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250108222138.1623703-5-romank@linux.microsoft.com>
2025-01-10hyperv: Enable the hypercall output page for the VTL modeRoman Kisel
Due to the hypercall page not being allocated in the VTL mode, the code resorts to using a part of the input page. Allocate the hypercall output page in the VTL mode thus enabling it to use it for output and share code with dom0. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Link: https://lore.kernel.org/r/20250108222138.1623703-4-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250108222138.1623703-4-romank@linux.microsoft.com>
2025-01-10hv_balloon: Fallback to generic_online_page() for non-HV hot added memJacob Pan
The Hyper-V balloon driver installs a custom callback for handling page onlining operations performed by the memory hotplug subsystem. This custom callback is global, and overrides the default callback (generic_online_page) that Linux otherwise uses. The custom callback properly handles memory that is hot-added by the balloon driver as part of a Hyper-V hot-add region. But memory can also be hot-added directly by a device driver for a vPCI device, particularly GPUs. In such a case, the custom callback installed by the balloon driver runs, but won't find the page in its hot-add region list and doesn't online it, which could cause driver initialization failures. Fix this by having the balloon custom callback run generic_online_page() when the page isn't part of a Hyper-V hot-add region, thereby doing the default Linux behavior. This allows device driver hot-adds to work properly. Similar cases are handled the same way in the virtio-mem driver. Suggested-by: Vikram Sethi <vsethi@nvidia.com> Tested-by: Michael Frohlich <mfrohlich@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Link: https://lore.kernel.org/r/20250107180918.1053933-1-jacob.pan@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250107180918.1053933-1-jacob.pan@linux.microsoft.com>
2025-01-10Drivers: hv: vmbus: Log on missing offers if anyJohn Starks
When resuming from hibernation, log any channels that were present before hibernation but now are gone. In general, the boot-time devices configured for a resuming VM should be the same as the devices in the VM at the time of hibernation. It's uncommon for the configuration to have been changed such that offers are missing. Changing the configuration violates the rules for hibernation anyway. The cleanup of missing channels is not straight-forward and dependent on individual device driver functionality and implementation, so it can be added in future with separate changes. Signed-off-by: John Starks <jostarks@microsoft.com> Co-developed-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250102130712.1661-3-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250102130712.1661-3-namjain@linux.microsoft.com>
2025-01-10Drivers: hv: vmbus: Wait for boot-time offers during boot and resumeNaman Jain
Channel offers are requested during VMBus initialization and resume from hibernation. Add support to wait for all boot-time channel offers to be delivered and processed before returning from vmbus_request_offers. This is in analogy to a PCI bus not returning from probe until it has scanned all devices on the bus. Without this, user mode can race with VMBus initialization and miss channel offers. User mode has no way to work around this other than sleeping for a while, since there is no way to know when VMBus has finished processing boot-time offers. With this added functionality, remove earlier logic which keeps track of count of offered channels post resume from hibernation. Once all offers delivered message is received, no further boot-time offers are going to be received. Consequently, logic to prevent suspend from happening after previous resume had missing offers, is also removed. Co-developed-by: John Starks <jostarks@microsoft.com> Signed-off-by: John Starks <jostarks@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250102130712.1661-2-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250102130712.1661-2-namjain@linux.microsoft.com>
2025-01-10uio_hv_generic: Add a check for HV_NIC for send, receive buffers setupNaman Jain
Receive and send buffer allocation was originally introduced to support DPDK's networking use case. These buffer sizes were further increased to meet DPDK performance requirements. However, these large buffers are unnecessary for any other UIO use cases. Restrict the allocation of receive and send buffers only for HV_NIC device type, saving 47 MB of memory per device. While at it, fix some of the syntax related issues in the touched code which are reported by "--strict" option of checkpatch. Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250102145243.2088-1-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250102145243.2088-1-namjain@linux.microsoft.com>
2025-01-10iommu/hyper-v: Don't assume cpu_possible_mask is denseMichael Kelley
Current code gets the APIC IDs for CPUs numbered 255 and lower. This code assumes cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask contains holes, num_possible_cpus() is less than nr_cpu_ids, so some CPUs might get skipped. Furthermore, getting the APIC ID of a CPU that isn't in cpu_possible_mask is invalid. However, the configurations that Hyper-V provides to guest VMs on x86 hardware, in combination with how x86 code assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to determine the range to scan based on nr_cpu_ids, and skip any CPUs that are not in the cpu_possible_mask. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241003035333.49261-4-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241003035333.49261-4-mhklinux@outlook.com>
2025-01-10Drivers: hv: Don't assume cpu_possible_mask is denseMichael Kelley
Current code allocates the hv_vp_index array with size num_possible_cpus(). This code assumes cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask is sparse, the array might be indexed by a value beyond the size of the array. However, the configurations that Hyper-V provides to guest VMs on x86 and ARM64 hardware, in combination with how architecture specific code assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to allocate and initialize the array using size "nr_cpu_ids". While this leaves unused array entries corresponding to holes in cpu_possible_mask, the holes are assumed to be minimal and hence the amount of memory wasted by unused entries is minimal. Using nr_cpu_ids also reduces initialization time, in that the loop to initialize the array currently rescans cpu_possible_mask on each iteration. This is n-squared in the number of CPUs, which could be significant for large CPU counts. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241003035333.49261-3-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241003035333.49261-3-mhklinux@outlook.com>
2025-01-10x86/hyperv: Don't assume cpu_possible_mask is denseMichael Kelley
Current code allocates the hv_vp_assist_page array with size num_possible_cpus(). This code assumes cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask is sparse, the array might be indexed by a value beyond the size of the array. However, the configurations that Hyper-V provides to guest VMs on x86 hardware, in combination with how x86 code assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to allocate the array with size "nr_cpu_ids". While this leaves unused array entries corresponding to holes in cpu_possible_mask, the holes are assumed to be minimal and hence the amount of memory wasted by unused entries is minimal. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241003035333.49261-2-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241003035333.49261-2-mhklinux@outlook.com>
2025-01-10hyperv: Remove the now unused hyperv-tlfs.h filesNuno Das Neves
Remove all hyperv-tlfs.h files. These are no longer included anywhere. hyperv/hvhdk.h serves the same role, but with an easier path for adding new definitions. Remove the relevant lines in MAINTAINERS. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/1732577084-2122-6-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1732577084-2122-6-git-send-email-nunodasneves@linux.microsoft.com>
2025-01-10hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.hNuno Das Neves
Switch to using hvhdk.h everywhere in the kernel. This header includes all the new Hyper-V headers in include/hyperv, which form a superset of the definitions found in hyperv-tlfs.h. This makes it easier to add new Hyper-V interfaces without being restricted to those in the TLFS doc (reflected in hyperv-tlfs.h). To be more consistent with the original Hyper-V code, the names of some definitions are changed slightly. Update those where needed. Update comments in mshyperv.h files to point to include/hyperv for adding new definitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>