2024-09-12  drm/xe: Fix possible UAF in guc_exec_queue_process_msg  (Matthew Brost)

Store xe_device ahead of processing the message, as the message can be
freed in some cases.

v2:
 - Include missing local changes
v3:
 - Resend for CI

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202407231445.rpisd1vA-lkp@intel.com/
Fixes: 55ea73aacfb9 ("drm/xe: Build PM into GuC CT layer")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240724164341.1848954-1-matthew.brost@intel.com
(cherry picked from commit 1a394b4f504f33eac8c38b6f42ba025105c7e869)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
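The pattern behind the fix is general: if a callee may free the object it
is handed, any field needed afterwards must be copied out first. A minimal
standalone C sketch of the pattern (the msg/device types here are
illustrative, not the xe driver's actual structures):

    #include <stdio.h>
    #include <stdlib.h>

    struct device { const char *name; };
    struct msg { struct device *dev; int op; };

    /* May consume (free) the message, e.g. on a cleanup opcode. */
    static void process(struct msg *m)
    {
        if (m->op == 0)
            free(m);
    }

    static void handle(struct msg *m)
    {
        struct device *dev = m->dev;  /* save before process() may free m */

        process(m);
        /* Reading m->dev here would be a use-after-free; dev is safe. */
        printf("done on %s\n", dev->name);
    }

    int main(void)
    {
        struct device d = { "dev0" };
        struct msg *m = malloc(sizeof(*m));

        m->dev = &d;
        m->op = 0;
        handle(m);
        return 0;
    }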
2024-09-12  drm/xe: Remove fence check from send_tlb_invalidation  (Matthew Brost)

The 'fence' argument in send_tlb_invalidation cannot be NULL; remove the
non-NULL check from send_tlb_invalidation.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202407231049.esig0Fkb-lkp@intel.com/
Fixes: 58bfe6674467 ("drm/xe: Drop xe_gt_tlb_invalidation_wait")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240723190714.1744653-1-matthew.brost@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit 6482253e6e1ad1c3a76645a3899d3cfdb5b918cb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12  drm/xe/gt: Remove double include  (Lucas De Marchi)

The header generated/xe_wa_oob.h is included twice. Remove one.

Fixes: 27cb2b7fec2a ("drm/xe/bmg: implement Wa_16023588340")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202407052122.AzuWSPuo-lkp@intel.com/
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708173301.1543871-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 3d122660dc70029d9cccb4e8670125f0affa959e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-12  net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()  (Lorenzo Bianconi)

Move the nf flowtable bpf initialization into the nf_flow_table module
load routine, since nf_flow_table_bpf is part of the nf_flow_table
module and not the nf_flow_table_inet one.

This patch avoids the following kernel warning when running the
reproducer below:

$modprobe nf_flow_table_inet
$rmmod nf_flow_table_inet
$modprobe nf_flow_table_inet
modprobe: ERROR: could not insert 'nf_flow_table_inet': Invalid argument

[ 184.081501] ------------[ cut here ]------------
[ 184.081527] WARNING: CPU: 0 PID: 1362 at kernel/bpf/btf.c:8206 btf_populate_kfunc_set+0x23c/0x330
[ 184.081550] CPU: 0 UID: 0 PID: 1362 Comm: modprobe Kdump: loaded Not tainted 6.11.0-0.rc5.22.el10.x86_64 #1
[ 184.081553] Hardware name: Red Hat OpenStack Compute, BIOS 1.14.0-1.module+el8.4.0+8855+a9e237a9 04/01/2014
[ 184.081554] RIP: 0010:btf_populate_kfunc_set+0x23c/0x330
[ 184.081558] RSP: 0018:ff22cfb38071fc90 EFLAGS: 00010202
[ 184.081559] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[ 184.081560] RDX: 000000000000006e RSI: ffffffff95c00000 RDI: ff13805543436350
[ 184.081561] RBP: ffffffffc0e22180 R08: ff13805543410808 R09: 000000000001ec00
[ 184.081562] R10: ff13805541c8113c R11: 0000000000000010 R12: ff13805541b83c00
[ 184.081563] R13: ff13805543410800 R14: 0000000000000001 R15: ffffffffc0e2259a
[ 184.081564] FS:  00007fa436c46740(0000) GS:ff1380557ba00000(0000) knlGS:0000000000000000
[ 184.081569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 184.081570] CR2: 000055e7b3187000 CR3: 0000000100c48003 CR4: 0000000000771ef0
[ 184.081571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 184.081572] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 184.081572] PKRU: 55555554
[ 184.081574] Call Trace:
[ 184.081575]  <TASK>
[ 184.081578]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 184.081580]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 184.081582]  ? __register_btf_kfunc_id_set+0x199/0x200
[ 184.081585]  ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081586]  ? __warn.cold+0x93/0xed
[ 184.081590]  ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081592]  ? report_bug+0xff/0x140
[ 184.081594]  ? handle_bug+0x3a/0x70
[ 184.081596]  ? exc_invalid_op+0x17/0x70
[ 184.081597]  ? asm_exc_invalid_op+0x1a/0x20
[ 184.081601]  ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081602]  __register_btf_kfunc_id_set+0x199/0x200
[ 184.081605]  ? __pfx_nf_flow_inet_module_init+0x10/0x10 [nf_flow_table_inet]
[ 184.081607]  do_one_initcall+0x58/0x300
[ 184.081611]  do_init_module+0x60/0x230
[ 184.081614]  __do_sys_init_module+0x17a/0x1b0
[ 184.081617]  do_syscall_64+0x7d/0x160
[ 184.081620]  ? __count_memcg_events+0x58/0xf0
[ 184.081623]  ? handle_mm_fault+0x234/0x350
[ 184.081626]  ? do_user_addr_fault+0x347/0x640
[ 184.081630]  ? clear_bhb_loop+0x25/0x80
[ 184.081633]  ? clear_bhb_loop+0x25/0x80
[ 184.081634]  ? clear_bhb_loop+0x25/0x80
[ 184.081637]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 184.081639] RIP: 0033:0x7fa43652e4ce
[ 184.081647] RSP: 002b:00007ffe8213be18 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 184.081649] RAX: ffffffffffffffda RBX: 000055e7b3176c20 RCX: 00007fa43652e4ce
[ 184.081650] RDX: 000055e7737fde79 RSI: 0000000000003990 RDI: 000055e7b3185380
[ 184.081651] RBP: 000055e7737fde79 R08: 0000000000000007 R09: 000055e7b3179bd0
[ 184.081651] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000040000
[ 184.081652] R13: 000055e7b3176fa0 R14: 0000000000000000 R15: 000055e7b3179b80

Fixes: 391bb6594fd3 ("netfilter: Add bpf_xdp_flow_lookup kfunc")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20240911-nf-flowtable-bpf-modprob-fix-v1-1-f9fc075aafc3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-12  Merge tag 'nf-24-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf  (Paolo Abeni)

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains two fixes from Florian Westphal:

Patch #1 fixes a sk refcount leak in nft_socket on mismatch.

Patch #2 fixes cgroupsv2 matching from containers due to incorrect
level in subtree.

netfilter pull request 24-09-12

* tag 'nf-24-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_socket: make cgroupsv2 matching work with namespaces
  netfilter: nft_socket: fix sk refcount leaks
====================

Link: https://patch.msgid.link/20240911222520.3606-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-12  LoongArch: KVM: Enable paravirt feature control from VMM  (Bibo Mao)

Export kernel paravirt features to user space, so that the VMM can
control each single paravirt feature. By default, paravirt features
will be the same as the KVM-supported features if the VMM does not set
them.

Also add a new feature, KVM_FEATURE_VIRT_EXTIOI, which can be set from
user space. This feature indicates that the virt EIOINTC can route
interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12  LoongArch: KVM: Add PMU support for guest  (Song Gao)

On LoongArch, the host and guest have their own PMU CSRs and they share
the PMU hardware resources. A set of PMU CSRs consists of a CTRL
register and a CNTR register. We can set which PMU CSRs are used by the
guest by writing to the GCFG register [24:26] bits.

On the KVM side:
- Save the host PMU CSRs into structure kvm_context.
- If the host supports the PMU feature:
  - When entering guest mode, save the host PMU CSRs and restore the
    guest PMU CSRs.
  - When exiting guest mode, save the guest PMU CSRs and restore the
    host PMU CSRs.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
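The enter/exit sequence above is a plain context-swap protocol. A minimal
standalone C model of it (the pmu_csrs struct and helpers are illustrative,
not the actual KVM code):

    #include <stdio.h>

    struct pmu_csrs { unsigned long ctrl, cntr; };

    static struct pmu_csrs hw;            /* models the shared PMU hardware */

    static void save(struct pmu_csrs *dst)          { *dst = hw; }
    static void restore(const struct pmu_csrs *src) { hw = *src; }

    int main(void)
    {
        struct pmu_csrs host = { 0, 0 }, guest = { 0, 0 };

        hw.ctrl = 0x1;
        hw.cntr = 100;                    /* the host has been counting */

        save(&host);  restore(&guest);    /* enter guest mode */
        hw.cntr += 7;                     /* guest perturbs the counters */
        save(&guest); restore(&host);     /* exit guest mode */

        printf("host cntr preserved: %lu\n", hw.cntr);  /* prints 100 */
        return 0;
    }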
2024-09-12  PCI: Fix potential deadlock in pcim_intx()  (Philipp Stanner)

25216afc9db5 ("PCI: Add managed pcim_intx()") moved the allocation step
for pci_intx()'s device resource from pcim_enable_device() to
pcim_intx(). As before, pcim_enable_device() sets pci_dev.is_managed to
true, and it is never set to false again.

Due to the lifecycle of a struct pci_dev, it can happen that a second
driver obtains the same pci_dev after a first driver ran. If one driver
uses pcim_enable_device() and the other doesn't, this causes the other
driver to run into managed pcim_intx(), which will try to allocate when
called for the first time. Allocations might sleep, so calling
pci_intx() while holding spinlocks becomes invalid, which causes
lockdep warnings and could cause deadlocks:

========================================================
WARNING: possible irq lock inversion dependency detected
6.11.0-rc6+ #59 Tainted: G        W
--------------------------------------------------------
CPU 0/KVM/1537 just changed the state of lock:
ffffa0f0cff965f0 (&vdev->irqlock){-...}-{2:2}, at: vfio_intx_handler+0x21/0xd0 [vfio_pci_core]
but this lock took another, HARDIRQ-unsafe lock in the past:
 (fs_reclaim){+.+.}-{0:0}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               local_irq_disable();
                               lock(&vdev->irqlock);
                               lock(fs_reclaim);
  <Interrupt>
    lock(&vdev->irqlock);

 *** DEADLOCK ***

Have pcim_enable_device()'s release function, pcim_disable_device(),
set pci_dev.is_managed to false so that subsequent drivers using the
same struct pci_dev do not implicitly run into managed code.

Link: https://lore.kernel.org/r/20240905072556.11375-2-pstanner@redhat.com
Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/all/20240903094431.63551744.alex.williamson@redhat.com/
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
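A toy model of the fix's invariant, with hypothetical pcim_enable() and
pcim_release() helpers standing in for pcim_enable_device() and its devres
release callback: the release path must reset the managed flag so the next
driver does not inherit it:

    #include <stdio.h>
    #include <stdbool.h>

    struct pci_dev_model { bool is_managed; };

    static void pcim_enable(struct pci_dev_model *dev)  { dev->is_managed = true; }
    static void pcim_release(struct pci_dev_model *dev) { dev->is_managed = false; }

    int main(void)
    {
        struct pci_dev_model dev = { false };

        pcim_enable(&dev);   /* driver A binds, managed */
        pcim_release(&dev);  /* driver A unbinds; devres release runs */

        /* driver B (unmanaged) binds the same struct next */
        printf("driver B sees is_managed=%d\n", dev.is_managed); /* 0: safe */
        return 0;
    }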
2024-09-12  Merge branch 'for-next/timers' into for-next/core  (Will Deacon)

* for-next/timers:
  arm64: Implement prctl(PR_{G,S}ET_TSC)
2024-09-12  Merge branch 'for-next/selftests' into for-next/core  (Will Deacon)

* for-next/selftests:
  kselftest/arm64: Fix build warnings for ptrace
  kselftest/arm64: Actually test SME vector length changes via sigreturn
  kselftest/arm64: signal: fix/refactor SVE vector length enumeration
2024-09-12  Merge branch 'for-next/poe' into for-next/core  (Will Deacon)

* for-next/poe: (31 commits)
  arm64: pkeys: remove redundant WARN
  kselftest/arm64: Add test case for POR_EL0 signal frame records
  kselftest/arm64: parse POE_MAGIC in a signal frame
  kselftest/arm64: add HWCAP test for FEAT_S1POE
  selftests: mm: make protection_keys test work on arm64
  selftests: mm: move fpregs printing
  kselftest/arm64: move get_header()
  arm64: add Permission Overlay Extension Kconfig
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  arm64/ptrace: add support for FEAT_POE
  arm64: add POE signal support
  arm64: implement PKEYS support
  arm64: add pte_access_permitted_no_overlay()
  arm64: handle PKEY/POE faults
  arm64: mask out POIndex when modifying a PTE
  arm64: convert protection key into vm_flags and pgprot values
  arm64: add POIndex defines
  arm64: re-order MTE VM_ flags
  arm64: enable the Permission Overlay Extension for EL0
  ...
2024-09-12  Merge branch 'for-next/pkvm-guest' into for-next/core  (Will Deacon)

* for-next/pkvm-guest:
  arm64: smccc: Reserve block of KVM "vendor" services for pKVM hypercalls
  drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercall
  arm64: mm: Add confidential computing hook to ioremap_prot()
  drivers/virt: pkvm: Hook up mem_encrypt API using pKVM hypercalls
  arm64: mm: Add top-level dispatcher for internal mem_encrypt API
  drivers/virt: pkvm: Add initial support for running as a protected guest
  firmware/smccc: Call arch-specific hook on discovering KVM services
2024-09-12  Merge branch 'for-next/perf' into for-next/core  (Will Deacon)

* for-next/perf: (33 commits)
  perf: arm-ni: Fix an NULL vs IS_ERR() bug
  perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
  MAINTAINERS: List Arm interconnect PMUs as supported
  perf: Add driver for Arm NI-700 interconnect PMU
  dt-bindings/perf: Add Arm NI-700 PMU
  perf/arm-cmn: Improve format attr printing
  perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
  perf/arm-cmn: Support CMN S3
  dt-bindings: perf: arm-cmn: Add CMN S3
  perf/arm-cmn: Refactor DTC PMU register access
  perf/arm-cmn: Make cycle counts less surprising
  perf/arm-cmn: Improve build-time assertion
  perf/arm-cmn: Ensure dtm_idx is big enough
  perf/arm-cmn: Fix CCLA register offset
  perf/arm-cmn: Refactor node ID handling. Again.
  drivers/perf: hisi_pcie: Export supported Root Ports [bdf_min, bdf_max]
  drivers/perf: hisi_pcie: Fix TLP headers bandwidth counting
  drivers/perf: hisi_pcie: Record hardware counts correctly
  drivers/perf: arm_spe: Use perf_allow_kernel() for permissions
  perf/dwc_pcie: Add support for QCOM vendor devices
  ...
2024-09-12  Merge branch 'for-next/mm' into for-next/core  (Will Deacon)

* for-next/mm:
  arm64/mm: use lm_alias() with addresses passed to memblock_free()
  mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
  arm64: Expose the end of the linear map in PHYSMEM_END
  arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
  arm64/mm: Delete __init region from memblock.reserved
2024-09-12  Merge branch 'for-next/misc' into for-next/core  (Will Deacon)

* for-next/misc:
  arm64: hibernate: Fix warning for cast from restricted gfp_t
  arm64: esr: Define ESR_ELx_EC_* constants as UL
  arm64: Constify struct kobj_type
  arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
  arm64/sve: Remove unused declaration read_smcr_features()
  arm64: mm: Remove unused declaration early_io_map()
  arm64: el2_setup.h: Rename some labels to be more diff-friendly
  arm64: signal: Fix some under-bracketed UAPI macros
  arm64/mm: Drop TCR_SMP_FLAGS
  arm64/mm: Drop PMD_SECT_VALID
2024-09-12  Merge branch 'for-next/errata' into for-next/core  (Will Deacon)

* for-next/errata:
  arm64: errata: Enable the AC03_CPU_38 workaround for ampere1a
2024-09-12  Merge branch 'for-next/acpi' into for-next/core  (Will Deacon)

* for-next/acpi:
  ACPI/IORT: Add PMCG platform information for HiSilicon HIP10/11
  ACPI: ARM64: add acpi_iort.h to MAINTAINERS
  ACPI/IORT: Switch to use kmemdup_array()
2024-09-12  erofs: sunset unneeded NOFAILs  (Gao Xiang)

With iterative development, our codebase can now deal with compressed
buffer misses properly if both in-place I/O and compressed buffer
allocation fail.

Note that if readahead fails (with non-uptodate folios), the original
request will then fall back to synchronous read, and `.read_folio()`
should return appropriate errnos; otherwise -EIO will be passed to user
space, which is unexpected.

To simplify rarely encountered failure paths, a mimic decompression
will just be used. Before that, failure reasons are recorded in
compressed_bvecs[] and they also act as placeholders to avoid in-place
pages. They will be parsed just before decompression and then passed
back to `.read_folio()`.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240905084732.2684515-1-hsiangkao@linux.alibaba.com
2024-09-12  perf: arm-ni: Fix an NULL vs IS_ERR() bug  (Dan Carpenter)

The devm_ioremap() function never returns error pointers, it returns a
NULL pointer if there is an error.

Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/04d6ccc3-6d31-4f0f-ab0f-7a88342cc09a@stanley.mountain
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-12  arm64: hibernate: Fix warning for cast from restricted gfp_t  (Min-Hua Chen)

This patch fixes the following warning by adding __force to the cast:

arch/arm64/kernel/hibernate.c:410:44: sparse: warning: cast from restricted gfp_t

No functional change intended.

Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Link: https://lore.kernel.org/r/20240910232507.313555-1-minhuadotchen@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
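For context, a minimal sketch of how sparse's restricted types behave; the
macro definitions mirror the kernel's scheme (empty for a normal compiler,
attributes visible only under sparse's __CHECKER__), while gfp_to_flags()
is a made-up example function, not the hibernate.c code:

    #include <stdio.h>

    #ifdef __CHECKER__
    # define __bitwise __attribute__((bitwise))
    # define __force   __attribute__((force))
    #else
    # define __bitwise
    # define __force
    #endif

    typedef unsigned int __bitwise gfp_t;

    static unsigned long gfp_to_flags(gfp_t gfp)
    {
        /* Without __force, sparse warns: "cast from restricted gfp_t". */
        return (__force unsigned long)gfp;
    }

    int main(void)
    {
        gfp_t gfp = (__force gfp_t)0x20u;  /* modelled GFP_ATOMIC-ish value */

        printf("flags: 0x%lx\n", gfp_to_flags(gfp));
        return 0;
    }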
2024-09-12  docs: filesystems: corrected grammar of netfs page  (Dennis Lam)

Changed the word "aren't" to "isn't" to agree with the singular word
"bufferage".

Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Link: https://lore.kernel.org/r/20240912012550.13748-2-dennis.lamerice@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  Merge branch 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into vfs.netfs  (Christian Brauner)

Merge patch series "netfs: Read/write improvements" from David Howells
<dhowells@redhat.com>.

* 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (25 commits)
  cifs: Don't support ITER_XARRAY
  cifs: Switch crypto buffer to use a folio_queue rather than an xarray
  cifs: Use iterate_and_advance*() routines directly for hashing
  netfs: Cancel dirty folios that have no storage destination
  cachefiles, netfs: Fix write to partial block at EOF
  netfs: Remove fs/netfs/io.c
  netfs: Speed up buffered reading
  afs: Make read subreqs async
  netfs: Simplify the writeback code
  netfs: Provide an iterator-reset function
  netfs: Use new folio_queue data type and iterator instead of xarray iter
  cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
  iov_iter: Provide copy_folio_from_iter()
  mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
  netfs: Use bh-disabling spinlocks for rreq->lock
  netfs: Set the request work function upon allocation
  netfs: Remove NETFS_COPY_TO_CACHE
  netfs: Reserve netfs_sreq_source 0 as unset/unknown
  netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
  netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode
  ...

Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  cifs: Don't support ITER_XARRAY  (David Howells)

There's now no need to support ITER_XARRAY in cifs as netfslib hands
down ITER_FOLIOQ instead - and that's simpler to use with
iterate_and_advance() as it doesn't hold the RCU read lock over the
step function.

This is part of the process of phasing out ITER_XARRAY.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-26-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  cifs: Switch crypto buffer to use a folio_queue rather than an xarray  (David Howells)

Switch cifs from using an xarray to hold the transport crypto buffer to
using a folio_queue and use ITER_FOLIOQ rather than ITER_XARRAY.

This is part of the process of phasing out ITER_XARRAY.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-25-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  cifs: Use iterate_and_advance*() routines directly for hashing  (David Howells)

Replace the bespoke cifs ITER_BVEC and ITER_KVEC hashing iterators with
iterate_and_advance_kernel() - a variant of iterate_and_advance() that
only supports kernel-internal ITER_* types and not UBUF/IOVEC types.

The bespoke ITER_XARRAY handling is left because we don't really want
to be calling crypto_shash_update() under the RCU read lock for large
amounts of data; besides, ITER_XARRAY is going to be phased out.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-24-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Cancel dirty folios that have no storage destination  (David Howells)

Kafs wants to be able to cache the contents of directories (and
symlinks), but whilst these are downloaded from the server with the
FS.FetchData RPC op and similar, the same as for regular files, they
can't be updated by FS.StoreData, but rather have special operations
(FS.MakeDir, etc.).

Now, rather than redownloading a directory's content after each change
made to that directory, kafs modifies the local blob. This blob can be
saved out to the cache, and since it's using netfslib, kafs just marks
the folios dirty and lets ->writepages() on the directory take care of
it, as for a regular file.

This is fine as long as there's a cache, as although the upload stream
is disabled, there's a cache stream to drive the procedure. But if the
cache goes away in the meantime, suddenly there's no way to do any
writes and the code gets confused, complains "R=%x: No submit" to dmesg
and leaves the dirty folio hanging.

Fix this by just cancelling the store of the folio if neither stream is
active. (If there's no cache at the time of dirtying, we should just
not mark the folio dirty.)

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-23-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  cachefiles, netfs: Fix write to partial block at EOF  (David Howells)

Because it uses DIO writes, cachefiles is unable to make a write to the
backing file if that write is not aligned to and sized according to the
backing file's DIO block alignment. This makes it tricky to handle a
write to the cache where the EOF on the network file is not correctly
aligned.

To get around this, netfslib attempts to tell the driver it is calling
how much more data there is available beyond the EOF that it can use to
pad the write (netfslib preclears the part of the folio above the EOF).
However, it tries to tell the cache what the maximum length is, but
doesn't calculate this correctly; and, in any case, cachefiles actually
ignores the value and just skips the block.

Fix this by:

 (1) Change the value passed to indicate the amount of extra data that
     can be added to the operation (now ->submit_extendable_to). This
     is much simpler to calculate as it's just the end of the folio
     minus the top of the data within the folio - rather than having to
     account for data spread over multiple folios.

 (2) Make cachefiles add some of this data if the subrequest it is
     given ends at the network file's i_size if the extra data is
     sufficient to pad out to a whole block.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-22-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
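A small standalone sketch of the arithmetic described in (1), with made-up
byte values; the variable names model ->submit_extendable_to rather than
quote the actual code:

    #include <stdio.h>

    int main(void)
    {
        /* Only the tail of the *current* folio past the end of valid
         * data can be used to pad the DIO write. All values in bytes. */
        long long folio_pos  = 8192;   /* hypothetical folio file offset */
        long long folio_size = 4096;
        long long data_end   = 11776;  /* i_size lands inside this folio */

        long long submit_extendable_to = folio_pos + folio_size - data_end;

        printf("can extend write by %lld bytes\n", submit_extendable_to); /* 512 */
        return 0;
    }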
2024-09-12  netfs: Remove fs/netfs/io.c  (David Howells)

Remove fs/netfs/io.c as it is no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-21-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Speed up buffered reading  (David Howells)

Improve the efficiency of buffered reads in a number of ways:

(1) Overhaul the algorithm in general so that it's a lot more compact
    and split the read submission code between buffered and unbuffered
    versions. The unbuffered version can be vastly simplified.

(2) Read-result collection is handed off to a work queue rather than
    being done in the I/O thread. Multiple subrequests can be processed
    simultaneously.

(3) When a subrequest is collected, any folios it fully spans are
    collected and "spare" data on either side is donated to either the
    previous or the next subrequest in the sequence.

Notes:

(*) Readahead expansion massively slows down fio, presumably because it
    causes a load of extra allocations, both folio and xarray, up front
    before RPC requests can be transmitted.

(*) RDMA with cifs does appear to work, both with SIW and RXE.

(*) PG_private_2-based reading and copy-to-cache is split out into its
    own file and altered to use folio_queue. Note that the copy to the
    cache now creates a new write transaction against the cache and
    adds the folios to be copied into it. This allows it to use part of
    the writeback I/O code.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  afs: Make read subreqs async  (David Howells)

Perform AFS read subrequests in a work item rather than in the calling
thread. For normal buffered reads, this will allow the calling thread
to copy data from the pagecache to the application at the same time as
the demarshalling thread is shovelling data from skbuffs into the
pagecache.

This will also allow the RA mark to trigger a new read before we've
finished shovelling the data from the current one.

Note: This would be a bit safer if the FS.FetchData RPC ops returned
the metadata (including the data version number) before returning the
data. This would allow me to flush the pagecache before installing the
new data. In future, it may be possible to asynchronously flush the
pagecache either side of the region being read.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-19-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Simplify the writeback code  (David Howells)

Use the new folio_queue structures to simplify the writeback code. The
problem with referring to the i_pages xarray directly is that we may
have gaps in the sequence of folios we're writing from that we need to
skip when we're removing the writeback mark from the folios we're
writing back from.

At the moment the code tries to deal with this by carefully tracking
the gaps in each writeback stream (eg. write to server and write to
cache) and divining when there's a gap that spans folios (something
that's not helped by folios not being a consistent size).

Instead, the folio_queue buffer contains pointers to only the folios
we're dealing with, has them in ascending order and indicates a gap by
placing non-consecutive folios next to each other. This makes it
possible to track where we need to clean up to by just keeping track of
where we've processed to on each stream and taking the minimum.

Note that the I/O iterator is always rounded up to the end of the
folio, even if that is beyond the EOF position, so that the cache can
do DIO from the page. The excess space is cleared, though mmapped
writes clobber it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-18-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Provide an iterator-reset function  (David Howells)

Provide a function to reset the iterator on a subrequest.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-17-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Use new folio_queue data type and iterator instead of xarray iter  (David Howells)

Make the netfs write-side routines use the new folio_queue struct to
hold a rolling buffer of folios, with the issuer adding folios at the
tail and the collector removing them from the head as they're processed
instead of using an xarray.

This will allow a subsequent patch to simplify the write collector.

The primary mark (as tested by folioq_is_marked()) is used to note if
the corresponding folio needs putting.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-16-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs  (David Howells)

Make smb_extract_iter_to_rdma() extract page fragments from an
ITER_FOLIOQ iterator into RDMA SGEs.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-15-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  iov_iter: Provide copy_folio_from_iter()  (David Howells)

Provide a copy_folio_from_iter() wrapper.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20240814203850.2240469-14-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
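A plausible shape for such a wrapper, shown as a hedged kernel-style sketch
(the actual upstream definition may differ in detail); it simply mirrors
the existing copy_folio_to_iter() by forwarding to the page-based helper:

    /* Sketch only - modelled on include/linux/uio.h conventions. */
    static inline size_t copy_folio_from_iter(struct folio *folio,
                                              size_t offset, size_t bytes,
                                              struct iov_iter *i)
    {
        return copy_page_from_iter(&folio->page, offset, bytes, i);
    }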
2024-09-12  mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios  (David Howells)

Define a data structure, struct folio_queue, to represent a sequence of
folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a
list of folio_queue structures to be used to provide a buffer to
iov_iter-taking functions, such as sendmsg and recvmsg.

The folio_queue structure looks like:

	struct folio_queue {
		struct folio_batch	vec;
		u8			orders[PAGEVEC_SIZE];
		struct folio_queue	*next;
		struct folio_queue	*prev;
		unsigned long		marks;
		unsigned long		marks2;
	};

It does not use a list_head so that next and/or prev can be set to NULL
at the ends of the list, allowing iov_iter-handling routines to
determine that they *are* the ends without needing to store a head
pointer in the iov_iter struct.

A folio_batch struct is used to hold the folio pointers which allows
the batch to be passed to batch handling functions. Two mark bits are
available per slot. The intention is to use at least one of them to
mark folios that need putting, but that might not be ultimately
necessary. Accessor functions are used to access the slots to do the
masking and an additional accessor function is used to indicate the
size of the array.

The order of each folio is also stored in the structure to avoid the
need for iov_iter_advance() and iov_iter_revert() to have to query each
folio to find its size.

With careful barriering, this can be used as an extending buffer with
new folios inserted and new folio_queue structs added without the need
for a lock. Further, provided we always keep at least one struct in the
buffer, we can also remove consumed folios and consumed structs from
the head end as well, without the need for locks.

[Questions/thoughts]

(1) To manage this, I need a head pointer, a tail pointer, a tail slot
    number (assuming insertion happens at the tail end and the next
    pointers point from head to tail). Should I put these into a struct
    of their own, say "folio_queue_head" or "rolling_buffer"? I will
    end up with two of these in netfs_io_request eventually, one
    keeping track of the pagecache I'm dealing with for buffered I/O
    and the other to hold a bounce buffer when we need one.

(2) Should I make the slots {folio,off,len} or bio_vec?

(3) This is intended to replace ITER_XARRAY eventually. Using an xarray
    in I/O iteration requires the taking of the RCU read lock, doing
    copying under the RCU read lock, walking the xarray (which may
    change under us), handling retries and dealing with special values.

    The advantage of ITER_XARRAY is that when we're dealing with the
    pagecache directly, we don't need any allocation - but if we're
    doing encrypted comms, there's a good chance we'd be using a bounce
    buffer anyway.

    This will require afs, erofs, cifs, orangefs and fscache to be
    converted to not use this. afs still uses it for dirs and symlinks;
    some of erofs usages should be easy to change, but there's one
    which won't be so easy; ceph's use via fscache can be fixed by
    porting ceph to netfslib; cifs is using xarray as a bounce buffer -
    that can be moved to use sheaves instead; and orangefs has a
    similar problem to erofs - maybe orangefs could use netfslib?

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Gao Xiang <xiang@kernel.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-erofs@lists.ozlabs.org
cc: devel@lists.orangefs.org
Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
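A standalone C toy of the per-slot mark bits described above, with
illustrative helper names (the real accessors live in the kernel's
folio_queue API):

    #include <stdio.h>

    #define SLOTS 31  /* hypothetical batch size */

    struct folioq_model {
        unsigned long marks;   /* e.g. "this folio needs putting" */
        unsigned long marks2;  /* second, spare mark per slot */
    };

    static void mark_set(unsigned long *m, unsigned int slot)
    {
        *m |= 1UL << slot;
    }

    static int mark_test(unsigned long m, unsigned int slot)
    {
        return !!(m & (1UL << slot));
    }

    int main(void)
    {
        struct folioq_model q = { 0, 0 };

        mark_set(&q.marks, 3);  /* slot 3's folio needs a put later */
        printf("slot 3 marked: %d, slot 4 marked: %d\n",
               mark_test(q.marks, 3), mark_test(q.marks, 4));
        return 0;
    }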
2024-09-12  uidgid: make sure we fit into one cacheline  (Christian Brauner)

When I expanded uidgid mappings I intended for a struct uid_gid_map to
fit into a single cacheline on x86 as they tend to be pretty
performance sensitive (idmapped mounts etc). But a 4 byte hole was
added that brought it over 64 bytes. Fix that by adding the static
extent array and the extent counter into a substruct.

C's type punning for unions guarantees that we can access ->nr_extents
even if the last written to member wasn't within the same object. This
is also what we rely on in struct_group() and friends. This of course
relies on non-strict aliasing which we don't do.

99) If the member used to read the contents of a union object is not
    the same as the member last used to store a value in the object,
    the appropriate part of the object representation of the value is
    reinterpreted as an object representation in the new type as
    described in 6.2.6 (a process sometimes called "type punning").

Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf
Link: https://lore.kernel.org/r/20240910-work-uid_gid_map-v1-1-e6bc761363ed@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
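A standalone toy of the layout trick, with illustrative field names and
sizes that are not the kernel's actual ones (here, five 12-byte extents
plus a 4-byte counter pack exactly into 64 bytes); the static assertion
captures the cacheline goal:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t u32;

    struct uid_gid_extent { u32 first, lower_first, count; };

    struct uid_gid_map {
        union {
            struct {
                struct uid_gid_extent extent[5];  /* inline fast path */
                u32 nr_extents;                   /* lives in the substruct */
            };
            struct uid_gid_extent *forward;       /* used for larger maps */
        };
    };

    _Static_assert(sizeof(struct uid_gid_map) <= 64,
                   "map must fit in one x86 cacheline");

    int main(void)
    {
        printf("sizeof(struct uid_gid_map) = %zu\n",
               sizeof(struct uid_gid_map));
        return 0;
    }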
2024-09-12Merge patch series "file: remove f_version"Christian Brauner
Christian Brauner <brauner@kernel.org> says: The f_version member in struct file isn't particularly well-defined. It is mainly used as a cookie to detect concurrent seeks when iterating directories. But it is also abused by some subsystems for completely unrelated things. It is mostly a directory specific thing that doesn't really need to live in struct file and with its wonky semantics it really lacks a specific function. For pipes, f_version is (ab)used to defer poll notifications until a write has happened. And struct pipe_inode_info is used by multiple struct files in their ->private_data so there's no chance of pushing that down into file->private_data without introducing another pointer indirection. But this should be a solvable problem. Only regular files with FMODE_ATOMIC_POS and directories require f_pos_lock. Pipes and other files don't. So this adds a union into struct file encompassing f_pos_lock and a pipe specific f_pipe member that pipes can use. This union of course can be extended to other file types and is similar to what we do in struct inode already. * patches from https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org: fs: remove f_version pipe: use f_pipe fs: add f_pipe ubifs: store cookie in private data ufs: store cookie in private data udf: store cookie in private data proc: store cookie in private data ocfs2: store cookie in private data input: remove f_version abuse ext4: store cookie in private data ext2: store cookie in private data affs: store cookie in private data fs: add generic_llseek_cookie() fs: use must_set_pos() fs: add must_set_pos() fs: add vfs_setpos_cookie() s390: remove unused f_version ceph: remove unused f_version adi: remove unused f_version file: remove pointless comment Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  fs: remove f_version  (Christian Brauner)

Now that detecting concurrent seeks is done by the filesystems that
require it we can remove f_version and free up 8 bytes for future
extensions.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-20-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  pipe: use f_pipe  (Christian Brauner)

Pipes use f_version to defer poll notifications until a write has been
observed. Since multiple files refer to the same struct pipe_inode_info
via their ->private_data, moving it in there isn't feasible, since we
would need to introduce an additional pointer indirection.

However, since pipes don't require f_pos_lock, we placed a new f_pipe
member into a union with f_pos_lock that pipes can use. This is similar
to what we already do for struct inode where we have additional fields
per file type. This will allow us to fully remove f_version in the next
step.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-19-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  fs: add f_pipe  (Christian Brauner)

Only regular files with FMODE_ATOMIC_POS and directories need
f_pos_lock. Place a new f_pipe member in a union with f_pos_lock that
they can use and make them stop abusing f_version in follow-up patches.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-18-6d3e4816aa7b@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
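A standalone toy of the union described above, with illustrative types
standing in for the kernel's (pthread_mutex_t for f_pos_lock, an integer
for f_pipe); the point is that the pipe state costs no extra space:

    #include <stdio.h>
    #include <pthread.h>

    struct file_model {
        union {
            pthread_mutex_t f_pos_lock;  /* regular files, directories */
            unsigned long   f_pipe;      /* pipes: poll-notification state */
        };
    };

    int main(void)
    {
        struct file_model pipe_file = { .f_pipe = 0 };

        pipe_file.f_pipe++;  /* a write happened; poll may now notify */
        printf("f_pipe = %lu (no space used beyond the union)\n",
               pipe_file.f_pipe);
        return 0;
    }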
2024-09-12  ubifs: store cookie in private data  (Christian Brauner)

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-17-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
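The conversions in this series share one pattern, modeled below in
standalone C with illustrative names: a seek bumps a cookie kept in the
file's private data, and the readdir path compares snapshots of it to
detect a concurrent seek (upstream, the series' generic_llseek_cookie()
does the bumping):

    #include <stdio.h>

    struct dir_file {
        long pos;
        unsigned long long cookie;  /* lives in file->private_data upstream */
    };

    static void dir_llseek(struct dir_file *f, long offset)
    {
        f->pos = offset;
        f->cookie++;                /* mark: position changed via seek */
    }

    int main(void)
    {
        struct dir_file f = { 0, 0 };
        unsigned long long seen = f.cookie;  /* snapshot taken by readdir */

        dir_llseek(&f, 42);                  /* a concurrent seek happens */

        if (f.cookie != seen)
            printf("seek detected, re-validate readdir state at pos %ld\n",
                   f.pos);
        return 0;
    }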
2024-09-12  ufs: store cookie in private data  (Christian Brauner)

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-16-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  udf: store cookie in private data  (Christian Brauner)

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-15-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  proc: store cookie in private data  (Christian Brauner)

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-14-6d3e4816aa7b@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  ocfs2: store cookie in private data  (Christian Brauner)

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-13-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  input: remove f_version abuse  (Christian Brauner)

f_version is removed from struct file. Make input stop abusing
f_version for stashing information for poll. Move the input state
counter into input_seq_state and allocate it via seq_private_open() and
free via seq_release_private().

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-12-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12Merge patch series "can: m_can: fix struct net_device_ops::{open,stop} ↵Marc Kleine-Budde
callbacks under high bus load" Marc Kleine-Budde <mkl@pengutronix.de> says: Under high CAN-bus load the struct net_device_ops::{open,stop} callbacks (m_can_open(), m_can_close()) don't properly start and shutdown the device. Fix the problems by re-arranging the order of functions in m_can_open() and m_can_close(). Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-0-6c1720ba45ce@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12  can: m_can: m_can_close(): stop clocks after device has been shut down  (Marc Kleine-Budde)

After calling m_can_stop() an interrupt may be pending or NAPI might
still be executed. This means the driver might still touch registers of
the IP core after the clocks have been disabled. This is not good
practice and might lead to aborts depending on the SoC integration.

To avoid these potential problems, make m_can_close() symmetric to
m_can_open(), i.e. stop the clocks at the end, right before shutting
down the transceiver.

Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support")
Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-2-6c1720ba45ce@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12  can: m_can: enable NAPI before enabling interrupts  (Jake Hamby)

If an interrupt (RX-complete or error flag) is set when bringing up the
CAN device, e.g. due to CAN bus traffic before initializing the device,
when m_can_start() is called and interrupts are enabled, m_can_isr() is
called immediately, which disables all CAN interrupts and calls
napi_schedule(). Because napi_enable() isn't called until later in
m_can_open(), the call to napi_schedule() never schedules the
m_can_poll() callback and the device is left with interrupts disabled
and can't receive any CAN packets until rebooted.

This can be verified by running "cansend" from another device before
setting the bitrate and calling "ip link set up can0" on the test
device. Adding debug lines to m_can_isr() shows it's called with flags
(IR_EP | IR_EW | IR_CRCE), which calls m_can_disable_all_interrupts()
and napi_schedule(), and then m_can_poll() is never called.

Move the call to napi_enable() above the call to m_can_start() to
enable any initial interrupt flags to be handled by m_can_poll() so
that interrupts are reenabled. Add a call to napi_disable() in the
error handling section of m_can_open(), to handle the case where later
functions return errors.

Also, in m_can_close(), move the call to napi_disable() below the call
to m_can_stop() to ensure all interrupts are handled when bringing down
the device. This race condition is much less likely to occur.

Tested on a Microchip SAMA7G54 MPU. The fix should be applicable to any
SoC with a Bosch M_CAN controller.

Signed-off-by: Jake Hamby <Jake.Hamby@Teledyne.com>
Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support")
Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-1-6c1720ba45ce@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
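A standalone toy of the ordering invariant both m_can patches enforce,
with illustrative flags in place of real NAPI/IRQ state: the interrupt
handler can only hand work to NAPI if NAPI was enabled before interrupts
were unmasked:

    #include <stdio.h>
    #include <stdbool.h>

    static bool napi_enabled, irqs_enabled, work_scheduled;

    /* Fires as soon as IRQs are unmasked if a flag is already pending. */
    static void isr(void)
    {
        irqs_enabled = false;          /* device masks its interrupts ... */
        work_scheduled = napi_enabled; /* ... and only NAPI can re-enable them */
    }

    int main(void)
    {
        /* Fixed open() order: */
        napi_enabled = true;           /* 1. napi_enable() */
        irqs_enabled = true;           /* 2. m_can_start() unmasks interrupts */
        isr();                         /* pending RX/error flag fires at once */

        printf("poll scheduled: %d (with the old order this would be 0)\n",
               work_scheduled);
        return 0;
    }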