Age | Commit message (Collapse) | Author |
|
Store xe_device ahead of processing message as message can be free'd in
some cases.
v2:
- Including missing local changes
v3:
- Resend for CI
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202407231445.rpisd1vA-lkp@intel.com/
Fixes: 55ea73aacfb9 ("drm/xe: Build PM into GuC CT layer")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240724164341.1848954-1-matthew.brost@intel.com
(cherry picked from commit 1a394b4f504f33eac8c38b6f42ba025105c7e869)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
'fence' argument in send_tlb_invalidation cannot be NULL, remove
non-NULL check from send_tlb_invalidation.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202407231049.esig0Fkb-lkp@intel.com/
Fixes: 58bfe6674467 ("drm/xe: Drop xe_gt_tlb_invalidation_wait")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240723190714.1744653-1-matthew.brost@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit 6482253e6e1ad1c3a76645a3899d3cfdb5b918cb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The header generated/xe_wa_oob.h is included twice. Remove one.
Fixes: 27cb2b7fec2a ("drm/xe/bmg: implement Wa_16023588340")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202407052122.AzuWSPuo-lkp@intel.com/
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708173301.1543871-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 3d122660dc70029d9cccb4e8670125f0affa959e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
nf_flow_table_module_init()
Move nf flowtable bpf initialization in nf_flow_table module load
routine since nf_flow_table_bpf is part of nf_flow_table module and not
nf_flow_table_inet one. This patch allows to avoid the following kernel
warning running the reproducer below:
$modprobe nf_flow_table_inet
$rmmod nf_flow_table_inet
$modprobe nf_flow_table_inet
modprobe: ERROR: could not insert 'nf_flow_table_inet': Invalid argument
[ 184.081501] ------------[ cut here ]------------
[ 184.081527] WARNING: CPU: 0 PID: 1362 at kernel/bpf/btf.c:8206 btf_populate_kfunc_set+0x23c/0x330
[ 184.081550] CPU: 0 UID: 0 PID: 1362 Comm: modprobe Kdump: loaded Not tainted 6.11.0-0.rc5.22.el10.x86_64 #1
[ 184.081553] Hardware name: Red Hat OpenStack Compute, BIOS 1.14.0-1.module+el8.4.0+8855+a9e237a9 04/01/2014
[ 184.081554] RIP: 0010:btf_populate_kfunc_set+0x23c/0x330
[ 184.081558] RSP: 0018:ff22cfb38071fc90 EFLAGS: 00010202
[ 184.081559] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[ 184.081560] RDX: 000000000000006e RSI: ffffffff95c00000 RDI: ff13805543436350
[ 184.081561] RBP: ffffffffc0e22180 R08: ff13805543410808 R09: 000000000001ec00
[ 184.081562] R10: ff13805541c8113c R11: 0000000000000010 R12: ff13805541b83c00
[ 184.081563] R13: ff13805543410800 R14: 0000000000000001 R15: ffffffffc0e2259a
[ 184.081564] FS: 00007fa436c46740(0000) GS:ff1380557ba00000(0000) knlGS:0000000000000000
[ 184.081569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 184.081570] CR2: 000055e7b3187000 CR3: 0000000100c48003 CR4: 0000000000771ef0
[ 184.081571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 184.081572] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 184.081572] PKRU: 55555554
[ 184.081574] Call Trace:
[ 184.081575] <TASK>
[ 184.081578] ? show_trace_log_lvl+0x1b0/0x2f0
[ 184.081580] ? show_trace_log_lvl+0x1b0/0x2f0
[ 184.081582] ? __register_btf_kfunc_id_set+0x199/0x200
[ 184.081585] ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081586] ? __warn.cold+0x93/0xed
[ 184.081590] ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081592] ? report_bug+0xff/0x140
[ 184.081594] ? handle_bug+0x3a/0x70
[ 184.081596] ? exc_invalid_op+0x17/0x70
[ 184.081597] ? asm_exc_invalid_op+0x1a/0x20
[ 184.081601] ? btf_populate_kfunc_set+0x23c/0x330
[ 184.081602] __register_btf_kfunc_id_set+0x199/0x200
[ 184.081605] ? __pfx_nf_flow_inet_module_init+0x10/0x10 [nf_flow_table_inet]
[ 184.081607] do_one_initcall+0x58/0x300
[ 184.081611] do_init_module+0x60/0x230
[ 184.081614] __do_sys_init_module+0x17a/0x1b0
[ 184.081617] do_syscall_64+0x7d/0x160
[ 184.081620] ? __count_memcg_events+0x58/0xf0
[ 184.081623] ? handle_mm_fault+0x234/0x350
[ 184.081626] ? do_user_addr_fault+0x347/0x640
[ 184.081630] ? clear_bhb_loop+0x25/0x80
[ 184.081633] ? clear_bhb_loop+0x25/0x80
[ 184.081634] ? clear_bhb_loop+0x25/0x80
[ 184.081637] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 184.081639] RIP: 0033:0x7fa43652e4ce
[ 184.081647] RSP: 002b:00007ffe8213be18 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 184.081649] RAX: ffffffffffffffda RBX: 000055e7b3176c20 RCX: 00007fa43652e4ce
[ 184.081650] RDX: 000055e7737fde79 RSI: 0000000000003990 RDI: 000055e7b3185380
[ 184.081651] RBP: 000055e7737fde79 R08: 0000000000000007 R09: 000055e7b3179bd0
[ 184.081651] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000040000
[ 184.081652] R13: 000055e7b3176fa0 R14: 0000000000000000 R15: 000055e7b3179b80
Fixes: 391bb6594fd3 ("netfilter: Add bpf_xdp_flow_lookup kfunc")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20240911-nf-flowtable-bpf-modprob-fix-v1-1-f9fc075aafc3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains two fixes from Florian Westphal:
Patch #1 fixes a sk refcount leak in nft_socket on mismatch.
Patch #2 fixes cgroupsv2 matching from containers due to incorrect
level in subtree.
netfilter pull request 24-09-12
* tag 'nf-24-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_socket: make cgroupsv2 matching work with namespaces
netfilter: nft_socket: fix sk refcount leaks
====================
Link: https://patch.msgid.link/20240911222520.3606-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Export kernel paravirt features to user space, so that VMM can control
each single paravirt feature. By default paravirt features will be the
same with kvm supported features if VMM does not set it.
Also a new feature KVM_FEATURE_VIRT_EXTIOI is added which can be set
from user space. This feature indicates that the virt EIOINTC can route
interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
On LoongArch, the host and guest have their own PMU CSRs registers and
they share PMU hardware resources. A set of PMU CSRs consists of a CTRL
register and a CNTR register. We can set which PMU CSRs are used by the
guest by writing to the GCFG register [24:26] bits.
On KVM side:
- Save the host PMU CSRs into structure kvm_context.
- If the host supports the PMU feature.
- When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs.
- When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
25216afc9db5 ("PCI: Add managed pcim_intx()") moved the allocation step for
pci_intx()'s device resource from pcim_enable_device() to pcim_intx(). As
before, pcim_enable_device() sets pci_dev.is_managed to true; and it is
never set to false again.
Due to the lifecycle of a struct pci_dev, it can happen that a second
driver obtains the same pci_dev after a first driver ran. If one driver
uses pcim_enable_device() and the other doesn't, this causes the other
driver to run into managed pcim_intx(), which will try to allocate when
called for the first time.
Allocations might sleep, so calling pci_intx() while holding spinlocks
becomes then invalid, which causes lockdep warnings and could cause
deadlocks:
========================================================
WARNING: possible irq lock inversion dependency detected
6.11.0-rc6+ #59 Tainted: G W
--------------------------------------------------------
CPU 0/KVM/1537 just changed the state of lock:
ffffa0f0cff965f0 (&vdev->irqlock){-...}-{2:2}, at:
vfio_intx_handler+0x21/0xd0 [vfio_pci_core] but this lock took another,
HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.}-{0:0}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
local_irq_disable();
lock(&vdev->irqlock);
lock(fs_reclaim);
<Interrupt>
lock(&vdev->irqlock);
*** DEADLOCK ***
Have pcim_enable_device()'s release function, pcim_disable_device(), set
pci_dev.is_managed to false so that subsequent drivers using the same
struct pci_dev do not implicitly run into managed code.
Link: https://lore.kernel.org/r/20240905072556.11375-2-pstanner@redhat.com
Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/all/20240903094431.63551744.alex.williamson@redhat.com/
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
|
|
* for-next/timers:
arm64: Implement prctl(PR_{G,S}ET_TSC)
|
|
* for-next/selftests:
kselftest/arm64: Fix build warnings for ptrace
kselftest/arm64: Actually test SME vector length changes via sigreturn
kselftest/arm64: signal: fix/refactor SVE vector length enumeration
|
|
* for-next/poe: (31 commits)
arm64: pkeys: remove redundant WARN
kselftest/arm64: Add test case for POR_EL0 signal frame records
kselftest/arm64: parse POE_MAGIC in a signal frame
kselftest/arm64: add HWCAP test for FEAT_S1POE
selftests: mm: make protection_keys test work on arm64
selftests: mm: move fpregs printing
kselftest/arm64: move get_header()
arm64: add Permission Overlay Extension Kconfig
arm64: enable PKEY support for CPUs with S1POE
arm64: enable POE and PIE to coexist
arm64/ptrace: add support for FEAT_POE
arm64: add POE signal support
arm64: implement PKEYS support
arm64: add pte_access_permitted_no_overlay()
arm64: handle PKEY/POE faults
arm64: mask out POIndex when modifying a PTE
arm64: convert protection key into vm_flags and pgprot values
arm64: add POIndex defines
arm64: re-order MTE VM_ flags
arm64: enable the Permission Overlay Extension for EL0
...
|
|
* for-next/pkvm-guest:
arm64: smccc: Reserve block of KVM "vendor" services for pKVM hypercalls
drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercall
arm64: mm: Add confidential computing hook to ioremap_prot()
drivers/virt: pkvm: Hook up mem_encrypt API using pKVM hypercalls
arm64: mm: Add top-level dispatcher for internal mem_encrypt API
drivers/virt: pkvm: Add initial support for running as a protected guest
firmware/smccc: Call arch-specific hook on discovering KVM services
|
|
* for-next/perf: (33 commits)
perf: arm-ni: Fix an NULL vs IS_ERR() bug
perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
MAINTAINERS: List Arm interconnect PMUs as supported
perf: Add driver for Arm NI-700 interconnect PMU
dt-bindings/perf: Add Arm NI-700 PMU
perf/arm-cmn: Improve format attr printing
perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
perf/arm-cmn: Support CMN S3
dt-bindings: perf: arm-cmn: Add CMN S3
perf/arm-cmn: Refactor DTC PMU register access
perf/arm-cmn: Make cycle counts less surprising
perf/arm-cmn: Improve build-time assertion
perf/arm-cmn: Ensure dtm_idx is big enough
perf/arm-cmn: Fix CCLA register offset
perf/arm-cmn: Refactor node ID handling. Again.
drivers/perf: hisi_pcie: Export supported Root Ports [bdf_min, bdf_max]
drivers/perf: hisi_pcie: Fix TLP headers bandwidth counting
drivers/perf: hisi_pcie: Record hardware counts correctly
drivers/perf: arm_spe: Use perf_allow_kernel() for permissions
perf/dwc_pcie: Add support for QCOM vendor devices
...
|
|
* for-next/mm:
arm64/mm: use lm_alias() with addresses passed to memblock_free()
mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
arm64: Expose the end of the linear map in PHYSMEM_END
arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
arm64/mm: Delete __init region from memblock.reserved
|
|
* for-next/misc:
arm64: hibernate: Fix warning for cast from restricted gfp_t
arm64: esr: Define ESR_ELx_EC_* constants as UL
arm64: Constify struct kobj_type
arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
arm64/sve: Remove unused declaration read_smcr_features()
arm64: mm: Remove unused declaration early_io_map()
arm64: el2_setup.h: Rename some labels to be more diff-friendly
arm64: signal: Fix some under-bracketed UAPI macros
arm64/mm: Drop TCR_SMP_FLAGS
arm64/mm: Drop PMD_SECT_VALID
|
|
* for-next/errata:
arm64: errata: Enable the AC03_CPU_38 workaround for ampere1a
|
|
* for-next/acpi:
ACPI/IORT: Add PMCG platform information for HiSilicon HIP10/11
ACPI: ARM64: add acpi_iort.h to MAINTAINERS
ACPI/IORT: Switch to use kmemdup_array()
|
|
With iterative development, our codebase can now deal with compressed
buffer misses properly if both in-place I/O and compressed buffer
allocation fail.
Note that if readahead fails (with non-uptodate folios), the original
request will then fall back to synchronous read, and `.read_folio()`
should return appropriate errnos; otherwise -EIO will be passed to
user space, which is unexpected.
To simplify rarely encountered failure paths, a mimic decompression
will be just used. Before that, failure reasons are recorded in
compressed_bvecs[] and they also act as placeholders to avoid in-place
pages. They will be parsed just before decompression and then pass
back to `.read_folio()`.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240905084732.2684515-1-hsiangkao@linux.alibaba.com
|
|
The devm_ioremap() function never returns error pointers, it returns a
NULL pointer if there is an error.
Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/04d6ccc3-6d31-4f0f-ab0f-7a88342cc09a@stanley.mountain
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch fixes the following warning by adding __force
to the cast:
arch/arm64/kernel/hibernate.c:410:44: sparse: warning: cast from restricted gfp_t
No functional change intended.
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Link: https://lore.kernel.org/r/20240910232507.313555-1-minhuadotchen@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fixed the word "aren't" to "isn't" based on singular word "bufferage".
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Link: https://lore.kernel.org/r/20240912012550.13748-2-dennis.lamerice@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into vfs.netfs
Merge patch series "netfs: Read/write improvements" from David Howells
<dhowells@redhat.com>.
* 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (25 commits)
cifs: Don't support ITER_XARRAY
cifs: Switch crypto buffer to use a folio_queue rather than an xarray
cifs: Use iterate_and_advance*() routines directly for hashing
netfs: Cancel dirty folios that have no storage destination
cachefiles, netfs: Fix write to partial block at EOF
netfs: Remove fs/netfs/io.c
netfs: Speed up buffered reading
afs: Make read subreqs async
netfs: Simplify the writeback code
netfs: Provide an iterator-reset function
netfs: Use new folio_queue data type and iterator instead of xarray iter
cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
iov_iter: Provide copy_folio_from_iter()
mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
netfs: Use bh-disabling spinlocks for rreq->lock
netfs: Set the request work function upon allocation
netfs: Remove NETFS_COPY_TO_CACHE
netfs: Reserve netfs_sreq_source 0 as unset/unknown
netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode
...
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There's now no need to support ITER_XARRAY in cifs as netfslib hands down
ITER_FOLIOQ instead - and that's simpler to use with iterate_and_advance()
as it doesn't hold the RCU read lock over the step function.
This is part of the process of phasing out ITER_XARRAY.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-26-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Switch cifs from using an xarray to hold the transport crypto buffer to
using a folio_queue and use ITER_FOLIOQ rather than ITER_XARRAY.
This is part of the process of phasing out ITER_XARRAY.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-25-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Replace the bespoke cifs iterators of ITER_BVEC and ITER_KVEC to do hashing
with iterate_and_advance_kernel() - a variant on iterate_and_advance() that
only supports kernel-internal ITER_* types and not UBUF/IOVEC types.
The bespoke ITER_XARRAY is left because we don't really want to be calling
crypto_shash_update() under the RCU read lock for large amounts of data;
besides, ITER_XARRAY is going to be phased out.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-24-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Kafs wants to be able to cache the contents of directories (and symlinks),
but whilst these are downloaded from the server with the FS.FetchData RPC
op and similar, the same as for regular files, they can't be updated by
FS.StoreData, but rather have special operations (FS.MakeDir, etc.).
Now, rather than redownloading a directory's content after each change made
to that directory, kafs modifies the local blob. This blob can be saved
out to the cache, and since it's using netfslib, kafs just marks the folios
dirty and lets ->writepages() on the directory take care of it, as for an
regular file.
This is fine as long as there's a cache as although the upload stream is
disabled, there's a cache stream to drive the procedure. But if the cache
goes away in the meantime, suddenly there's no way do any writes and the
code gets confused, complains "R=%x: No submit" to dmesg and leaves the
dirty folio hanging.
Fix this by just cancelling the store of the folio if neither stream is
active. (If there's no cache at the time of dirtying, we should just not
mark the folio dirty).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-23-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Because it uses DIO writes, cachefiles is unable to make a write to the
backing file if that write is not aligned to and sized according to the
backing file's DIO block alignment. This makes it tricky to handle a write
to the cache where the EOF on the network file is not correctly aligned.
To get around this, netfslib attempts to tell the driver it is calling how
much more data there is available beyond the EOF that it can use to pad the
write (netfslib preclears the part of the folio above the EOF). However,
it tries to tell the cache what the maximum length is, but doesn't
calculate this correctly; and, in any case, cachefiles actually ignores the
value and just skips the block.
Fix this by:
(1) Change the value passed to indicate the amount of extra data that can
be added to the operation (now ->submit_extendable_to). This is much
simpler to calculate as it's just the end of the folio minus the top
of the data within the folio - rather than having to account for data
spread over multiple folios.
(2) Make cachefiles add some of this data if the subrequest it is given
ends at the network file's i_size if the extra data is sufficient to
pad out to a whole block.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-22-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Remove fs/netfs/io.c as it is no longer used.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-21-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Improve the efficiency of buffered reads in a number of ways:
(1) Overhaul the algorithm in general so that it's a lot more compact and
split the read submission code between buffered and unbuffered
versions. The unbuffered version can be vastly simplified.
(2) Read-result collection is handed off to a work queue rather than being
done in the I/O thread. Multiple subrequests can be processes
simultaneously.
(3) When a subrequest is collected, any folios it fully spans are
collected and "spare" data on either side is donated to either the
previous or the next subrequest in the sequence.
Notes:
(*) Readahead expansion is massively slows down fio, presumably because it
causes a load of extra allocations, both folio and xarray, up front
before RPC requests can be transmitted.
(*) RDMA with cifs does appear to work, both with SIW and RXE.
(*) PG_private_2-based reading and copy-to-cache is split out into its own
file and altered to use folio_queue. Note that the copy to the cache
now creates a new write transaction against the cache and adds the
folios to be copied into it. This allows it to use part of the
writeback I/O code.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Perform AFS read subrequests in a work item rather than in the calling
thread. For normal buffered reads, this will allow the calling thread to
copy data from the pagecache to the application at the same time as the
demarshalling thread is shovelling data from skbuffs into the pagecache.
This will also allow the RA mark to trigger a new read before we've
finished shovelling the data from the current one.
Note: This would be a bit safer if the FS.FetchData RPC ops returned the
metadata (including the data version number) before returning the data.
This would allow me to flush the pagecache before installing the new data.
In future, it may be possible to asynchronously flush the pagecache either
side of the region being read.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-19-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the new folio_queue structures to simplify the writeback code. The
problem with referring to the i_pages xarray directly is that we may have
gaps in the sequence of folios we're writing from that we need to skip when
we're removing the writeback mark from the folios we're writing back from.
At the moment the code tries to deal with this by carefully tracking the
gaps in each writeback stream (eg. write to server and write to cache) and
divining when there's a gap that spans folios (something that's not helped
by folios not being a consistent size).
Instead, the folio_queue buffer contains pointers only the folios we're
dealing with, has them in ascending order and indicates a gap by placing
non-consequitive folios next to each other. This makes it possible to
track where we need to clean up to by just keeping track of where we've
processed to on each stream and taking the minimum.
Note that the I/O iterator is always rounded up to the end of the folio,
even if that is beyond the EOF position, so that the cache can do DIO from
the page. The excess space is cleared, though mmapped writes clobber it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-18-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Provide a function to reset the iterator on a subrequest.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-17-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make the netfs write-side routines use the new folio_queue struct to hold a
rolling buffer of folios, with the issuer adding folios at the tail and the
collector removing them from the head as they're processed instead of using
an xarray.
This will allow a subsequent patch to simplify the write collector.
The primary mark (as tested by folioq_is_marked()) is used to note if the
corresponding folio needs putting.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-16-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make smb_extract_iter_to_rdma() extract page fragments from an ITER_FOLIOQ
iterator into RDMA SGEs.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-15-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Provide a copy_folio_from_iter() wrapper.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20240814203850.2240469-14-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Define a data structure, struct folio_queue, to represent a sequence of
folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a
list of folio_queue structures to be used to provide a buffer to
iov_iter-taking functions, such as sendmsg and recvmsg.
The folio_queue structure looks like:
struct folio_queue {
struct folio_batch vec;
u8 orders[PAGEVEC_SIZE];
struct folio_queue *next;
struct folio_queue *prev;
unsigned long marks;
unsigned long marks2;
};
It does not use a list_head so that next and/or prev can be set to NULL at
the ends of the list, allowing iov_iter-handling routines to determine that
they *are* the ends without needing to store a head pointer in the iov_iter
struct.
A folio_batch struct is used to hold the folio pointers which allows the
batch to be passed to batch handling functions. Two mark bits are
available per slot. The intention is to use at least one of them to mark
folios that need putting, but that might not be ultimately necessary.
Accessor functions are used to access the slots to do the masking and an
additional accessor function is used to indicate the size of the array.
The order of each folio is also stored in the structure to avoid the need
for iov_iter_advance() and iov_iter_revert() to have to query each folio to
find its size.
With careful barriering, this can be used as an extending buffer with new
folios inserted and new folio_queue structs added without the need for a
lock. Further, provided we always keep at least one struct in the buffer,
we can also remove consumed folios and consumed structs from the head end
as we without the need for locks.
[Questions/thoughts]
(1) To manage this, I need a head pointer, a tail pointer, a tail slot
number (assuming insertion happens at the tail end and the next
pointers point from head to tail). Should I put these into a struct
of their own, say "folio_queue_head" or "rolling_buffer"?
I will end up with two of these in netfs_io_request eventually, one
keeping track of the pagecache I'm dealing with for buffered I/O and
the other to hold a bounce buffer when we need one.
(2) Should I make the slots {folio,off,len} or bio_vec?
(3) This is intended to replace ITER_XARRAY eventually. Using an xarray
in I/O iteration requires the taking of the RCU read lock, doing
copying under the RCU read lock, walking the xarray (which may change
under us), handling retries and dealing with special values.
The advantage of ITER_XARRAY is that when we're dealing with the
pagecache directly, we don't need any allocation - but if we're doing
encrypted comms, there's a good chance we'd be using a bounce buffer
anyway.
This will require afs, erofs, cifs, orangefs and fscache to be
converted to not use this. afs still uses it for dirs and symlinks;
some of erofs usages should be easy to change, but there's one which
won't be so easy; ceph's use via fscache can be fixed by porting ceph
to netfslib; cifs is using xarray as a bounce buffer - that can be
moved to use sheaves instead; and orangefs has a similar problem to
erofs - maybe orangefs could use netfslib?
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Gao Xiang <xiang@kernel.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-erofs@lists.ozlabs.org
cc: devel@lists.orangefs.org
Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When I expanded uidgid mappings I intended for a struct uid_gid_map to
fit into a single cacheline on x86 as they tend to be pretty
performance sensitive (idmapped mounts etc). But a 4 byte hole was added
that brought it over 64 bytes. Fix that by adding the static extent
array and the extent counter into a substruct. C's type punning for
unions guarantees that we can access ->nr_extents even if the last
written to member wasn't within the same object. This is also what we
rely on in struct_group() and friends. This of course relies on
non-strict aliasing which we don't do.
99) If the member used to read the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as
described in 6.2.6 (a process sometimes called "type punning").
Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf
Link: https://lore.kernel.org/r/20240910-work-uid_gid_map-v1-1-e6bc761363ed@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
The f_version member in struct file isn't particularly well-defined. It
is mainly used as a cookie to detect concurrent seeks when iterating
directories. But it is also abused by some subsystems for completely
unrelated things.
It is mostly a directory specific thing that doesn't really need to live
in struct file and with its wonky semantics it really lacks a specific
function.
For pipes, f_version is (ab)used to defer poll notifications until a
write has happened. And struct pipe_inode_info is used by multiple
struct files in their ->private_data so there's no chance of pushing
that down into file->private_data without introducing another pointer
indirection.
But this should be a solvable problem. Only regular files with
FMODE_ATOMIC_POS and directories require f_pos_lock. Pipes and other
files don't. So this adds a union into struct file encompassing
f_pos_lock and a pipe specific f_pipe member that pipes can use. This
union of course can be extended to other file types and is similar to
what we do in struct inode already.
* patches from https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org:
fs: remove f_version
pipe: use f_pipe
fs: add f_pipe
ubifs: store cookie in private data
ufs: store cookie in private data
udf: store cookie in private data
proc: store cookie in private data
ocfs2: store cookie in private data
input: remove f_version abuse
ext4: store cookie in private data
ext2: store cookie in private data
affs: store cookie in private data
fs: add generic_llseek_cookie()
fs: use must_set_pos()
fs: add must_set_pos()
fs: add vfs_setpos_cookie()
s390: remove unused f_version
ceph: remove unused f_version
adi: remove unused f_version
file: remove pointless comment
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that detecting concurrent seeks is done by the filesystems that
require it we can remove f_version and free up 8 bytes for future
extensions.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-20-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pipes use f_version to defer poll notifications until a write has been
observed. Since multiple file's refer to the same struct pipe_inode_info
in their ->private_data moving it into their isn't feasible since we
would need to introduce an additional pointer indirection.
However, since pipes don't require f_pos_lock we placed a new f_pipe
member into a union with f_pos_lock that pipes can use. This is similar
to what we already do for struct inode where we have additional fields
per file type. This will allow us to fully remove f_version in the next
step.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-19-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Only regular files with FMODE_ATOMIC_POS and directories need
f_pos_lock. Place a new f_pipe member in a union with f_pos_lock
that they can use and make them stop abusing f_version in follow-up
patches.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-18-6d3e4816aa7b@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Store the cookie to detect concurrent seeks on directories in
file->private_data.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-17-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Store the cookie to detect concurrent seeks on directories in
file->private_data.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-16-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Store the cookie to detect concurrent seeks on directories in
file->private_data.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-15-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Store the cookie to detect concurrent seeks on directories in
file->private_data.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-14-6d3e4816aa7b@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Store the cookie to detect concurrent seeks on directories in
file->private_data.
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-13-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
f_version is removed from struct file. Make input stop abusing f_version
for stashing information for poll. Move the input state counter into
input_seq_state and allocate it via seq_private_open() and free via
seq_release_private().
Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-12-6d3e4816aa7b@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
callbacks under high bus load"
Marc Kleine-Budde <mkl@pengutronix.de> says:
Under high CAN-bus load the struct net_device_ops::{open,stop}
callbacks (m_can_open(), m_can_close()) don't properly start and
shutdown the device.
Fix the problems by re-arranging the order of functions in
m_can_open() and m_can_close().
Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-0-6c1720ba45ce@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
After calling m_can_stop() an interrupt may be pending or NAPI might
still be executed. This means the driver might still touch registers
of the IP core after the clocks have been disabled. This is not good
practice and might lead to aborts depending on the SoC integration.
To avoid these potential problems, make m_can_close() symmetric to
m_can_open(), i.e. stop the clocks at the end, right before shutting
down the transceiver.
Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support")
Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-2-6c1720ba45ce@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
If an interrupt (RX-complete or error flag) is set when bringing up
the CAN device, e.g. due to CAN bus traffic before initializing the
device, when m_can_start() is called and interrupts are enabled,
m_can_isr() is called immediately, which disables all CAN interrupts
and calls napi_schedule().
Because napi_enable() isn't called until later in m_can_open(), the
call to napi_schedule() never schedules the m_can_poll() callback and
the device is left with interrupts disabled and can't receive any CAN
packets until rebooted.
This can be verified by running "cansend" from another device before
setting the bitrate and calling "ip link set up can0" on the test
device. Adding debug lines to m_can_isr() shows it's called with flags
(IR_EP | IR_EW | IR_CRCE), which calls m_can_disable_all_interrupts()
and napi_schedule(), and then m_can_poll() is never called.
Move the call to napi_enable() above the call to m_can_start() to
enable any initial interrupt flags to be handled by m_can_poll() so
that interrupts are reenabled. Add a call to napi_disable() in the
error handling section of m_can_open(), to handle the case where later
functions return errors.
Also, in m_can_close(), move the call to napi_disable() below the call
to m_can_stop() to ensure all interrupts are handled when bringing
down the device. This race condition is much less likely to occur.
Tested on a Microchip SAMA7G54 MPU. The fix should be applicable to
any SoC with a Bosch M_CAN controller.
Signed-off-by: Jake Hamby <Jake.Hamby@Teledyne.com>
Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support")
Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-1-6c1720ba45ce@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|