path: root/include/linux
Age  Commit message  Author
2024-09-13  blk-mq: unconditional nr_integrity_segments  (Keith Busch)
Always defining the field will make using it easier and less error prone in future patches. There shouldn't be any downside to this: the field fits in what would otherwise be a 2-byte hole, so we're not saving space by conditionally leaving it out. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240913182854.2445457-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-13  mm: Define VM_DROPPABLE for powerpc/32  (Christophe Leroy)
Commit 9651fcedf7b9 ("mm: add MAP_DROPPABLE for designating always lazily freeable mappings") only adds VM_DROPPABLE for 64 bits architectures. In order to also use the getrandom vDSO implementation on powerpc/32, use VM_ARCH_1 for VM_DROPPABLE on powerpc/32. This is possible because VM_ARCH_1 is used for VM_SAO on powerpc and VM_SAO is only for powerpc/64. It is used in combination with PROT_SAO in some parts of code that are restricted to CONFIG_PPC64 through #ifdefs, it is therefore possible to define VM_SAO for CONFIG_PPC64 only. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
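As a rough illustration of the approach described above, the flag definition could take a shape like the following sketch; the 64-bit bit name and the exact #ifdef structure are assumptions, not a quote of the patch:

    #ifdef CONFIG_64BIT
    # define VM_DROPPABLE  VM_HIGH_ARCH_5  /* assumption: dedicated high arch bit on 64-bit */
    #elif defined(CONFIG_PPC32)
    # define VM_DROPPABLE  VM_ARCH_1       /* reuse VM_ARCH_1: VM_SAO only exists on powerpc/64 */
    #else
    # define VM_DROPPABLE  VM_NONE
    #endif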
2024-09-13  hwmon: Remove devm_hwmon_device_unregister() API function  (Guenter Roeck)
devm_hwmon_device_unregister() has no in-tree user, and its implementation is wrong since it does not pass the to-be-removed hardware monitoring device as parameter. I do not envision a valid use for it; drivers needing it should not have called devm_hwmon_device_register_with_info() in the first place. Remove it. Reported-by: Matthew Sanders <m@ttsande.rs> Closes: https://lore.kernel.org/linux-hwmon/488b3bdf870ea76c4b943dbe5fd15ac8113019dc.camel@kernel.org/ Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-09-13  driver core: attribute_container: Remove unused functions  (Dr. David Alan Gilbert)
I can't find any use of 'attribute_container_add_class_device_adapter' or 'attribute_container_trigger' in git history. Their export decls went in 2006: commit 1740757e8f94 ("[PATCH] Driver Core: remove unused exports") and their docs disappeared in 2016: commit 47cb398dd75a ("Docs: sphinxify device-drivers.tmpl") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20240913010955.1393995-1-linux@treblig.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-13  Merge branch 'for-6.12/hidraw' into for-linus  (Benjamin Tissoires)
- introduction of HIDIOCREVOKE ioctl to revoke a hidraw fd opened by a regular (non-root) application (Peter Hutterer)
2024-09-13  Merge branch 'for-6.12/constify-rdesc' into for-linus  (Benjamin Tissoires)
- Constification of report descriptors so drivers can use read-only memory when declaring report descriptors fixups (Thomas Weißschuh)
2024-09-13  Merge branch 'for-6.12/core' into for-linus  (Benjamin Tissoires)
- add helper for finding a field with a certain usage (Kerem Karabay)
2024-09-13  Merge branches 'fixes', 'arm/smmu', 'intel/vt-d', 'amd/amd-vi' and 'core' into next  (Joerg Roedel)
2024-09-13  Merge branch 'slab/for-6.12/kmem_cache_args' into slab/for-next  (Vlastimil Babka)
Merge kmem_cache_create() refactoring by Christian Brauner. Note this includes a merge of the vfs.file tree that contains the prerequisite kmem_cache_create_rcu() work.
2024-09-13  mm, slab: restore kerneldoc for kmem_cache_create()  (Vlastimil Babka)
As kmem_cache_create() became a _Generic() wrapper macro, it currently has no kerneldoc despite being the main API to use. Add it. Also adjust kmem_cache_create_usercopy() kerneldoc to indicate it is now a legacy wrapper. Also expand the kerneldoc for struct kmem_cache_args, especially for the freeptr_offset field, where important details were removed with the removal of kmem_cache_create_rcu(). Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org>
2024-09-13  locking/mutex: Introduce mutex_init_with_key()  (Bart Van Assche)
The following pattern occurs 5 times in kernel drivers: lockdep_register_key(key); __mutex_init(mutex, name, key); In several cases the 'name' argument matches #mutex. Hence, introduce the mutex_init_with_key() macro. This macro derives the 'name' argument from the 'mutex' argument. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240912223956.3554086-3-bvanassche@acm.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
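A minimal sketch of what such a macro and its use could look like, assuming it simply stringifies the mutex argument for the lockdep name; the dev->lock and dev->lock_key fields are hypothetical:

    #define mutex_init_with_key(mutex, key)  __mutex_init((mutex), #mutex, (key))

    /* Before: the open-coded pattern repeated in drivers. */
    lockdep_register_key(&dev->lock_key);
    __mutex_init(&dev->lock, "dev->lock", &dev->lock_key);

    /* After: the lockdep name "dev->lock" is derived from the argument. */
    lockdep_register_key(&dev->lock_key);
    mutex_init_with_key(&dev->lock, &dev->lock_key);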
2024-09-13  locking/mutex: Define mutex_init() once  (Bart Van Assche)
With CONFIG_PREEMPT_RT disabled __mutex_init() is a function. With CONFIG_PREEMPT_RT enabled, __mutex_init() is a macro. I assume this is why mutex_init() is defined twice as exactly the same macro. Prepare for introducing a new macro for mutex initialization by combining the two identical mutex_init() definitions into a single definition. This patch does not change any functionality because the C preprocessor expands macros when it encounters the macro name and not when a macro definition is encountered. See also commit bb630f9f7a7d ("locking/rtmutex: Add mutex variant for RT"). Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240912223956.3554086-2-bvanassche@acm.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
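For reference, a single shared definition along these lines works for both configurations because the macro body is only expanded at each use site, where __mutex_init() is either the function or the PREEMPT_RT macro (a sketch, not the verbatim patch):

    #define mutex_init(mutex)                                  \
    do {                                                       \
            static struct lock_class_key __key;                \
                                                               \
            __mutex_init((mutex), #mutex, &__key);             \
    } while (0)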
2024-09-13  RDMA/mlx5: Use IB set_netdev and get_netdev functions  (Chiara Meiohas)
The IB layer provides a common interface to store and get net devices associated to an IB device port (ib_device_set_netdev() and ib_device_get_netdev()). Previously, mlx5_ib stored and managed the associated net devices internally. Replace internal net device management in mlx5_ib with ib_device_set_netdev() when attaching/detaching a net device and ib_device_get_netdev() when retrieving the net device. Export ib_device_get_netdev(). For mlx5 representors/PFs/VFs and lag creation we replace the netdev assignments with the IB set/get netdev functions. In active-backup mode lag the active slave net device is stored in the lag itself. To assure the net device stored in a lag bond IB device is the active slave we implement the following: - mlx5_core: when modifying the slave of a bond we send the internal driver event MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE. - mlx5_ib: when catching the event call ib_device_set_netdev() This patch also ensures the correct IB events are sent in switchdev lag. While at it, when in multiport eswitch mode, only a single IB device is created for all ports. The said IB device will receive all netdev events of its VFs once loaded, thus to avoid overwriting the mapping of PF IB device to PF netdev, ignore NETDEV_REGISTER events if the ib device has already been mapped to a netdev. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
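A hedged usage sketch of the two IB helpers named above; the exact signatures and the reference-counting behaviour of the getter are assumptions here, not a quote of the API:

    /* Attach the net device to IB port 'port_num' (e.g. on NETDEV_REGISTER). */
    err = ib_device_set_netdev(&dev->ib_dev, ndev, port_num);
    if (err)
            return err;

    /* Later: look the net device up again; assumed to return a held reference. */
    ndev = ib_device_get_netdev(&dev->ib_dev, port_num);
    if (ndev) {
            /* ... use ndev ... */
            dev_put(ndev);
    }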
2024-09-13  net/mlx5: Handle memory scheme ODP capabilities  (Michael Guralnik)
When running over new FW that supports the new memory scheme ODP, set the cap in the FW to signal the FW we are working in the new scheme. In the memory scheme ODP the per_transport_service capabilities are RO for the driver so we skip their setting. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-9-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-12  net/mlx5: Add NOT_READY command return status  (Shay Drory)
Add a new command status MLX5_CMD_STAT_NOT_READY to handle cases where the firmware is not ready. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20240911201757.1505453-14-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12  net/mlx5: Add device cap for supporting hot reset in sync reset flow  (Moshe Shemesh)
New devices with new FW can support sync reset for firmware activate using hot reset. Add capability for supporting it and add MFRL field to query from FW which type of PCI reset method to use while handling sync reset events. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240911201757.1505453-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12  net/mlx5: fs, make get_root_namespace API function  (Moshe Shemesh)
As preparation for HW Steering support, where the function get_root_namespace() is needed to get root FDB, make it an API function and rename it to mlx5_get_root_namespace(). Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20240911201757.1505453-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12  security,bpf: constify struct path in bpf_token_create() LSM hook  (Andrii Nakryiko)
There is no reason why struct path pointer shouldn't be const-qualified when being passed into bpf_token_create() LSM hook. Add that const. Acked-by: Paul Moore <paul@paul-moore.com> (LSM/SELinux) Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-09-12  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR. No conflicts (sort of) and no adjacent changes. This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock") from net, as it was superseded by commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.") in net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12  Merge tag 'net-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds)
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  There is a recently notified BT regression with no fix yet. I do not think a fix will land in the next week.

  Current release - regressions:
   - core: tighten bad gso csum offset check in virtio_net_hdr
   - netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
   - eth: ice: stop calling pci_disable_device() as we use pcim
   - eth: fou: fix null-ptr-deref in GRO.

  Current release - new code bugs:
   - hsr: prevent NULL pointer dereference in hsr_proxy_announce()

  Previous releases - regressions:
   - hsr: remove seqnr_lock
   - netfilter: nft_socket: fix sk refcount leaks
   - mptcp: pm: fix uaf in __timer_delete_sync
   - phy: dp83822: fix NULL pointer dereference on DP83825 devices
   - eth: revert "virtio_net: rx enable premapped mode by default"
   - eth: octeontx2-af: Modify SMQ flush sequence to drop packets

  Previous releases - always broken:
   - eth: mlx5: fix bridge mode operations when there are no VFs
   - eth: igb: Always call igb_xdp_ring_update_tail() under Tx lock"

* tag 'net-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
  net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
  net: tighten bad gso csum offset check in virtio_net_hdr
  netlink: specs: mptcp: fix port endianness
  net: dpaa: Pad packets to ETH_ZLEN
  mptcp: pm: Fix uaf in __timer_delete_sync
  net: libwx: fix number of Rx and Tx descriptors
  net: dsa: felix: ignore pending status of TAS module when it's disabled
  net: hsr: prevent NULL pointer dereference in hsr_proxy_announce()
  selftests: mptcp: include net_helper.sh file
  selftests: mptcp: include lib.sh file
  selftests: mptcp: join: restrict fullmesh endp on 1st sf
  netfilter: nft_socket: make cgroupsv2 matching work with namespaces
  netfilter: nft_socket: fix sk refcount leaks
  MAINTAINERS: Add ethtool pse-pd to PSE NETWORK DRIVER
  dt-bindings: net: tja11xx: fix the broken binding
  selftests: net: csum: Fix checksums for packets with non-zero padding
  net: phy: dp83822: Fix NULL pointer dereference on DP83825 devices
  virtio_net: disable premapped mode by default
  Revert "virtio_net: big mode skip the unmap check"
  Revert "virtio_net: rx remove premapped failover code"
  ...
2024-09-12  Merge tag 'platform-drivers-x86-v6.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86  (Linus Torvalds)
Pull x86 platform driver fixes from Ilpo Järvinen:
 - asus-wmi: Disable OOBE that interferes with backlight control
 - panasonic-laptop: Two fixes to SINF array handling

* tag 'platform-drivers-x86-v6.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Disable OOBE experience on Zenbook S 16
  platform/x86: panasonic-laptop: Allocate 1 entry extra in the sinf array
  platform/x86: panasonic-laptop: Fix SINF array out of bounds accesses
2024-09-12  dmaengine: cirrus: remove platform code  (Nikita Shubin)
Remove the DMA platform header; from now on we use the device tree for DMA clients. Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  ARM: ep93xx: soc: drop defines  (Nikita Shubin)
Remove unnecessary defines, as we dropped board files. Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  ARM: ep93xx: delete all boardfiles  (Nikita Shubin)
Delete the ep93xx board files. Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  ata: pata_ep93xx: remove legacy pinctrl use  (Nikita Shubin)
Drop legacy acquire/release since we are using pinctrl for this now. Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Andy Shevchenko <andy@kernel.org> Acked-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  pwm: ep93xx: drop legacy pinctrl  (Nikita Shubin)
Drop legacy gpio request/free since we are using pinctrl for this now. Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Thierry Reding <thierry.reding@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  input: keypad: ep93xx: add DT support for Cirrus EP93xx  (Nikita Shubin)
- drop flags, they were not used anyway
- add OF ID match table
- process "autorepeat", "debounce-delay-ms", prescale from device tree
- drop platform data usage and its header
- keymap comes from the device tree from now on
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  dmaengine: cirrus: Convert to DT for Cirrus EP93xx  (Nikita Shubin)
Convert Cirrus EP93xx DMA to device tree usage:
- add OF ID match table with data
- add of_probe for device tree
- add xlate for m2m/m2p
- drop subsys_initcall code
- drop platform probe
- drop platform structs usage
From now on it only supports device tree probing. Co-developed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  ARM: ep93xx: add regmap aux_dev  (Nikita Shubin)
The following drivers should be instantiated by the ep93xx syscon driver:
- reboot
- pinctrl
- clock
They all require access to the DEVCFG register with a shared lock held, to avoid conflicting writes to the swlocked parts of DEVCFG. Provide common resources such as base, regmap and spinlock via the auxiliary bus framework. Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  Merge branch 'ep93xx/clk-dependency' into ep93xx/dt-conversion  (Arnd Bergmann)
This is a dependency for the clk driver. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-12  firewire: core: update documentation of kernel APIs for flushing completions  (Takashi Sakamoto)
There is a slight difference between fw_iso_context_flush_completions() and fw_iso_context_schedule_flush_completions(). This commit updates the documentation for both. Link: https://lore.kernel.org/r/20240912133038.238786-5-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-09-12Revert "firewire: core: use mutex to coordinate concurrent calls to flush ↵Takashi Sakamoto
completions" This reverts commit d9605d67562505e27dcc0f71af418118d3db91e5, since this commit is on the following reverted changes. Link: https://lore.kernel.org/r/20240912133038.238786-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-09-12  Merge branch 'for-next/poe' into for-next/core  (Will Deacon)
* for-next/poe: (31 commits)
  arm64: pkeys: remove redundant WARN
  kselftest/arm64: Add test case for POR_EL0 signal frame records
  kselftest/arm64: parse POE_MAGIC in a signal frame
  kselftest/arm64: add HWCAP test for FEAT_S1POE
  selftests: mm: make protection_keys test work on arm64
  selftests: mm: move fpregs printing
  kselftest/arm64: move get_header()
  arm64: add Permission Overlay Extension Kconfig
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  arm64/ptrace: add support for FEAT_POE
  arm64: add POE signal support
  arm64: implement PKEYS support
  arm64: add pte_access_permitted_no_overlay()
  arm64: handle PKEY/POE faults
  arm64: mask out POIndex when modifying a PTE
  arm64: convert protection key into vm_flags and pgprot values
  arm64: add POIndex defines
  arm64: re-order MTE VM_ flags
  arm64: enable the Permission Overlay Extension for EL0
  ...
2024-09-12  Merge branch 'for-next/pkvm-guest' into for-next/core  (Will Deacon)
* for-next/pkvm-guest:
  arm64: smccc: Reserve block of KVM "vendor" services for pKVM hypercalls
  drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercall
  arm64: mm: Add confidential computing hook to ioremap_prot()
  drivers/virt: pkvm: Hook up mem_encrypt API using pKVM hypercalls
  arm64: mm: Add top-level dispatcher for internal mem_encrypt API
  drivers/virt: pkvm: Add initial support for running as a protected guest
  firmware/smccc: Call arch-specific hook on discovering KVM services
2024-09-12  spi: Merge up fixes  (Mark Brown)
A patch for Qualcomm depends on some fixes.
2024-09-12  cifs: Use iterate_and_advance*() routines directly for hashing  (David Howells)
Replace the bespoke cifs iterators of ITER_BVEC and ITER_KVEC to do hashing with iterate_and_advance_kernel() - a variant on iterate_and_advance() that only supports kernel-internal ITER_* types and not UBUF/IOVEC types. The bespoke ITER_XARRAY is left because we don't really want to be calling crypto_shash_update() under the RCU read lock for large amounts of data; besides, ITER_XARRAY is going to be phased out. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Enzo Matsumiya <ematsumiya@suse.de> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-24-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  cachefiles, netfs: Fix write to partial block at EOF  (David Howells)
Because it uses DIO writes, cachefiles is unable to make a write to the backing file if that write is not aligned to and sized according to the backing file's DIO block alignment. This makes it tricky to handle a write to the cache where the EOF on the network file is not correctly aligned. To get around this, netfslib attempts to tell the driver it is calling how much more data there is available beyond the EOF that it can use to pad the write (netfslib preclears the part of the folio above the EOF). However, it tries to tell the cache what the maximum length is, but doesn't calculate this correctly; and, in any case, cachefiles actually ignores the value and just skips the block.

Fix this by:
(1) Change the value passed to indicate the amount of extra data that can be added to the operation (now ->submit_extendable_to). This is much simpler to calculate as it's just the end of the folio minus the top of the data within the folio - rather than having to account for data spread over multiple folios.
(2) Make cachefiles add some of this data if the subrequest it is given ends at the network file's i_size if the extra data is sufficient to pad out to a whole block.

Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-22-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Speed up buffered reading  (David Howells)
Improve the efficiency of buffered reads in a number of ways:

(1) Overhaul the algorithm in general so that it's a lot more compact and split the read submission code between buffered and unbuffered versions. The unbuffered version can be vastly simplified.
(2) Read-result collection is handed off to a work queue rather than being done in the I/O thread. Multiple subrequests can be processed simultaneously.
(3) When a subrequest is collected, any folios it fully spans are collected and "spare" data on either side is donated to either the previous or the next subrequest in the sequence.

Notes:
(*) Readahead expansion massively slows down fio, presumably because it causes a load of extra allocations, both folio and xarray, up front before RPC requests can be transmitted.
(*) RDMA with cifs does appear to work, both with SIW and RXE.
(*) PG_private_2-based reading and copy-to-cache is split out into its own file and altered to use folio_queue. Note that the copy to the cache now creates a new write transaction against the cache and adds the folios to be copied into it. This allows it to use part of the writeback I/O code.

Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Simplify the writeback code  (David Howells)
Use the new folio_queue structures to simplify the writeback code. The problem with referring to the i_pages xarray directly is that we may have gaps in the sequence of folios we're writing from that we need to skip when we're removing the writeback mark from the folios we're writing back from. At the moment the code tries to deal with this by carefully tracking the gaps in each writeback stream (eg. write to server and write to cache) and divining when there's a gap that spans folios (something that's not helped by folios not being a consistent size).

Instead, the folio_queue buffer contains pointers only to the folios we're dealing with, has them in ascending order and indicates a gap by placing non-consecutive folios next to each other. This makes it possible to track where we need to clean up to by just keeping track of where we've processed to on each stream and taking the minimum.

Note that the I/O iterator is always rounded up to the end of the folio, even if that is beyond the EOF position, so that the cache can do DIO from the page. The excess space is cleared, though mmapped writes clobber it.

Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-18-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  netfs: Use new folio_queue data type and iterator instead of xarray iter  (David Howells)
Make the netfs write-side routines use the new folio_queue struct to hold a rolling buffer of folios, with the issuer adding folios at the tail and the collector removing them from the head as they're processed instead of using an xarray. This will allow a subsequent patch to simplify the write collector. The primary mark (as tested by folioq_is_marked()) is used to note if the corresponding folio needs putting. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-16-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  iov_iter: Provide copy_folio_from_iter()  (David Howells)
Provide a copy_folio_from_iter() wrapper. Signed-off-by: David Howells <dhowells@redhat.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20240814203850.2240469-14-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
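A minimal sketch of such a wrapper, assuming it simply forwards to the existing copy_page_from_iter() helper in the same way copy_folio_to_iter() forwards to copy_page_to_iter():

    static inline size_t copy_folio_from_iter(struct folio *folio, size_t offset,
                                              size_t bytes, struct iov_iter *i)
    {
            return copy_page_from_iter(&folio->page, offset, bytes, i);
    }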
2024-09-12  mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios  (David Howells)
Define a data structure, struct folio_queue, to represent a sequence of folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a list of folio_queue structures to be used to provide a buffer to iov_iter-taking functions, such as sendmsg and recvmsg.

The folio_queue structure looks like:

    struct folio_queue {
            struct folio_batch vec;
            u8 orders[PAGEVEC_SIZE];
            struct folio_queue *next;
            struct folio_queue *prev;
            unsigned long marks;
            unsigned long marks2;
    };

It does not use a list_head so that next and/or prev can be set to NULL at the ends of the list, allowing iov_iter-handling routines to determine that they *are* the ends without needing to store a head pointer in the iov_iter struct.

A folio_batch struct is used to hold the folio pointers which allows the batch to be passed to batch handling functions. Two mark bits are available per slot. The intention is to use at least one of them to mark folios that need putting, but that might not be ultimately necessary. Accessor functions are used to access the slots to do the masking and an additional accessor function is used to indicate the size of the array.

The order of each folio is also stored in the structure to avoid the need for iov_iter_advance() and iov_iter_revert() to have to query each folio to find its size.

With careful barriering, this can be used as an extending buffer with new folios inserted and new folio_queue structs added without the need for a lock. Further, provided we always keep at least one struct in the buffer, we can also remove consumed folios and consumed structs from the head end as we go, without the need for locks.

[Questions/thoughts]

(1) To manage this, I need a head pointer, a tail pointer, a tail slot number (assuming insertion happens at the tail end and the next pointers point from head to tail). Should I put these into a struct of their own, say "folio_queue_head" or "rolling_buffer"? I will end up with two of these in netfs_io_request eventually, one keeping track of the pagecache I'm dealing with for buffered I/O and the other to hold a bounce buffer when we need one.

(2) Should I make the slots {folio,off,len} or bio_vec?

(3) This is intended to replace ITER_XARRAY eventually. Using an xarray in I/O iteration requires the taking of the RCU read lock, doing copying under the RCU read lock, walking the xarray (which may change under us), handling retries and dealing with special values. The advantage of ITER_XARRAY is that when we're dealing with the pagecache directly, we don't need any allocation - but if we're doing encrypted comms, there's a good chance we'd be using a bounce buffer anyway.

    This will require afs, erofs, cifs, orangefs and fscache to be converted to not use this. afs still uses it for dirs and symlinks; some of erofs usages should be easy to change, but there's one which won't be so easy; ceph's use via fscache can be fixed by porting ceph to netfslib; cifs is using xarray as a bounce buffer - that can be moved to use sheaves instead; and orangefs has a similar problem to erofs - maybe orangefs could use netfslib?
Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Gao Xiang <xiang@kernel.org> cc: Mike Marshall <hubcap@omnibond.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-erofs@lists.ozlabs.org cc: devel@lists.orangefs.org Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
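A hypothetical usage sketch based on the description above; the accessor names (folioq_init(), folioq_append(), folioq_folio_size()) and iov_iter_folio_queue() are assumptions about the new API rather than a quote of it:

    /* Queue one folio and describe it with an ITER_FOLIOQ iterator. */
    static int folioq_example(struct folio *folio, struct iov_iter *iter)
    {
            struct folio_queue *fq;
            unsigned int slot;

            fq = kmalloc(sizeof(*fq), GFP_KERNEL);
            if (!fq)
                    return -ENOMEM;
            folioq_init(fq);                 /* empty queue, no next/prev */

            slot = folioq_append(fq, folio); /* insert at the tail */

            /* Hand the rolling buffer to any iov_iter-taking function. */
            iov_iter_folio_queue(iter, ITER_SOURCE, fq, slot, 0,
                                 folioq_folio_size(fq, slot));
            return 0;
    }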
2024-09-12  uidgid: make sure we fit into one cacheline  (Christian Brauner)
When I expanded uidgid mappings I intended for a struct uid_gid_map to fit into a single cacheline on x86 as they tend to be pretty performance sensitive (idmapped mounts etc). But a 4 byte hole was added that brought it over 64 bytes. Fix that by adding the static extent array and the extent counter into a substruct. C's type punning for unions guarantees that we can access ->nr_extents even if the last written to member wasn't within the same object. This is also what we rely on in struct_group() and friends. This of course relies on non-strict aliasing which we don't do. 99) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf Link: https://lore.kernel.org/r/20240910-work-uid_gid_map-v1-1-e6bc761363ed@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
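A hedged sketch of the layout described above (field names taken from the existing uid_gid_map; the exact arrangement is an assumption): the inline extent array and its counter move into one anonymous struct inside the union, bringing the map back to 64 bytes, while ->nr_extents stays addressable through that struct even when the forward/reverse pointers were the members last written.

    struct uid_gid_map {                    /* 64 bytes -- 1 cache line */
            union {
                    struct {
                            struct uid_gid_extent extent[UID_GID_MAP_MAX_BASE_EXTENTS];
                            u32 nr_extents;
                    };
                    struct {
                            struct uid_gid_extent *forward;
                            struct uid_gid_extent *reverse;
                    };
            };
    };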
2024-09-12  fs: remove f_version  (Christian Brauner)
Now that detecting concurrent seeks is done by the filesystems that require it we can remove f_version and free up 8 bytes for future extensions. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-20-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12  fs: add f_pipe  (Christian Brauner)
Only regular files with FMODE_ATOMIC_POS and directories need f_pos_lock. Place a new f_pipe member in a union with f_pos_lock that they can use and make them stop abusing f_version in follow-up patches. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-18-6d3e4816aa7b@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
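A hedged sketch of the struct file arrangement this describes; the exact type of f_pipe and the surrounding fields are assumptions:

    struct file {
            /* ... */
            union {
                    /* regular files with FMODE_ATOMIC_POS, and directories */
                    struct mutex    f_pos_lock;
                    /* pipes: private cursor, so they stop abusing f_version */
                    u64             f_pipe;
            };
            loff_t                  f_pos;
            /* ... */
    };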
2024-09-11  net: ethernet: oa_tc6: add helper function to enable zero align rx frame  (Parthiban Veerasooran)
The zero align receive frame feature can be enabled to align all received ethernet frame data to start at the beginning of any receive data chunk payload with a start word offset (SWO) of zero. When this feature is not enabled, receive frames may begin anywhere within the receive data chunk payload. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Link: https://patch.msgid.link/20240909082514.262942-13-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11  net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames  (Parthiban Veerasooran)
The transmit ethernet frame will be converted into multiple transmit data chunks. Each transmit data chunk consists of a 4 byte header followed by a 64 byte transmit data chunk payload. The 4 byte data header occurs at the beginning of each transmit data chunk on MOSI. The data header contains the information needed to determine the validity and location of the transmit frame data within the data chunk payload. The number of transmit data chunks transmitted to the mac-phy is limited to the number of transmit credits available in the mac-phy. Initially the transmit credits are read from the buffer status register, and afterwards they are updated from the footer received on each SPI data transfer. The received footer is examined for transmit errors, if any. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Link: https://patch.msgid.link/20240909082514.262942-10-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
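The chunking arithmetic this implies can be sketched as below; the constant and helper names are illustrative, not the driver's actual identifiers:

    #define OA_TC6_CHUNK_PAYLOAD_SIZE       64      /* data bytes per chunk */
    #define OA_TC6_CHUNK_HEADER_SIZE        4       /* header bytes per chunk */

    /* Number of chunks needed to carry one Ethernet frame. */
    static u32 oa_tc6_needed_chunks(u32 frame_len)
    {
            return DIV_ROUND_UP(frame_len, OA_TC6_CHUNK_PAYLOAD_SIZE);
    }

    /* Total SPI transfer length for that many chunks, headers included. */
    static u32 oa_tc6_spi_transfer_len(u32 nr_chunks)
    {
            return nr_chunks * (OA_TC6_CHUNK_HEADER_SIZE + OA_TC6_CHUNK_PAYLOAD_SIZE);
    }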
2024-09-11  net: ethernet: oa_tc6: implement internal PHY initialization  (Parthiban Veerasooran)
The internal PHY is initialized as per the PHY register capability supported by the MAC-PHY. Direct PHY register access capability indicates that PHY registers are directly accessible within the SPI register memory space. Indirect PHY register access capability indicates that PHY registers are indirectly accessible through the MDIO/MDC registers MDIOACCn defined in the OPEN Alliance specification. Currently only direct register access is supported. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Link: https://patch.msgid.link/20240909082514.262942-7-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11  net: ethernet: oa_tc6: implement register read operation  (Parthiban Veerasooran)
Implement the register read operation according to the control communication specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface document. Control read commands are used by the SPI host to read registers within the MAC-PHY. Each control read command is composed of a 32 bit control command header. The MAC-PHY ignores all data from the SPI host following the control header for the remainder of the control read command. Control read commands can read either a single register or multiple consecutive registers. When multiple consecutive registers are read, the address is automatically post-incremented by the MAC-PHY. Reading any unimplemented or undefined registers shall return zero. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Link: https://patch.msgid.link/20240909082514.262942-4-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11  net: ethernet: oa_tc6: implement register write operation  (Parthiban Veerasooran)
Implement the register write operation according to the control communication specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface document. Control write commands are used by the SPI host to write registers within the MAC-PHY. Each control write command is composed of a 32 bit control command header followed by the register write data. The MAC-PHY ignores the final 32 bits of data from the SPI host at the end of the control write command. The write command and data are also echoed from the MAC-PHY back to the SPI host to enable the SPI host to identify which register write failed in the case of any bus errors. Control write commands can write either a single register or multiple consecutive registers. When multiple consecutive registers are written, the address is automatically post-incremented by the MAC-PHY. Writing to any unimplemented or undefined registers shall be ignored and yield no effect. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com> Link: https://patch.msgid.link/20240909082514.262942-3-Parthiban.Veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>