summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2023-10-30Merge tag 'vfs-6.7.xattr' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs xattr updates from Christian Brauner: "The 's_xattr' field of 'struct super_block' currently requires a mutable table of 'struct xattr_handler' entries (although each handler itself is const). However, no code in vfs actually modifies the tables. This changes the type of 's_xattr' to allow const tables, and modifies existing file systems to move their tables to .rodata. This is desirable because these tables contain entries with function pointers in them; moving them to .rodata makes it considerably less likely to be modified accidentally or maliciously at runtime" * tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (30 commits) const_structs.checkpatch: add xattr_handler net: move sockfs_xattr_handlers to .rodata shmem: move shmem_xattr_handlers to .rodata overlayfs: move xattr tables to .rodata xfs: move xfs_xattr_handlers to .rodata ubifs: move ubifs_xattr_handlers to .rodata squashfs: move squashfs_xattr_handlers to .rodata smb: move cifs_xattr_handlers to .rodata reiserfs: move reiserfs_xattr_handlers to .rodata orangefs: move orangefs_xattr_handlers to .rodata ocfs2: move ocfs2_xattr_handlers and ocfs2_xattr_handler_map to .rodata ntfs3: move ntfs_xattr_handlers to .rodata nfs: move nfs4_xattr_handlers to .rodata kernfs: move kernfs_xattr_handlers to .rodata jfs: move jfs_xattr_handlers to .rodata jffs2: move jffs2_xattr_handlers to .rodata hfsplus: move hfsplus_xattr_handlers to .rodata hfs: move hfs_xattr_handlers to .rodata gfs2: move gfs2_xattr_handlers_max to .rodata fuse: move fuse_xattr_handlers to .rodata ...
2023-10-30Merge tag 'vfs-6.7.iov_iter' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iov_iter updates from Christian Brauner: "This contain's David's iov_iter cleanup work to convert the iov_iter iteration macros to inline functions: - Remove last_offset from iov_iter as it was only used by ITER_PIPE - Add a __user tag on copy_mc_to_user()'s dst argument on x86 to match that on powerpc and get rid of a sparse warning - Convert iter->user_backed to user_backed_iter() in the sound PCM driver - Convert iter->user_backed to user_backed_iter() in a couple of infiniband drivers - Renumber the type enum so that the ITER_* constants match the order in iterate_and_advance*() - Since the preceding patch puts UBUF and IOVEC at 0 and 1, change user_backed_iter() to just use the type value and get rid of the extra flag - Convert the iov_iter iteration macros to always-inline functions to make the code easier to follow. It uses function pointers, but they get optimised away - Move the check for ->copy_mc to _copy_from_iter() and copy_page_from_iter_atomic() rather than in memcpy_from_iter_mc() where it gets repeated for every segment. Instead, we check once and invoke a side function that can use iterate_bvec() rather than iterate_and_advance() and supply a different step function - Move the copy-and-csum code to net/ where it can be in proximity with the code that uses it - Fold memcpy_and_csum() in to its two users - Move csum_and_copy_from_iter_full() out of line and merge in csum_and_copy_from_iter() since the former is the only caller of the latter - Move hash_and_copy_to_iter() to net/ where it can be with its only caller" * tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter, net: Move hash_and_copy_to_iter() to net/ iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together iov_iter, net: Fold in csum_and_memcpy() iov_iter, net: Move csum_and_copy_to/from_iter() to net/ iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() iov_iter: Convert iterate*() to inline funcs iov_iter: Derive user-backedness from the iterator type iov_iter: Renumber ITER_* constants infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC sound: Fix snd_pcm_readv()/writev() to use iov access functions iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user() iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE
2023-10-30Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds
Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Rename and export helpers that get write access to a mount. They are used in overlayfs to get write access to the upper mount. - Print the pretty name of the root device on boot failure. This helps in scenarios where we would usually only print "unknown-block(1,2)". - Add an internal SB_I_NOUMASK flag. This is another part in the endless POSIX ACL saga in a way. When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip the umask because if the relevant inode has POSIX ACLs set it might take the umask from there. But if the inode doesn't have any POSIX ACLs set then we apply the umask in the filesytem itself. So we end up with: (1) no SB_POSIXACL -> strip umask in vfs (2) SB_POSIXACL -> strip umask in filesystem The umask semantics associated with SB_POSIXACL allowed filesystems that don't even support POSIX ACLs at all to raise SB_POSIXACL purely to avoid umask stripping. That specifically means NFS v4 and Overlayfs. NFS v4 does it because it delegates this to the server and Overlayfs because it needs to delegate umask stripping to the upper filesystem, i.e., the filesystem used as the writable layer. This went so far that SB_POSIXACL is raised eve on kernels that don't even have POSIX ACL support at all. Stop this blatant abuse and add SB_I_NOUMASK which is an internal superblock flag that filesystems can raise to opt out of umask handling. That should really only be the two mentioned above. It's not that we want any filesystems to do this. Ideally we have all umask handling always in the vfs. - Make overlayfs use SB_I_NOUMASK too. - Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in IS_POSIXACL() if the kernel doesn't have support for it. This is a very old patch but it's only possible to do this now with the wider cleanup that was done. - Follow-up work on fake path handling from last cycle. Citing mostly from Amir: When overlayfs was first merged, overlayfs files of regular files and directories, the ones that are installed in file table, had a "fake" path, namely, f_path is the overlayfs path and f_inode is the "real" inode on the underlying filesystem. In v6.5, we took another small step by introducing of the backing_file container and the file_real_path() helper. This change allowed vfs and filesystem code to get the "real" path of an overlayfs backing file. With this change, we were able to make fsnotify work correctly and report events on the "real" filesystem objects that were accessed via overlayfs. This method works fine, but it still leaves the vfs vulnerable to new code that is not aware of files with fake path. A recent example is commit db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version"). This commit uses direct referencing to f_path in IMA code that otherwise uses file_inode() and file_dentry() to reference the filesystem objects that it is measuring. This contains work to switch things around: instead of having filesystem code opt-in to get the "real" path, have generic code opt-in for the "fake" path in the few places that it is needed. Is it far more likely that new filesystems code that does not use the file_dentry() and file_real_path() helpers will end up causing crashes or averting LSM/audit rules if we keep the "fake" path exposed by default. This change already makes file_dentry() moot, but for now we did not change this helper just added a WARN_ON() in ovl_d_real() to catch if we have made any wrong assumptions. After the dust settles on this change, we can make file_dentry() a plain accessor and we can drop the inode argument to ->d_real(). - Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small change but it really isn't and I would like to see everyone on their tippie toes for any possible bugs from this work. Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU for files since a very long time because of the nasty interactions between the SCM_RIGHTS file descriptor garbage collection. So extending it makes a lot of sense but it is a subtle change. There are almost no places that fiddle with file rcu semantics directly and the ones that did mess around with struct file internal under rcu have been made to stop doing that because it really was always dodgy. I forgot to put in the link tag for this change and the discussion in the commit so adding it into the merge message: https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com Cleanups: - Various smaller pipe cleanups including the removal of a spin lock that was only used to protect against writes without pipe_lock() from O_NOTIFICATION_PIPE aka watch queues. As that was never implemented remove the additional locking from pipe_write(). - Annotate struct watch_filter with the new __counted_by attribute. - Clarify do_unlinkat() cleanup so that it doesn't look like an extra iput() is done that would cause issues. - Simplify file cleanup when the file has never been opened. - Use module helper instead of open-coding it. - Predict error unlikely for stale retry. - Use WRITE_ONCE() for mount expiry field instead of just commenting that one hopes the compiler doesn't get smart. Fixes: - Fix readahead on block devices. - Fix writeback when layztime is enabled and inodes whose timestamp is the only thing that changed reside on wb->b_dirty_time. This caused excessively large zombie memory cgroup when lazytime was enabled as such inodes weren't handled fast enough. - Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()" * tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits) file, i915: fix file reference for mmap_singleton() vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs chardev: Simplify usage of try_module_get() ovl: rely on SB_I_NOUMASK fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n fs: store real path instead of fake path in backing file f_path fs: create helper file_user_path() for user displayed mapped file path fs: get mnt_writers count for an open backing file's real path vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked vfs: predict the error in retry_estale as unlikely backing file: free directly vfs: fix readahead(2) on block devices io_uring: use files_lookup_fd_locked() file: convert to SLAB_TYPESAFE_BY_RCU vfs: shave work on failed file open fs: simplify misleading code to remove ambiguity regarding ihold()/iput() watch_queue: Annotate struct watch_filter with __counted_by fs/pipe: use spinlock in pipe_read() only if there is a watch_queue fs/pipe: remove unnecessary spinlock from pipe_write() ...
2023-10-30Merge tag 'vfs-6.7.super' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs superblock updates from Christian Brauner: "This contains the work to make block device opening functions return a struct bdev_handle instead of just a struct block_device. The same struct bdev_handle is then also passed to block device closing functions. This allows us to propagate context from opening to closing a block device without having to modify all users everytime. Sidenote, in the future we might even want to try and have block device opening functions return a struct file directly but that's a series on top of this. These are further preparatory changes to be able to count writable opens and blocking writes to mounted block devices. That's a separate piece of work for next cycle and for that we absolutely need the changes to btrfs that have been quietly dropped somehow. Originally the series contained a patch that removed the old blkdev_*() helpers. But since this would've caused needles churn in -next for bcachefs we ended up delaying it. The second piece of work addresses one of the major annoyances about the work last cycle, namely that we required dropping s_umount whenever we used the superblock and fs_holder_ops for a block device. The reason for that requirement had been that in some codepaths s_umount could've been taken under disk->open_mutex (that's always been the case, at least theoretically). For example, on surprise block device removal or media change. And opening and closing block devices required grabbing disk->open_mutex as well. So we did the work and went through the block layer and fixed all those places so that s_umount is never taken under disk->open_mutex. This means no more brittle games where we yield and reacquire s_umount during block device opening and closing and no more requirements where block devices need to be closed. Filesystems don't need to care about this. There's a bunch of other follow-up work such as moving block device freezing and thawing to holder operations which makes it work for all block devices and not just the main block device just as we did for surprise removal. But that is for next cycle. Tested with fstests for all major fses, blktests, LTP" * tag 'vfs-6.7.super' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (37 commits) porting: update locking requirements fs: assert that open_mutex isn't held over holder ops block: assert that we're not holding open_mutex over blk_report_disk_dead block: move bdev_mark_dead out of disk_check_media_change block: WARN_ON_ONCE() when we remove active partitions block: simplify bdev_del_partition() fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock jfs: fix log->bdev_handle null ptr deref in lbmStartIO bcache: Fixup error handling in register_cache() xfs: Convert to bdev_open_by_path() reiserfs: Convert to bdev_open_by_dev/path() ocfs2: Convert to use bdev_open_by_dev() nfs/blocklayout: Convert to use bdev_open_by_dev/path() jfs: Convert to bdev_open_by_dev() f2fs: Convert to bdev_open_by_dev/path() ext4: Convert to bdev_open_by_dev() erofs: Convert to use bdev_open_by_path() btrfs: Convert to bdev_open_by_path() fs: Convert to bdev_open_by_dev() mm/swap: Convert to use bdev_open_by_dev() ...
2023-10-30fbdev: stifb: Make the STI next font pointer a 32-bit signed offsetHelge Deller
The pointer to the next STI font is actually a signed 32-bit offset. With this change the 64-bit kernel will correctly subract the (signed 32-bit) offset instead of adding a (unsigned 32-bit) offset. It has no effect on 32-bit kernels. This fixes the stifb driver with a 64-bit kernel on qemu. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org
2023-10-29dt-bindings: watchdog: aspeed-wdt: Add aspeed,reset-mask propertyZev Weiss
This property configures the Aspeed watchdog timer's reset mask, which controls which peripherals are reset when the watchdog timer expires. Some platforms require that certain devices be left untouched across a reboot; aspeed,reset-mask can now be used to express such constraints. Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230922104231.1434-5-zev@bewilderbeest.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2023-10-28ubi: fastmap: Add control in 'UBI_IOCATT' ioctl to reserve PEBs for filling ↵Zhihao Cheng
pools This patch imports a new field 'need_resv_pool' in struct 'ubi_attach_req' to control whether or not reserving free PEBs for filling pool/wl_pool. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217787 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2023-10-28seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str()Kees Cook
Solve two ergonomic issues with struct seq_buf; 1) Too much boilerplate is required to initialize: struct seq_buf s; char buf[32]; seq_buf_init(s, buf, sizeof(buf)); Instead, we can build this directly on the stack. Provide DECLARE_SEQ_BUF() macro to do this: DECLARE_SEQ_BUF(s, 32); 2) %NUL termination is fragile and requires 2 steps to get a valid C String (and is a layering violation exposing the "internals" of seq_buf): seq_buf_terminate(s); do_something(s->buffer); Instead, we can just return s->buffer directly after terminating it in the refactored seq_buf_terminate(), now known as seq_buf_str(): do_something(seq_buf_str(s)); Link: https://lore.kernel.org/linux-trace-kernel/20231027155634.make.260-kees@kernel.org Link: https://lore.kernel.org/linux-trace-kernel/20231026194033.it.702-kees@kernel.org/ Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Justin Stitt <justinstitt@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yun Zhou <yun.zhou@windriver.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-28Merge branch 'pci/misc'Bjorn Helgaas
- Prevent xHCI driver from claiming AMD VanGogh USB3 DRD device so dwc3 can claim it instead (Vicki Pfau) - Make pci_assign_unassigned_resources() non-init because sparc uses it after init-time (Randy Dunlap) - Remove logic_outb(), _outw(), outl() duplicate declarations (John Sanpe) - Remove unnecessary UTF-8 in Kconfig help text that confuses menuconfig (Liu Song) - Fix double free in __pci_epc_create() (Dan Carpenter) - Simplify pcie_capability_clear_and_set_word() cases that could be pcie_capability_clear_word() (Ilpo Järvinen) * pci/misc: PCI: Simplify pcie_capability_clear_and_set_word() to ..._clear_word() PCI: endpoint: Fix double free in __pci_epc_create() PCI: Replace unnecessary UTF-8 in Kconfig logic_pio: Remove logic_outb(), _outw(), outl() duplicate declarations PCI: Make pci_assign_unassigned_resources() non-init PCI: Prevent xHCI driver from claiming AMD VanGogh USB3 DRD device
2023-10-28Merge branch 'pci/field-get'Bjorn Helgaas
- Use FIELD_GET()/FIELD_PREP() when possible throughout drivers/pci/ (Ilpo Järvinen, Bjorn Helgaas) - Rework DPC control programming for clarity (Ilpo Järvinen) * pci/field-get: PCI/portdrv: Use FIELD_GET() PCI/VC: Use FIELD_GET() PCI/PTM: Use FIELD_GET() PCI/PME: Use FIELD_GET() PCI/ATS: Use FIELD_GET() PCI/ATS: Show PASID Capability register width in bitmasks PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk PCI: Use FIELD_GET() PCI/MSI: Use FIELD_GET/PREP() PCI/DPC: Use defines with DPC reason fields PCI/DPC: Use defined fields with DPC_CTL register PCI/DPC: Use FIELD_GET() PCI: hotplug: Use FIELD_GET/PREP() PCI: dwc: Use FIELD_GET/PREP() PCI: cadence: Use FIELD_GET() PCI: Use FIELD_GET() to extract Link Width PCI: mvebu: Use FIELD_PREP() with Link Width PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields # Conflicts: # drivers/pci/controller/dwc/pcie-tegra194.c
2023-10-28Merge branch 'pci/vga'Bjorn Helgaas
- Add pci_is_vga() helper, which checks for both PCI_CLASS_DISPLAY_VGA and PCI_CLASS_NOT_DEFINED_VGA (which catches ancient devices built before Class Codes were defined) (Sui Jingfeng) - Use the new pci_is_vga() to identify devices for the VGA arbiter, the sysfs "boot_vga" attribute, and the virtio and qxl drivers (SUi Jingfeng) * pci/vga: drm/qxl: Use pci_is_vga() to identify VGA devices drm/virtio: Use pci_is_vga() to identify VGA devices PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga() PCI/VGA: Select VGA devices earlier PCI/VGA: Use pci_is_vga() to identify VGA devices PCI: Add pci_is_vga() helper
2023-10-28Merge branch 'pci/enumeration'Bjorn Helgaas
- Add and use pci_get_base_class() to search for all PCI_BASE_CLASS_DISPLAY devices (Sui Jingfeng) - Fix a vmd check for multi-function devices (Ilpo Järvinen) - Add PCI_HEADER_TYPE_MFD and use it to replace literals (Ilpo Järvinen) - Use acpi_evaluate_dsm_typed() instead of open-coding it (Andy Shevchenko) - Keep .remove() and .probe() callbacks (previously marked __init) in case they're used via sysfs (Uwe Kleine-König) * pci/enumeration: PCI: keystone: Don't discard .probe() callback PCI: keystone: Don't discard .remove() callback PCI: kirin: Don't discard .remove() callback PCI: exynos: Don't discard .remove() callback PCI/ACPI: Use acpi_evaluate_dsm_typed() PCI: Use PCI_HEADER_TYPE_* instead of literals PCI: Add PCI_HEADER_TYPE_MFD definition PCI: vmd: Correct PCI Header Type Register's multi-function check drm/radeon: Use pci_get_base_class() to reduce duplicated code drm/amdgpu: Use pci_get_base_class() to reduce duplicated code drm/nouveau: Use pci_get_base_class() to reduce duplicated code ALSA: hda: Use pci_get_base_class() to reduce duplicated code PCI: Add pci_get_base_class() helper
2023-10-28fs: fix build error with CONFIG_EXPORTFS=m or not definedAmir Goldstein
Many of the filesystems that call the generic exportfs helpers do not select the EXPORTFS config. Move generic_encode_ino32_fh() to libfs.c, same as generic_fh_to_*() to avoid having to fix all those config dependencies. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310262151.renqMvme-lkp@intel.com/ Fixes: dfaf653dc415 ("exportfs: make ->encode_fh() a mandatory method for NFS export") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231026204540.143217-1-amir73il@gmail.com Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28exportfs: support encoding non-decodeable file handles by defaultAmir Goldstein
AT_HANDLE_FID was added as an API for name_to_handle_at() that request the encoding of a file id, which is not intended to be decoded. This file id is used by fanotify to describe objects in events. So far, overlayfs is the only filesystem that supports encoding non-decodeable file ids, by providing export_operations with an ->encode_fh() method and without a ->decode_fh() method. Add support for encoding non-decodeable file ids to all the filesystems that do not provide export_operations, by encoding a file id of type FILEID_INO64_GEN from { i_ino, i_generation }. A filesystem may that does not support NFS export, can opt-out of encoding non-decodeable file ids for fanotify by defining an empty export_operations struct (i.e. with a NULL ->encode_fh() method). This allows the use of fanotify events with file ids on filesystems like 9p which do not support NFS export to bring fanotify in feature parity with inotify on those filesystems. Note that fanotify also requires that the filesystems report a non-null fsid. Currently, many simple filesystems that have support for inotify (e.g. debugfs, tracefs, sysfs) report a null fsid, so can still not be used with fanotify in file id reporting mode. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-5-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28exportfs: define FILEID_INO64_GEN* file handle typesAmir Goldstein
Similar to the common FILEID_INO32* file handle types, define common FILEID_INO64* file handle types. The type values of FILEID_INO64_GEN and FILEID_INO64_GEN_PARENT are the values returned by fuse and xfs for 64bit ino encoded file handle types. Note that these type value are filesystem specific and they do not define a universal file handle format, for example: fuse encodes FILEID_INO64_GEN as [ino-hi32,ino-lo32,gen] and xfs encodes FILEID_INO64_GEN as [hostr-order-ino64,gen] (a.k.a xfs_fid64). The FILEID_INO64_GEN fhandle type is going to be used for file ids for fanotify from filesystems that do not support NFS export. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-4-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28exportfs: make ->encode_fh() a mandatory method for NFS exportAmir Goldstein
Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28ACPI: Add helper acpi_use_parent_companionHeiner Kallweit
In several drivers devices use the ACPI companion of the parent. Add a helper for this use case to avoid code duplication. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-10-28linux/init: remove __memexit* annotationsMasahiro Yamada
We have never used __memexit, __memexitdata, or __memexitconst. These were unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de>
2023-10-28fs: Convert to bdev_open_by_dev()Jan Kara
Convert mount code to use bdev_open_by_dev() and propagate the handle around to bdev_release(). Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-19-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28mm/swap: Convert to use bdev_open_by_dev()Jan Kara
Convert swapping code to use bdev_open_by_dev() and pass the handle around. CC: linux-mm@kvack.org CC: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-18-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28dm: Convert to bdev_open_by_dev()Jan Kara
Convert device mapper to use bdev_open_by_dev() and pass the handle around. CC: Alasdair Kergon <agk@redhat.com> CC: Mike Snitzer <snitzer@kernel.org> CC: dm-devel@redhat.com Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-10-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28pktcdvd: Convert to bdev_open_by_dev()Jan Kara
Convert pktcdvd to use bdev_open_by_dev(). Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-5-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28block: Use bdev_open_by_dev() in blkdev_open()Jan Kara
Convert blkdev_open() to use bdev_open_by_dev(). To be able to propagate handle from blkdev_open() to blkdev_release() we need to stop using existence of file->private_data to determine exclusive block device opens. Use bdev_handle->mode for this purpose since file->f_flags isn't usable for this (O_EXCL is cleared from the flags during open). Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28block: Provide bdev_open_* functionsJan Kara
Create struct bdev_handle that contains all parameters that need to be passed to blkdev_put() and provide bdev_open_* functions that return this structure instead of plain bdev pointer. This will eventually allow us to pass one more argument to blkdev_put() (renamed to bdev_release()) without too much hassle. Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-1-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-27acpi: Move common tables helper functions to common libDave Jiang
Some of the routines in ACPI driver/acpi/tables.c can be shared with parsing CDAT. CDAT is a device-provided data structure that is formatted similar to a platform provided ACPI table. CDAT is used by CXL and can exist on platforms that do not use ACPI. Split out the common routine from ACPI to accommodate platforms that do not support ACPI and move that to /lib. The common routines can be built outside of ACPI if FIRMWARE_TABLES is selected. Link: https://lore.kernel.org/linux-cxl/CAJZ5v0jipbtTNnsA0-o5ozOk8ZgWnOg34m34a9pPenTyRLj=6A@mail.gmail.com/ Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/169713683430.2205276.17899451119920103445.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27PCI/AER: Refactor cper_print_aer() for use by CXL driver moduleTerry Bowman
The CXL driver plans to use cper_print_aer() for logging restricted CXL host (RCH) AER errors. cper_print_aer() is not currently exported and therefore not usable by the CXL drivers built as loadable modules. Export the cper_print_aer() function. Use the EXPORT_SYMBOL_NS_GPL() variant to restrict the export to CXL drivers. The CONFIG_ACPI_APEI_PCIEAER kernel config is currently used to enable cper_print_aer(). cper_print_aer() logs the AER registers and is useful in PCIE AER logging outside of APEI. Remove the CONFIG_ACPI_APEI_PCIEAER dependency to enable cper_print_aer(). The cper_print_aer() function name implies CPER specific use but is useful in non-CPER cases as well. Rename cper_print_aer() to pci_print_aer(). Also, update cxl_core to import CXL namespace imports. Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-13-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27Merge tag 'ata-6.6-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fix from Damien Le Moal: "A single patch to fix a regression introduced by the recent suspend/resume fixes. The regression is that ATA disks are not stopped on system shutdown, which is not recommended and increases the disks SMART counters for unclean power off events. This patch fixes this by refining the recent rework of the scsi device manage_xxx flags" * tag 'ata-6.6-final' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: scsi: sd: Introduce manage_shutdown device flag
2023-10-27doc/netlink: Update schema to support cmd-cnt-name and cmd-max-nameDavide Caratti
allow specifying cmd-cnt-name and cmd-max-name in netlink specs, in accordance with Documentation/userspace-api/netlink/c-code-gen.rst. Use cmd-cnt-name and attr-cnt-name in the mptcp yaml spec and in the corresponding uAPI headers, to preserve the #defines we had in the past and avoid adding new ones. v2: - squash modification in mptcp.yaml and MPTCP uAPI headers Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/12d4ed0116d8883cf4b533b856f3125a34e56749.1698415310.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27ASoC: Merge up workaround for CODECs that play noise on stopped streamMark Brown
This was sent too late to actually make it for v6.6 but was sent against v6.6 so merge it up here.
2023-10-27ASoC: soc-dai: add flag to mute and unmute stream during triggerSrinivas Kandagatla
In some setups like Speaker amps which are very sensitive, ex: keeping them unmute without actual data stream for very short duration results in a static charge and results in pop and clicks. To minimize this, provide a way to mute and unmute such codecs during trigger callbacks. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20231027105747.32450-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-10-27cdx: add sysfs for subsystem, class and revisionAbhijit Gangurde
CDX controller provides subsystem vendor, subsystem device, class and revision info of the device along with vendor and device ID in native endian format. CDX Bus system uses this information to bind the cdx device to the cdx device driver. Co-developed-by: Puneet Gupta <puneet.gupta@amd.com> Signed-off-by: Puneet Gupta <puneet.gupta@amd.com> Co-developed-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Tested-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-8-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27cdx: add support for bus enable and disableAbhijit Gangurde
CDX bus needs to be disabled before updating/writing devices in the FPGA. Once the devices are written, the bus shall be rescanned. This change provides sysfs entry to enable/disable the CDX bus. Co-developed-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-6-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27cdx: Register cdx bus as a device on cdx subsystemAbhijit Gangurde
While scanning for CDX devices, register newly discovered bus as a cdx device. CDX device attributes are visible based on device type. Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-5-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27cdx: Remove cdx controller list from cdx bus systemAbhijit Gangurde
Remove xarray list of cdx controller. Instead, use platform bus to locate the cdx controller using compat string used by cdx controller platform driver. Also, use ida to allocate a unique id for the controller. Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-2-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27Revert "nvmem: add new config option"Rafał Miłecki
This reverts commit 517f14d9cf3533d5ab4fded195ab6f80a92e378f. Config option "no_of_node" is no longer needed since adding a more explicit and targeted option "add_legacy_fixed_of_cells". That "no_of_node" config option was needed *earlier* to help mtd's case. DT nodes of MTD partitions (that are also NVMEM devices) may contain subnodes. Those SHOULD NOT be treated as NVMEM fixed cells. To prevent NVMEM core code from parsing subnodes a "no_of_node" option was added (and set to true in mtd) to make for_each_child_of_node() in NVMEM a no-op. That was a bit hacky because it was messing with "of_node" pointer to achieve some side-effect. With the introduction of "add_legacy_fixed_of_cells" config option things got more explicit. MTD subsystem simply tells NVMEM when to look for fixed cells and there is no need to hack "of_node" pointer anymore. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20231023102759.31529-1-zajec5@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27usb: raw-gadget: report suspend, resume, reset, and disconnect eventsAndrey Konovalov
Update USB_RAW_IOCTL_EVENT_FETCH to also report suspend, resume, reset, and disconnect events. This allows the code that emulates a USB device via Raw Gadget to handle these events. For example, the device can restart enumeration when it gets reset. Also do not print a WARNING when the event queue overflows. With these new events being queued, the queue might overflow if the device emulation code stops fetching events. Also print debug messages when a non-control event is received. Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/d610b629a5f32fb76c24012180743f7f0f1872c0.1698350424.git.andreyknvl@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27crypto: FIPS 202 SHA-3 register in hash info for IMADimitri John Ledkov
Register FIPS 202 SHA-3 hashes in hash info for IMA and other users. Sizes 256 and up, as 224 is too weak for any practical purposes. Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27x509: Add OIDs for FIPS 202 SHA-3 hash and signaturesDimitri John Ledkov
Add OID for FIPS 202 SHA-3 family of hash functions, RSA & ECDSA signatures using those. Limit to 256 or larger sizes, for interoperability reasons. 224 is too weak for any practical uses. Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: ahash - optimize performance when wrapping shashEric Biggers
The "ahash" API provides access to both CPU-based and hardware offload- based implementations of hash algorithms. Typically the former are implemented as "shash" algorithms under the hood, while the latter are implemented as "ahash" algorithms. The "ahash" API provides access to both. Various kernel subsystems use the ahash API because they want to support hashing hardware offload without using a separate API for it. Yet, the common case is that a crypto accelerator is not actually being used, and ahash is just wrapping a CPU-based shash algorithm. This patch optimizes the ahash API for that common case by eliminating the extra indirect call for each ahash operation on top of shash. It also fixes the double-counting of crypto stats in this scenario (though CONFIG_CRYPTO_STATS should *not* be enabled by anyone interested in performance anyway...), and it eliminates redundant checking of CRYPTO_TFM_NEED_KEY. As a bonus, it also shrinks struct crypto_ahash. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: ahash - remove crypto_ahash_alignmaskEric Biggers
crypto_ahash_alignmask() no longer has any callers, and it always returns 0 now that neither ahash nor shash algorithms support nonzero alignmasks anymore. Therefore, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: ahash - remove support for nonzero alignmaskEric Biggers
Currently, the ahash API checks the alignment of all key and result buffers against the algorithm's declared alignmask, and for any unaligned buffers it falls back to manually aligned temporary buffers. This is virtually useless, however. First, since it does not apply to the message, its effect is much more limited than e.g. is the case for the alignmask for "skcipher". Second, the key and result buffers are given as virtual addresses and cannot (in general) be DMA'ed into, so drivers end up having to copy to/from them in software anyway. As a result it's easy to use memcpy() or the unaligned access helpers. The crypto_hash_walk_*() helper functions do use the alignmask to align the message. But with one exception those are only used for shash algorithms being exposed via the ahash API, not for native ahashes, and aligning the message is not required in this case, especially now that alignmask support has been removed from shash. The exception is the n2_core driver, which doesn't set an alignmask. In any case, no ahash algorithms actually set a nonzero alignmask anymore. Therefore, remove support for it from ahash. The benefit is that all the code to handle "misaligned" buffers in the ahash API goes away, reducing the overhead of the ahash API. This follows the same change that was made to shash. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: shash - remove crypto_shash_ctx_aligned()Eric Biggers
crypto_shash_ctx_aligned() is no longer used, and it is useless now that shash algorithms don't support nonzero alignmasks, so remove it. Also remove crypto_tfm_ctx_aligned() which was only called by crypto_shash_ctx_aligned(). It's unlikely to be useful again, since it seems inappropriate to use cra_alignmask to represent alignment for the tfm context when it already means alignment for inputs/outputs. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27units: Add BYTES_PER_*BITDamian Muszynski
There is going to be a new user of the BYTES_PER_[K/M/G]BIT definition besides possibly existing ones. Add them to the header. Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: shash - remove crypto_shash_alignmaskEric Biggers
crypto_shash_alignmask() no longer has any callers, and it always returns 0 now that the shash algorithm type no longer supports nonzero alignmasks. Therefore, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: shash - eliminate indirect call for default import and exportEric Biggers
Most shash algorithms don't have custom ->import and ->export functions, resulting in the memcpy() based default being used. Yet, crypto_shash_import() and crypto_shash_export() still make an indirect call, which is expensive. Therefore, change how the default import and export are called to make it so that crypto_shash_import() and crypto_shash_export() don't do an indirect call in this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27net: Add MDB get device operationIdo Schimmel
Add MDB net device operation that will be invoked by rtnetlink code in response to received RTM_GETMDB messages. Subsequent patches will implement the operation in the bridge and VXLAN drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27bridge: add MDB get uAPI attributesIdo Schimmel
Add MDB get attributes that correspond to the MDB set attributes used in RTM_NEWMDB messages. Specifically, add 'MDBA_GET_ENTRY' which will hold a 'struct br_mdb_entry' and 'MDBA_GET_ENTRY_ATTRS' which will hold 'MDBE_ATTR_*' attributes that are used as indexes (source IP and source VNI). An example request will look as follows: [ struct nlmsghdr ] [ struct br_port_msg ] [ MDBA_GET_ENTRY ] struct br_mdb_entry [ MDBA_GET_ENTRY_ATTRS ] [ MDBE_ATTR_SOURCE ] struct in_addr / struct in6_addr [ MDBE_ATTR_SRC_VNI ] u32 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27Merge tag 'thunderbolt-for-v6.7-rc1' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next Mika writes: thunderbolt: Changes for v6.7 merge window This includes following USB4/Thunderbolt changes for the v6.7 merge window: - Configure asymmetric link if the DisplayPort bandwidth requires so - Enable path power management packet support for USB4 v2 routers - Make the bandwidth reservations to follow the USB4 v2 connection manager guide suggestions - DisplayPort tunneling improvements - Small cleanups and improvements around the driver. All these have been in linux-next with no reported issues. * tag 'thunderbolt-for-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (25 commits) thunderbolt: Fix one kernel-doc comment thunderbolt: Configure asymmetric link if needed and bandwidth allows thunderbolt: Add support for asymmetric link thunderbolt: Introduce tb_switch_depth() thunderbolt: Introduce tb_for_each_upstream_port_on_path() thunderbolt: Introduce tb_port_path_direction_downstream() thunderbolt: Set path power management packet support bit for USB4 v2 routers thunderbolt: Change bandwidth reservations to comply USB4 v2 thunderbolt: Make is_gen4_link() available to the rest of the driver thunderbolt: Use weight constants in tb_usb3_consumed_bandwidth() thunderbolt: Use constants for path weight and priority thunderbolt: Add DP IN added last in the head of the list of DP resources thunderbolt: Create multiple DisplayPort tunnels if there are more DP IN/OUT pairs thunderbolt: Log NVM version of routers and retimers thunderbolt: Use tb_tunnel_xxx() log macros in tb.c thunderbolt: Expose tb_tunnel_xxx() log macros to the rest of the driver thunderbolt: Use tb_tunnel_dbg() where possible to make logging more consistent thunderbolt: Fix typo of HPD bit for Hot Plug Detect thunderbolt: Fix typo in enum tb_link_width kernel-doc thunderbolt: Fix debug log when DisplayPort adapter not available for pairing ...
2023-10-27net/tcp: Add TCP_AO_REPAIRDmitry Safonov
Add TCP_AO_REPAIR setsockopt(), getsockopt(). They let a user to repair TCP-AO ISNs/SNEs. Also let the user hack around when (tp->repair) is on and add ao_info on a socket in any supported state. As SNEs now can be read/written at any moment, use WRITE_ONCE()/READ_ONCE() to set/read them. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Wire up l3index to TCP-AODmitry Safonov
Similarly how TCP_MD5SIG_FLAG_IFINDEX works for TCP-MD5, TCP_AO_KEYF_IFINDEX is an AO-key flag that binds that MKT to a specified by L3 ifinndex. Similarly, without this flag the key will work in the default VRF l3index = 0 for connections. To prevent AO-keys from overlapping, it's restricted to add key B for a socket that has key A, which have the same sndid/rcvid and one of the following is true: - !(A.keyflags & TCP_AO_KEYF_IFINDEX) or !(B.keyflags & TCP_AO_KEYF_IFINDEX) so that any key is non-bound to a VRF - A.l3index == B.l3index both want to work for the same VRF Additionally, it's restricted to match TCP-MD5 keys for the same peer the following way: |--------------|--------------------|----------------|---------------| | | MD5 key without | MD5 key | MD5 key | | | l3index | l3index=0 | l3index=N | |--------------|--------------------|----------------|---------------| | TCP-AO key | | | | | without | reject | reject | reject | | l3index | | | | |--------------|--------------------|----------------|---------------| | TCP-AO key | | | | | l3index=0 | reject | reject | allow | |--------------|--------------------|----------------|---------------| | TCP-AO key | | | | | l3index=N | reject | allow | reject | |--------------|--------------------|----------------|---------------| This is done with the help of tcp_md5_do_lookup_any_l3index() to reject adding AO key without TCP_AO_KEYF_IFINDEX if there's TCP-MD5 in any VRF. This is important for case where sysctl_tcp_l3mdev_accept = 1 Similarly, for TCP-AO lookups tcp_ao_do_lookup() may be used with l3index < 0, so that __tcp_ao_key_cmp() will match TCP-AO key in any VRF. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>