path: root/include
Age | Commit message | Author
2021-07-08 | arch, mm: wire up memfd_secret system call where relevant | Mike Rapoport
Wire up memfd_secret system call on architectures that define ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86. Link: https://lkml.kernel.org/r/20210518072034.31572-7-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 | PM: hibernate: disable when there are active secretmem users | Mike Rapoport
It is unsafe to allow saving of secretmem areas to the hibernation snapshot as they would be visible after the resume and this essentially will defeat the purpose of secret memory mappings. Prevent hibernation whenever there are active secret memory users. Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
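The rule described above (no hibernation while any secret memory user exists) can be sketched in user-space C as a simple user count. This is only an illustration under stated assumptions: the helper names `secretmem_user_get()` and `secretmem_user_put()` are hypothetical, and the real kernel code may track users differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Count of live secret memory users (illustrative, not the kernel's data). */
static atomic_int secretmem_users;

/* Hypothetical hook: called when a secret memory file is created. */
static void secretmem_user_get(void)
{
	atomic_fetch_add(&secretmem_users, 1);
}

/* Hypothetical hook: called when a secret memory file is released. */
static void secretmem_user_put(void)
{
	int prev = atomic_fetch_sub(&secretmem_users, 1);

	assert(prev > 0);	/* a put without a matching get is a bug */
	(void)prev;
}

/* True while at least one secret memory user exists. */
static bool secretmem_active(void)
{
	return atomic_load(&secretmem_users) != 0;
}

/* Hibernation is refused while secret memory is in use. */
static bool hibernation_available(void)
{
	return !secretmem_active();
}
```

A counter keeps the check cheap: the hibernation path only needs one atomic read and never has to walk the secret mappings themselves.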
2021-07-08 | mm: introduce memfd_secret system call to create "secret" memory areas | Mike Rapoport
Introduce the "memfd_secret" system call, which creates memory areas that are visible only in the context of the owning process and are mapped neither into other processes nor into the kernel page tables.

The secretmem feature is off by default and the user must explicitly enable it at boot time.

Once secretmem is enabled, the user will be able to create a file descriptor using the memfd_secret() system call. The memory areas created by mmap() calls from this file descriptor will be unmapped from the kernel direct map and they will be only mapped in the page table of the processes that have access to the file descriptor.

Secretmem is designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel attack prevention systems) against ROP attacks. Secretmem makes "simple" ROP insufficient to perform exfiltration, which increases the required complexity of the attack. Along with other protections like the kernel stack size limit and address space layout randomization, which make finding gadgets really hard, the absence of any in-kernel primitive for accessing secret memory means the one-gadget ROP attack can't work. Since the only way to access secret memory is to reconstruct the missing mapping entry, the attacker has to recover the physical page and insert a PTE pointing to it in the kernel and then retrieve the contents. That takes at least three gadgets, which is a level of difficulty beyond most standard attacks.

* Prevent cross-process secret userspace memory exposures. Once the secret memory is allocated, the user can't accidentally pass it into the kernel to be transmitted somewhere. The secretmem pages cannot be accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws.
In order to access secretmem, a kernel-side attack would need to either walk the page tables and create new ones, or spawn a new privileged userspace process to perform secrets exfiltration using ptrace.

The file descriptor based memory has several advantages over the "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). The file descriptor approach allows explicit and controlled sharing of the memory areas, and it allows operations to be sealed.

Besides, file descriptor based memory paves the way for VMMs to remove the secret memory range from the userspace hypervisor process, for instance QEMU. Andy Lutomirski says: "Getting fd-backed memory into a guest will take some possibly major work in the kernel, but getting vma-backed memory into a guest without mapping it in the host user address space seems much, much worse."

memfd_secret() is made a dedicated system call rather than an extension to memfd_create() because its purpose is to allow the user to create more secure memory mappings rather than to simply allow file based access to the memory. Nowadays the cost of a new system call is negligible, while it is way simpler for userspace to deal with clear-cut system calls than with a multiplexer or an overloaded syscall. Moreover, the initial implementation of memfd_secret() is completely distinct from memfd_create(), so there is not much sense in overloading memfd_create() to begin with. If code sharing between these implementations is ever needed, it can easily be achieved without adjusting user visible APIs.

The secret memory remains accessible in the process context using uaccess primitives, but it is not exposed to the kernel otherwise; secret memory areas are removed from the direct map and functions in the follow_page()/get_user_page() family will refuse to return a page that belongs to the secret memory area.
If a use case arises that requires exposing secretmem to the kernel, it will be an opt-in request in the system call flags, so that the user decides what data can be exposed to the kernel.

Removing pages from the direct map may fragment it on architectures that use large pages to map physical memory, which affects system performance. However, the original Kconfig text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1] showed that "... although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice". Hence, it is sufficient to have secretmem disabled by default with the ability of a system administrator to enable it at boot time.

Pages in the secretmem regions are unevictable and unmovable to avoid accidental exposure of the sensitive data via swap or during page migration.

Since the secretmem mappings are locked in memory they cannot exceed RLIMIT_MEMLOCK. Since these mappings are already locked independently from mlock(), an attempt to mlock()/munlock() a secretmem range would fail and mlockall()/munlockall() will ignore secretmem mappings.

However, unlike mlock()ed memory, secretmem currently behaves more like long-term GUP: secretmem mappings are unmovable mappings directly consumed by user space. With default limits, there is no excessive use of secretmem and it poses no real problem in combination with ZONE_MOVABLE/CMA, but in the future this should be addressed to allow balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.

A page that was a part of the secret memory area is cleared when it is freed to ensure the data is not exposed to the next user of that page.
The following example demonstrates creation of a secret mapping (error handling is omitted):

	fd = memfd_secret(0);
	ftruncate(fd, MAP_SIZE);
	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

[akpm@linux-foundation.org: suppress Kconfig whine]

Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
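The commit's example omits error handling. A minimal user-space sketch with the checks filled in might look like the following. Assumptions: the C library may not provide a memfd_secret() wrapper, so the raw syscall is used; the fallback number 447 is the value I believe x86-64, arm64 and riscv use; `secret_alloc()` and `secret_free()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447	/* assumption: x86-64, arm64 and riscv value */
#endif

/*
 * Map `size` bytes of secret memory.
 * Returns the mapping, or NULL with errno set, for example ENOSYS when the
 * kernel lacks the system call or secretmem was not enabled at boot.
 */
static void *secret_alloc(size_t size)
{
	void *ptr;
	int saved;
	int fd;

	assert(size > 0);

	fd = (int)syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		return NULL;

	if (ftruncate(fd, (off_t)size) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return NULL;
	}

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* an established mapping outlives the descriptor */
	errno = saved;

	return ptr == MAP_FAILED ? NULL : ptr;
}

/* Unmap a region obtained from secret_alloc(). Returns 0 on success. */
static int secret_free(void *ptr, size_t size)
{
	return munmap(ptr, size);
}
```

Callers should treat a NULL return as "secret memory unavailable" and decide whether falling back to ordinary memory is acceptable for their data.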
2021-07-08 | set_memory: allow querying whether set_direct_map_*() is actually enabled | Mike Rapoport
On arm64, set_direct_map_*() functions may return 0 without actually changing the linear map. This behaviour can be controlled using kernel parameters, so we need a way to determine at runtime whether calls to set_direct_map_invalid_noflush() and set_direct_map_default_noflush() have any effect. Extend set_memory API with can_set_direct_map() function that allows checking if calling set_direct_map_*() will actually change the page table, replace several occurrences of open coded checks in arm64 with the new function and provide a generic stub for architectures that always modify page tables upon calls to set_direct_map APIs. [arnd@arndb.de: arm64: kfence: fix header inclusion] Link: https://lkml.kernel.org/r/20210518072034.31572-4-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
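The "generic stub that an architecture may override" idea in the set_memory commit can be sketched with the usual same-named-macro idiom. This is a user-space illustration, not the header from the commit; `secret_area_init()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * An architecture that can disable direct map changes at boot would provide
 * its own can_set_direct_map() and define a macro of the same name before
 * this point. Everyone else gets the stub that always reports "effective".
 */
#ifndef can_set_direct_map
static inline bool can_set_direct_map(void)
{
	return true;
}
#define can_set_direct_map can_set_direct_map
#endif

/* Hypothetical caller: rely on unmapping only when it really takes effect. */
static int secret_area_init(void)
{
	if (!can_set_direct_map())
		return -1;	/* direct map cannot be changed; refuse to enable */
	return 0;
}
```

The benefit of a runtime query over a compile-time check is that the answer can depend on boot parameters, which is the arm64 situation the commit describes.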
2021-07-08 | lib: fix spelling mistakes in header files | Zhen Lei
Fix some spelling mistakes in comments found by "codespell":

  Hoever ==> However
  poiter ==> pointer
  representaion ==> representation
  uppon ==> upon
  independend ==> independent
  aquired ==> acquired
  mis-match ==> mismatch
  scrach ==> scratch
  struture ==> structure
  Analagous ==> Analogous
  interation ==> iteration

And some were discovered manually by Joe Perches and Christoph Lameter:

  stroed ==> stored
  arch independent ==> an architecture independent
  A example structure for ==> Example structure for

Link: https://lkml.kernel.org/r/20210609150027.14805-2-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Christoph Lameter <cl@gentwo.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 | Merge part 2 of branch 'sysfs-devel' | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | NFSv4/pnfs: Clean up layout get on open | Trond Myklebust
Cache the layout in the arguments so we don't have to keep looking it up from the inode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | Merge branch 'sysfs-devel' | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: remove an offlined xprt using sysfs | Olga Kornievskaia
Once a transport has been taken offline, it can also be removed from the list of transports. Any tasks stuck on this transport find the next available active transport and are retried. The transport is then removed from the xprt_switch list and freed. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | NFSv4.1 identify and mark RPC tasks that can move between transports | Olga Kornievskaia
In preparation for when we can retry a task on a different transport, identify and mark such RPC tasks as moveable. Only 4.1+ operations can be retried on a different transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | SUNRPC: take a xprt offline using sysfs | Olga Kornievskaia
Using sysfs's xprt_state attribute, mark a particular transport offline. An offline transport is not picked during round-robin selection. The main transport (the first transport created for the rpc_client) cannot be taken offline. A transport can be brought back online via sysfs by writing "online", which makes it eligible for round-robin selection again. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
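From user space, toggling a transport would amount to writing a keyword into its xprt_state attribute file. The sketch below assumes "offline" is the counterpart of the "online" keyword named in the commit message, and it takes the attribute path from the caller because the exact sysfs layout is not spelled out here; `xprt_set_state()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "offline" or "online" to an xprt_state attribute file.
 * `path` is supplied by the caller; no particular sysfs layout is assumed.
 * Returns 0 on success, -1 on an unknown keyword or an I/O failure.
 */
static int xprt_set_state(const char *path, const char *state)
{
	FILE *f;
	int ok;

	if (strcmp(state, "offline") != 0 && strcmp(state, "online") != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)	/* sysfs store errors may surface only on close */
		ok = 0;

	return ok ? 0 : -1;
}
```

Checking the fclose() result matters because buffered writes reach the kernel late, so a rejected request (for example on the main transport) may only be reported at that point.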
2021-07-08 | sunrpc: add dst_attr attributes to the sysfs xprt directory | Olga Kornievskaia
Allow querying and setting the destination address of a transport. Setting the destination address is allowed only for TCP or RDMA based connections. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | SUNRPC query transport's source port | Olga Kornievskaia
Provide the ability to query a transport's source port. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | SUNRPC mark the first transport | Olga Kornievskaia
When an RPC client is created, its first transport is special and should be marked as the main transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: add sysfs directory per xprt under each xprt_switch | Olga Kornievskaia
Add individual transport directories under each transport switch group. For instance, for each of the nconnect=X connections there will be a transport directory. The naming convention also identifies the transport type -- xprt-<id>-<type>, where type is udp, tcp, rdma, local, bc. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: add xprt_switch directory to sunrpc's sysfs | Olga Kornievskaia
Add an xprt_switch directory to sysfs and create an individual xprt_switch subdirectory for each multipath transport group. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: keep track of the xprt_class in rpc_xprt structure | Olga Kornievskaia
We need to keep track of the type for a given transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: add IDs to multipath | Olga Kornievskaia
This is used to uniquely identify sunrpc multipath objects in /sys. Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: add xprt id | Olga Kornievskaia
This adds a unique identifier for a sunrpc transport in sysfs, which is managed similarly to the unique IDs of clients. Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | sunrpc: Create per-rpc_clnt sysfs kobjects | Olga Kornievskaia
These will eventually have files placed under them for sysfs operations. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 | ext4: inline jbd2_journal_[un]register_shrinker() | Theodore Ts'o
The function jbd2_journal_unregister_shrinker() was getting called twice when the file system was getting unmounted. On Power and ARM platforms this was causing a kernel crash when unmounting the file system, because a percpu_counter was destroyed twice. Fix this by removing the jbd2_journal_[un]register_shrinker() functions, and inlining the shrinker setup and teardown into journal_init_common() and jbd2_journal_destroy(). This means that ext4 and ocfs2 no longer need to know about registering and unregistering jbd2's shrinker. Also, while we're at it, rename the percpu counter from j_jh_shrink_count to j_checkpoint_jh_count, since this makes it clearer what this counter is intended to track. Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-08 | virtio-pci library: introduce vp_modern_get_driver_features() | Jason Wang
This patch introduces a helper to get driver/guest features from the device. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210602021536.39525-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 | vdpa: support packed virtqueue for set/get_vq_state() | Jason Wang
This patch extends the vdpa_vq_state to support packed virtqueue state, which is basically the device/driver ring wrap counters and the avail and used index. This will be used for the virtio-vdpa support for the packed virtqueue and the future vhost/vhost-vdpa support for the packed virtqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210602021536.39525-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com>
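One way to picture the extended state is a union of a split-ring state and a packed-ring state, where the packed variant carries a wrap counter bit next to each index. The layout below is a sketch based on the description above; all type and field names are illustrative, and `packed_advance_avail()` is a hypothetical helper added only to show how an index and its wrap counter move together.

```c
#include <assert.h>
#include <stdint.h>

/* Split virtqueue: only the available index needs saving. */
struct vq_state_split {
	uint16_t avail_index;
};

/* Packed virtqueue: a wrap counter bit plus a 15-bit index for each side. */
struct vq_state_packed {
	uint16_t last_avail_counter : 1;
	uint16_t last_avail_idx : 15;
	uint16_t last_used_counter : 1;
	uint16_t last_used_idx : 15;
};

/* One state type covers both layouts; the caller knows which one applies. */
struct vq_state {
	union {
		struct vq_state_split split;
		struct vq_state_packed packed;
	};
};

/* Advance the avail index by n descriptors, flipping the counter on wrap. */
static void packed_advance_avail(struct vq_state_packed *s, uint16_t n,
				 uint16_t ring_size)
{
	unsigned int next;

	assert(ring_size > 0 && ring_size <= 0x8000 && n <= ring_size);

	next = (unsigned int)s->last_avail_idx + n;
	if (next >= ring_size) {
		next -= ring_size;
		s->last_avail_counter ^= 1;
	}
	s->last_avail_idx = next;
}
```

Saving the wrap counter together with the index is what lets a restored queue tell a freshly written descriptor from a stale one at the same ring position.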
2021-07-07 | bpf: devmap: Implement devmap prog execution for generic XDP | Kumar Kartikeya Dwivedi
This lifts the restriction on running devmap BPF progs in generic redirect mode. To match native XDP behavior, it is invoked right before generic_xdp_tx is called, and only supports XDP_PASS/XDP_ABORTED/ XDP_DROP actions. We also return 0 even if devmap program drops the packet, as semantically redirect has already succeeded and the devmap prog is the last point before TX of the packet to device where it can deliver a verdict on the packet. This also means it must take care of freeing the skb, as xdp_do_generic_redirect callers only do that in case an error is returned. Since devmap entry prog is supported, remove the check in generic_xdp_install entirely. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-5-memxor@gmail.com
2021-07-07 | bpf: cpumap: Implement generic cpumap | Kumar Kartikeya Dwivedi
This change implements CPUMAP redirect support for generic XDP programs. The idea is to reuse the cpu map entry's queue that is used to push native xdp frames for redirecting skb to a different CPU. This will match native XDP behavior (in that RPS is invoked again for packets reinjected into the networking stack). To be able to determine whether the incoming skb is from the driver or cpumap, we reuse the skb->redirected bit that skips generic XDP processing when it is set. To always make use of this, the CONFIG_NET_REDIRECT guard on it has been lifted and it is always available. From the redirect side, we add the skb to ptr_ring with its lowest bit set to 1. This should be safe as skb is not 1-byte aligned. This allows kthread to discern between xdp_frames and sk_buff. On consumption of the ptr_ring item, the lowest bit is unset. In the end, the skb is simply added to the list that kthread is anyway going to maintain for xdp_frames converted to skb, and then received again by using netif_receive_skb_list. Bulking optimization for generic cpumap is left as an exercise for a future patch for now. Since cpumap entry progs are now supported, also remove the check in generic_xdp_install for the cpumap. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-4-memxor@gmail.com
2021-07-07 | bitops: Add non-atomic bitops for pointers | Kumar Kartikeya Dwivedi
cpumap needs to set, clear, and test the lowest bit in skb pointer in various places. To make these checks less noisy, add pointer friendly bitop macros that also do some typechecking to sanitize the argument. These wrap the non-atomic bitops __set_bit, __clear_bit, and test_bit but for pointer arguments. Pointer's address has to be passed in and it is treated as an unsigned long *, since width and representation of pointer and unsigned long match on targets Linux supports. They are prefixed with double underscore to indicate lack of atomicity. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-3-memxor@gmail.com
2021-07-07 | net: core: Split out code to run generic XDP prog | Kumar Kartikeya Dwivedi
This helper can later be utilized in code that runs cpumap and devmap programs in generic redirect mode and adjust skb based on changes made to xdp_buff. When returning XDP_REDIRECT/XDP_TX, it invokes __skb_push, so whenever a generic redirect path invokes devmap/cpumap prog if set, it must __skb_pull again as we expect mac header to be pulled. It also drops the skb_reset_mac_len call after do_xdp_generic, as the mac_header and network_header are advanced by the same offset, so the difference (mac_len) remains constant. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-2-memxor@gmail.com
2021-07-07 | bpf: Support input xdp_md context in BPF_PROG_TEST_RUN | Zvi Effron
Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for BPF_PROG_TEST_RUN. The intended use case is to pass some XDP meta data to the test runs of XDP programs that are used as tail calls. For programs that use bpf_prog_test_run_xdp, support xdp_md input and output. Unlike with an actual xdp_md during a non-test run, data_meta must be 0 because it must point to the start of the provided user data. From the initial xdp_md, use data and data_end to adjust the pointers in the generated xdp_buff. All other non-zero fields are prohibited (with EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially different) xdp_md back to the userspace. We require all fields of input xdp_md except the ones we explicitly support to be set to zero. The expectation is that in the future we might add support for more fields and we want to fail explicitly if the user runs the program on the kernel where we don't yet support them. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
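The input rules stated in this message (data_meta must be 0, data and data_end position the payload, every other non-zero field is rejected with EINVAL) can be modelled as a small validation function. This is a user-space model of the described behaviour, not the kernel code: `struct xdp_md_sketch` is a local mirror of the context fields, `check_xdp_md_input()` is a hypothetical name, and the bounds check against the buffer size is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirror of the context fields relevant to the sketch. */
struct xdp_md_sketch {
	uint32_t data;
	uint32_t data_end;
	uint32_t data_meta;
	uint32_t ingress_ifindex;
	uint32_t rx_queue_index;
	uint32_t egress_ifindex;
};

/* Returns 0 if the context is acceptable as test input, -EINVAL otherwise. */
static int check_xdp_md_input(const struct xdp_md_sketch *md,
			      uint32_t data_size)
{
	/* data_meta must point at the start of the user-provided data. */
	if (md->data_meta != 0)
		return -EINVAL;

	/* Assumption: offsets must be ordered and lie inside the buffer. */
	if (md->data > md->data_end || md->data_end > data_size)
		return -EINVAL;

	/* Fields without explicit support must be zero. */
	if (md->ingress_ifindex || md->rx_queue_index || md->egress_ifindex)
		return -EINVAL;

	return 0;
}
```

Rejecting unsupported non-zero fields up front is what keeps the interface extensible: a program that relies on a field added later fails loudly on an older kernel instead of silently running with that field ignored.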
2021-07-07 | bpf: Add function for XDP meta data length check | Zvi Effron
This commit prepares to use the XDP meta data length check in multiple places by making it into a static inline function instead of a literal. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-2-zeffron@riotgames.com
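Turning an open-coded check into a static inline function might look like the sketch below. The concrete rule used here (length is a multiple of 4 bytes and at most 32 bytes) is my understanding of the XDP metadata constraints rather than something this commit message states, and `meta_len_invalid()` and `META_LEN_MAX` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define META_LEN_MAX 32u	/* assumption: upper bound on metadata bytes */

/*
 * True when the metadata length is unusable: either not a multiple of
 * sizeof(uint32_t) or larger than META_LEN_MAX.
 */
static inline bool meta_len_invalid(unsigned long metalen)
{
	return (metalen & (sizeof(uint32_t) - 1)) || metalen > META_LEN_MAX;
}
```

With the rule in one named helper, every call site shares the same definition of "invalid", so a later change to the limit cannot leave the callers disagreeing.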
2021-07-08 | Merge tag 'drm-misc-next-fixes-2021-07-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next | Dave Airlie
Short summary of fixes pull:

 * amdgpu: TTM fixes
 * dma-buf: Doc fixes
 * gma500: Fix potential BO leaks in error handling
 * radeon: Fix NULL-ptr deref

Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YN2GK2SH64yqXqh9@linux-uq9g
2021-07-07 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf | David S. Miller
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Do not refresh timeout in SYN_SENT for syn retransmissions. Add selftest for unreplied TCP connection, from Florian Westphal.
2) Fix null dereference from error path with hardware offload in nftables.
3) Remove useless nf_ct_gre_keymap_flush() from netns exit path, from Vasily Averin.
4) Missing rcu read-lock side in ctnetlink helper info dump, also from Vasily.
5) Do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry, from Ali Abdallah and Florian Westphal.
6) Add tcp_ignore_invalid_rst sysctl to allow disabling out-of-segment RSTs, from Ali.
7) KCSAN fix for nf_conntrack_all_lock(), from Manfred Spraul.
8) Honor NFTA_LAST_SET in nft_last.
9) Fix incorrect arithmetic when restoring last_jiffies in nft_last.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 | Merge tag 'acpi-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds
Pull more ACPI updates from Rafael Wysocki:

"These include fixes of the recently introduced support for the Platform Runtime Mechanism (PRM) feature, a new backlight quirk, a suspend-to-idle wakeup fix for non-Intel platforms and a fix for the AMBA bus resource list in /proc/iomem.

Specifics:

 - Fix up the recently added Platform Runtime Mechanism (PRM) support by correcting a couple of implementation mistakes in it and adding a Kconfig help text to describe it (Aubrey Li, Rafael Wysocki).
 - Add backlight quirk for Dell Vostro 3350 (Hans de Goede).
 - Avoid spurious wakeups from suspend-to-idle on non-Intel platforms by restricting special EC GPE handling to the Intel ones (Mario Limonciello).
 - Modify the AMBA bus support in ACPI to avoid adding using resource names in /proc/iomem (Liguang Zhang)"

* tag 'acpi-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Do not signal PRM support if not enabled
  ACPI: Correct \_SB._OSC bit definition for PRM
  ACPI: Kconfig: Provide help text for the ACPI_PRMT option
  ACPI: PM: Only mark EC GPE for wakeup on Intel systems
  ACPI: video: Add quirk for the Dell Vostro 3350
  ACPI: AMBA: Fix resource name in /proc/iomem
2021-07-07 | Merge tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds
Pull more power management updates from Rafael Wysocki:

"These include cpufreq core simplifications and fixes, cpufreq driver updates, cpuidle driver update, a generic power domains (genpd) locking fix and a debug-related simplification of the PM core.

Specifics:

 - Drop the ->stop_cpu() (not really useful) and ->resolve_freq() (unused) cpufreq driver callbacks and modify the users of the former accordingly (Viresh Kumar, Rafael Wysocki).
 - Add frequency invariance support to the ACPI CPPC cpufreq driver again along with the related fixes and cleanups (Viresh Kumar).
 - Update the MediaTek, qcom and SCMI ARM cpufreq drivers (Fabien Parent, Seiya Wang, Sibi Sankar, Christophe JAILLET).
 - Rename black/white-lists in the DT cpufreq driver (Viresh Kumar).
 - Add generic performance domains support to the dvfs DT bindings (Sudeep Holla).
 - Refine locking in the generic power domains (genpd) support code to avoid lock dependency issues (Stephen Boyd).
 - Update the MSM and qcom ARM cpuidle drivers (Bartosz Dudziak).
 - Simplify the PM core debug code by using ktime_us_delta() to compute time interval lengths (Mark-PK Tsai)"

* tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits)
  PM: domains: Shrink locking area of the gpd_list_lock
  PM: sleep: Use ktime_us_delta() in initcall_debug_report()
  cpufreq: CPPC: Add support for frequency invariance
  arch_topology: Avoid use-after-free for scale_freq_data
  cpufreq: CPPC: Pass structure instance by reference
  cpufreq: CPPC: Fix potential memleak in cppc_cpufreq_cpu_init
  cpufreq: Remove ->resolve_freq()
  cpufreq: Reuse cpufreq_driver_resolve_freq() in __cpufreq_driver_target()
  cpufreq: Remove the ->stop_cpu() driver callback
  cpufreq: powernv: Migrate to ->exit() callback instead of ->stop_cpu()
  cpufreq: CPPC: Migrate to ->exit() callback instead of ->stop_cpu()
  cpufreq: intel_pstate: Combine ->stop_cpu() and ->offline()
  cpuidle: qcom: Add SPM register data for MSM8226
  dt-bindings: arm: msm: Add SAW2 for MSM8226
  dt-bindings: cpufreq: update cpu type and clock name for MT8173 SoC
  clk: mediatek: remove deprecated CLK_INFRA_CA57SEL for MT8173 SoC
  cpufreq: dt: Rename black/white-lists
  cpufreq: scmi: Fix an error message
  cpufreq: mediatek: add support for mt8365
  dt-bindings: dvfs: Add support for generic performance domains
  ...
2021-07-07Merge tag 'for-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: "Battery/charger driver changes: - convert charger-manager binding to YAML - drop bd70528-charger driver - drop pm2301-charger driver - introduce rt5033-battery driver - misc improvements and fixes" * tag 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (42 commits) power: supply: ab8500: Fix an old bug power: supply: axp288_fuel_gauge: remove redundant continue statement power: supply: axp288_fuel_gauge: Make "T3 MRD" no_battery_list DMI entry more generic power: supply: axp288_fuel_gauge: Rename fuel_gauge_blacklist to no_battery_list power: supply: bq24190_charger: drop of_match_ptr() from device ID table drivers: power: add missing MODULE_DEVICE_TABLE in keystone-reset.c power: supply: ab8500: add missing MODULE_DEVICE_TABLE power: supply: charger-manager: add missing MODULE_DEVICE_TABLE power: reset: regulator-poweroff: add missing MODULE_DEVICE_TABLE power: supply: cpcap-charger: get the battery inserted infomation from cpcap-battery power: supply: cpcap-battery: invalidate config when incompatible measurements are read power: supply: axp20x_battery: allow disabling battery charging power: supply: max17040: drop unused platform data support power: supply: max17040: simplify POWER_SUPPLY_PROP_ONLINE power: supply: max17040: remove non-working POWER_SUPPLY_PROP_STATUS power: reset: at91-sama5d2_shdwc: Remove redundant error printing in at91_shdwc_probe() power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE power: supply: rt5033_battery: Fix device tree enumeration dt-bindings: power: supply: Add DT schema for richtek,rt5033-battery power: supply: Drop BD70528 support ...
2021-07-07Merge tag 'linux-watchdog-5.14-rc1' of ↵Linus Torvalds
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - Add Mstar MSC313e WDT driver - Add support for sama7g5-wdt - Add compatible for SC7280 SoC - Add compatible for Mediatek MT8195 - sbsa: Support architecture version 1 - Removal of the MV64x60 watchdog driver - Extra PCI IDs for hpwdt - Add hrtimer-based pretimeout feature - Add {min,max}_timeout sysfs nodes - keembay timeout and pre-timeout handling - Several fixes, cleanups and improvements * tag 'linux-watchdog-5.14-rc1' of git://www.linux-watchdog.org/linux-watchdog: (56 commits) watchdog: iTCO_wdt: use dev_err() instead of pr_err() watchdog: Add Mstar MSC313e WDT driver dt-bindings: watchdog: Add Mstar MSC313e WDT devicetree bindings documentation watchdog: iTCO_wdt: Account for rebooting on second timeout dt-bindings: watchdog: Convert arm,sbsa-gwdt to DT schema dt-bindings: watchdog: sama5d4-wdt: add compatible for sama7g5-wdt watchdog: sama5d4_wdt: add support for sama7g5-wdt dt-bindings: watchdog: sama5d4-wdt: convert to yaml watchdog: ziirave_wdt: Remove VERSION_FMT defines and add sysfs newlines dt-bindings: watchdog: Add compatible for Mediatek MT8195 dt-bindings: watchdog: dw-wdt: add description for rk3568 watchdog: imx_sc_wdt: fix pretimeout watchdog: diag288_wdt: Remove redundant assignment watchdog: Add hrtimer-based pretimeout feature dt-bindings: watchdog: Add compatible for SC7280 SoC watchdog: qcom: Move suspend/resume to suspend_late/resume_early watchdog: Fix a typo in the file orion_wdt.c watchdog: jz4740: Fix return value check in jz4740_wdt_probe() watchdog: Remove MV64x60 watchdog driver doc: mtk-wdt: support pre-timeout when the bark irq is available ...
2021-07-07Merge tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: - add tracepoints for callbacks and for client creation and destruction - cache the mounts used for server-to-server copies - expose callback information in /proc/fs/nfsd/clients/*/info - don't hold locks unnecessarily while waiting for commits - update NLM to use xdr_stream, as we have for NFSv2/v3/v4 * tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linux: (69 commits) nfsd: fix NULL dereference in nfs3svc_encode_getaclres NFSD: Prevent a possible oops in the nfs_dirent() tracepoint nfsd: remove redundant assignment to pointer 'this' nfsd: Reduce contention for the nfsd_file nf_rwsem lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream lockd: Update the NLMv4 nlm_res results encoder to use struct xdr_stream lockd: Update the NLMv4 TEST results encoder to use struct xdr_stream lockd: Update the NLMv4 void results encoder to use struct xdr_stream lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream lockd: Update the NLMv4 SM_NOTIFY arguments decoder to use struct xdr_stream lockd: Update the NLMv4 nlm_res arguments decoder to use struct xdr_stream lockd: Update the NLMv4 UNLOCK arguments decoder to use struct xdr_stream lockd: Update the NLMv4 CANCEL arguments decoder to use struct xdr_stream lockd: Update the NLMv4 LOCK arguments decoder to use struct xdr_stream lockd: Update the NLMv4 TEST arguments decoder to use struct xdr_stream lockd: Update the NLMv4 void arguments decoder to use struct xdr_stream lockd: Update the NLMv1 SHARE results encoder to use struct xdr_stream lockd: Update the NLMv1 nlm_res results encoder to use struct xdr_stream lockd: Update the NLMv1 TEST results encoder to use struct xdr_stream ...
2021-07-07Merge branches 'acpi-misc', 'acpi-video' and 'acpi-prm'Rafael J. Wysocki
* acpi-misc: ACPI: AMBA: Fix resource name in /proc/iomem * acpi-video: ACPI: video: Add quirk for the Dell Vostro 3350 * acpi-prm: ACPI: Do not singal PRM support if not enabled ACPI: Correct \_SB._OSC bit definition for PRM ACPI: Kconfig: Provide help text for the ACPI_PRMT option
2021-07-07Merge tag 'x86-fpu-2021-07-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: "Fixes and improvements for FPU handling on x86: - Prevent sigaltstack out of bounds writes. The kernel unconditionally writes the FPU state to the alternate stack without checking whether the stack is large enough to accommodate it. Check the alternate stack size before doing so and in case it's too small force a SIGSEGV instead of silently corrupting user space data. - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been updated despite the fact that the FPU state which is stored on the signal stack has grown over time which causes trouble in the field when AVX512 is available on a CPU. The kernel does not expose the minimum requirements for the alternate stack size depending on the available and enabled CPU features. ARM already added an aux vector AT_MINSIGSTKSZ for the same reason. Add it to x86 as well. - A major cleanup of the x86 FPU code. The recent discoveries of XSTATE related issues unearthed quite some inconsistencies, duplicated code and other issues.
The fine granular overhaul addresses this, makes the code more robust and maintainable, which allows to integrate upcoming XSTATE related features in sane ways" * tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again x86/fpu/signal: Let xrstor handle the features to init x86/fpu/signal: Handle #PF in the direct restore path x86/fpu: Return proper error codes from user access functions x86/fpu/signal: Split out the direct restore code x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing() x86/fpu/signal: Sanitize the xstate check on sigframe x86/fpu/signal: Remove the legacy alignment check x86/fpu/signal: Move initial checks into fpu__restore_sig() x86/fpu: Mark init_fpstate __ro_after_init x86/pkru: Remove xstate fiddling from write_pkru() x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate() x86/fpu: Remove PKRU handling from switch_fpu_finish() x86/fpu: Mask PKRU from kernel XRSTOR[S] operations x86/fpu: Hook up PKRU into ptrace() x86/fpu: Add PKRU storage outside of task XSAVE buffer x86/fpu: Dont restore PKRU in fpregs_restore_userspace() x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi() x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs() x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs() ...
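The AT_MINSIGSTKSZ item above can be illustrated from the user-space side. What follows is a minimal sketch, not code from this series: it assumes a Linux libc that provides getauxval(), the helper names altstack_size() and install_altstack() are hypothetical, and the 16 KiB headroom for the handler's own frames is an arbitrary illustrative figure.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/auxv.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51        /* value used by the kernel's uapi auxvec.h */
#endif

/* Assumption for illustration: extra room for the handler's own stack use. */
#define HANDLER_HEADROOM (16 * 1024)

/* Hypothetical helper: pick an alternate stack size that honours the
 * kernel-reported minimum signal frame size when it is available. */
size_t altstack_size(void)
{
    /* getauxval() returns 0 when the kernel does not supply the entry. */
    unsigned long kernel_min = getauxval(AT_MINSIGSTKSZ);
    size_t size = (size_t)SIGSTKSZ;

    if (kernel_min > size)
        size = (size_t)kernel_min;
    return size + HANDLER_HEADROOM;
}

/* Hypothetical helper: allocate and register the alternate signal stack.
 * Returns 0 on success, -1 on failure. */
int install_altstack(void)
{
    stack_t ss;

    ss.ss_size = altstack_size();
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}
```

A program would call install_altstack() once per thread before installing handlers with SA_ONSTACK. On kernels that do not export the aux vector entry, the sketch falls back to SIGSTKSZ plus the headroom.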
2021-07-07Merge tag 'for-linus-5.14-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Only two minor patches this time: one cleanup patch and one patch refreshing a Xen header" * tag 'for-linus-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: sync include/xen/interface/io/ring.h with Xen's newest version xen: Use DEVICE_ATTR_*() macro
2021-07-07Merge tag 'rproc-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc Pull remoteproc updates from Bjorn Andersson: "This adds support for controlling the PRU and R5F clusters on the TI AM64x, the remote processor in i.MX7ULP, i.MX8MN/P and i.MX8ULP NXP and the audio, compute and modem remoteprocs in the Qualcomm SC8180x platform. It fixes improper ordering of cdev and device creation of the remoteproc control interface and it fixes resource leaks in the error handling path of rproc_add() and the Qualcomm modem and wifi remoteproc drivers. Lastly it fixes a few build warnings and replaces the dummy parameter passed in the mailbox api of the stm32 driver with something not living on the stack" * tag 'rproc-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: (32 commits) remoteproc: qcom: pas: Add SC8180X adsp, cdsp and mpss dt-bindings: remoteproc: qcom: pas: Add SC8180X adsp, cdsp and mpss remoteproc: imx_rproc: support i.MX8ULP dt-bindings: remoteproc: imx_rproc: support i.MX8ULP remoteproc: stm32: fix mbox_send_message call remoteproc: core: Cleanup device in case of failure remoteproc: core: Fix cdev remove and rproc del remoteproc: core: Move validate before device add remoteproc: core: Move cdev add before device add remoteproc: pru: Add support for various PRU cores on K3 AM64x SoCs dt-bindings: remoteproc: pru: Update bindings for K3 AM64x SoCs remoteproc: qcom_wcnss: Use devm_qcom_smem_state_get() remoteproc: qcom_q6v5: Use devm_qcom_smem_state_get() to fix missing put() soc: qcom: smem_state: Add devm_qcom_smem_state_get() dt-bindings: remoteproc: qcom: pas: Fix indentation warnings remoteproc: imx-rproc: Fix IMX_REMOTEPROC configuration remoteproc: imx_rproc: support i.MX8MN/P remoteproc: imx_rproc: support i.MX7ULP remoteproc: imx_rproc: make clk optional remoteproc: imx_rproc: initial support for mutilple start/stop method ...
2021-07-07netfilter: uapi: refer to nfnetlink_conntrack.h, not nf_conntrack_netlink.hDuncan Roe
nf_conntrack_netlink.h does not exist, refer to nfnetlink_conntrack.h instead. Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-06lockd: Remove stale commentsChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-07-06Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC updates from Stafford Horne: "One change to simplify Litex CSR (MMIO register) access by limiting them to 32-bit offsets. Now that this is agreed on among Litex hardware and kernel developers it will allow us to start upstreaming other Litex peripheral drivers" * tag 'for-linus' of git://github.com/openrisc/linux: drivers/soc/litex: remove 8-bit subregister option
2021-07-06Merge tag 'kgdb-5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "This was an extremely quiet cycle for kgdb. This consists of two patches that between them address spelling errors and a switch fallthrough warning" * tag 'kgdb-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb: Fix fall-through warning for Clang kgdb: Fix spelling mistakes
2021-07-06Merge tag 'fuse-update-5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fixes for virtiofs submounts - Misc fixes and cleanups * tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: virtiofs: Fix spelling mistakes fuse: use DIV_ROUND_UP helper macro for calculations fuse: fix illegal access to inode with reused nodeid fuse: allow fallocate(FALLOC_FL_ZERO_RANGE) fuse: Make fuse_fill_super_submount() static fuse: Switch to fc_mount() for submounts fuse: Call vfs_get_tree() for submounts fuse: add dedicated filesystem context ops for submounts virtiofs: propagate sync() to file server fuse: reject internal errno fuse: check connected before queueing on fpq->io fuse: ignore PG_workingset after stealing fuse: Fix infinite loop in sget_fc() fuse: Fix crash if superblock of submount gets killed early fuse: Fix crash in fuse_dentry_automount() error path
2021-07-06bonding: Add struct bond_ipesc to manage SATaehee Yoo
bonding has been supporting ipsec offload. When SA is added, bonding just passes SA to its own active real interface. But it doesn't manage SA. So, when events (add/del real interface, active real interface change, etc.) occur, bonding can't handle them well because it doesn't manage SA. So some problems (panic, UAF, refcnt leak) occur. In order to make it stable, it should manage SA. That's the reason why struct bond_ipsec is added. When a new SA is added to bonding interface, it is stored in the bond_ipsec list. And the SA is passed to a current active real interface. If events occur, it uses bond_ipsec data to handle these events. bond->ipsec_list is protected by bond->ipsec_lock. If a current active real interface is changed, the following logic works. 1. Delete all SAs from the old active real interface. 2. Add all SAs to the new active real interface. 3. If a new active real interface doesn't support ipsec offload or SA's option, it sets real_dev to NULL. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
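The three-step handover described in this commit message can be modelled in user space. This is a simplified sketch under stated assumptions, not the kernel implementation: the structures, the fixed-size array standing in for bond->ipsec_list, the pthread mutex standing in for bond->ipsec_lock, and the function names bond_init(), bond_add_sa() and bond_change_active() are all hypothetical. Refusing to add an SA when the active device cannot offload is likewise an assumption of the sketch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SA 8

/* Hypothetical stand-in for a real (slave) interface. */
struct real_dev {
    bool offload_ok;   /* whether the device accepts offloaded SAs */
    int  nr_sa;        /* number of SAs currently programmed into it */
};

/* One tracked SA; real_dev is NULL while the SA is not offloaded. */
struct sa_entry {
    int id;
    struct real_dev *real_dev;
};

struct bond {
    pthread_mutex_t ipsec_lock;          /* protects ipsec_list and nr */
    struct sa_entry ipsec_list[MAX_SA];
    size_t nr;
    struct real_dev *active;
};

void bond_init(struct bond *b, struct real_dev *active)
{
    pthread_mutex_init(&b->ipsec_lock, NULL);
    b->nr = 0;
    b->active = active;
}

/* Record the SA in the bond's list and pass it to the active device. */
int bond_add_sa(struct bond *b, int id)
{
    int ret = -1;

    pthread_mutex_lock(&b->ipsec_lock);
    if (b->nr < MAX_SA && b->active && b->active->offload_ok) {
        b->ipsec_list[b->nr].id = id;
        b->ipsec_list[b->nr].real_dev = b->active;
        b->active->nr_sa++;
        b->nr++;
        ret = 0;
    }
    pthread_mutex_unlock(&b->ipsec_lock);
    return ret;
}

/* Move every tracked SA from the old active device to the new one. */
void bond_change_active(struct bond *b, struct real_dev *new_active)
{
    size_t i;

    pthread_mutex_lock(&b->ipsec_lock);
    /* Step 1: delete all SAs from the old active device. */
    for (i = 0; i < b->nr; i++) {
        if (b->ipsec_list[i].real_dev) {
            b->ipsec_list[i].real_dev->nr_sa--;
            b->ipsec_list[i].real_dev = NULL;
        }
    }
    b->active = new_active;
    /* Steps 2 and 3: add SAs to the new device if it supports offload;
     * otherwise real_dev stays NULL. */
    if (new_active && new_active->offload_ok) {
        for (i = 0; i < b->nr; i++) {
            b->ipsec_list[i].real_dev = new_active;
            new_active->nr_sa++;
        }
    }
    pthread_mutex_unlock(&b->ipsec_lock);
}
```

Keeping the list in the bond rather than in each device is what lets the handover replay every SA. Without it, a device change has nothing to replay from.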
2021-07-06Merge branch 'pci/kernel-doc'Bjorn Helgaas
- Fix kernel-doc formatting errors (Krzysztof Wilczyński) * pci/kernel-doc: PCI: Fix kernel-doc formatting
2021-07-06Merge branch 'pci/p2pdma'Bjorn Helgaas
- Rename upstream_bridge_distance() to calc_map_type_and_dist() (Logan Gunthorpe) - Collect ACS list message in stack buffer to avoid sleeping (Logan Gunthorpe) - Use correct calc_map_type_and_dist() return type (Logan Gunthorpe) - Warn if host bridge not in whitelist (Logan Gunthorpe) - Refactor pci_p2pdma_map_type() (Logan Gunthorpe) - Avoid pci_get_slot(), which may sleep (Logan Gunthorpe) - Simplify distance calculation in __calc_map_type_and_dist() and calc_map_type_and_dist_warn() (Christoph Hellwig) - Finish RCU conversion of pdev->p2pdma (Eric Dumazet) * pci/p2pdma: PCI/P2PDMA: Finish RCU conversion of pdev->p2pdma PCI/P2PDMA: Simplify distance calculation PCI/P2PDMA: Avoid pci_get_slot(), which may sleep PCI/P2PDMA: Refactor pci_p2pdma_map_type() PCI/P2PDMA: Warn if host bridge not in whitelist PCI/P2PDMA: Use correct calc_map_type_and_dist() return type PCI/P2PDMA: Collect acs list in stack buffer to avoid sleeping PCI/P2PDMA: Rename upstream_bridge_distance() and rework doc
2021-07-06PCI/P2PDMA: Finish RCU conversion of pdev->p2pdmaEric Dumazet
While looking at pci_alloc_p2pmem() I found RCU protection was not properly applied there, as pdev->p2pdma was potentially read multiple times. Fix pci_alloc_p2pmem(), add __rcu qualifier to p2pdma field of struct pci_dev, and fix all other accesses to this field with proper RCU verbs. Link: https://lore.kernel.org/r/20210701095621.3129283-1-eric.dumazet@gmail.com Fixes: 1570175abd16 ("PCI/P2PDMA: track pgmap references per resource, not globally") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
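The problem named in this commit message, a shared pointer "potentially read multiple times", can be shown with a user-space analogy. The sketch below is not kernel code. It uses C11 atomics in place of the kernel's RCU accessors, it does not model read-side critical sections or grace periods, and the types and functions (struct dev, dev_publish(), dev_pool_size()) are hypothetical. The point is only that the reader loads the pointer once into a local and then works on that snapshot.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-device P2PDMA state. */
struct p2pdma {
    size_t pool_size;
};

struct dev {
    _Atomic(struct p2pdma *) p2pdma;   /* may be replaced or cleared concurrently */
};

/* Writer side: publish a fully initialised object (or NULL to clear). */
void dev_publish(struct dev *d, struct p2pdma *p)
{
    atomic_store_explicit(&d->p2pdma, p, memory_order_release);
}

/* Reader side: load the pointer exactly once, then use only the local copy.
 *
 * The buggy pattern would be:
 *     if (!d->p2pdma) return 0;
 *     return d->p2pdma->pool_size;    // second read may observe NULL
 */
size_t dev_pool_size(struct dev *d)
{
    struct p2pdma *p = atomic_load_explicit(&d->p2pdma, memory_order_acquire);

    if (p == NULL)
        return 0;
    return p->pool_size;
}
```

In the kernel the single load would go through the RCU accessors inside a read-side critical section, which also keeps the object alive while it is in use. This sketch deliberately leaves object lifetime out.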
2021-07-06PCI: Fix kernel-doc formattingKrzysztof Wilczyński
Fix kernel-doc formatting throughout drivers/pci and related include files. No change to functionality intended. Check for warnings: $ find include drivers/pci -type f -path "*pci*.[ch]" | xargs scripts/kernel-doc -none [bhelgaas: squashed to one commit] Link: https://lore.kernel.org/r/20210509030237.368540-1-kw@linux.com Link: https://lore.kernel.org/r/20210703151306.1922450-1-kw@linux.com Link: https://lore.kernel.org/r/20210703151306.1922450-2-kw@linux.com Link: https://lore.kernel.org/r/20210703151306.1922450-3-kw@linux.com Link: https://lore.kernel.org/r/20210703151306.1922450-4-kw@linux.com Link: https://lore.kernel.org/r/20210703151306.1922450-5-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>