summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-11parisc: Always use the self-extracting kernel featureHelge Deller
This patch drops the CONFIG_PARISC_SELF_EXTRACT option. The palo boot loader is able to decompress a kernel which was compressed with gzip. That possibility was useful when the Linux kernel self-extracting feature wasn't implemented yet. Beside the fact that the self-extracting feature offers much better compression rates, we do support self-extracting kernels already since kernel v4.14, so now it's really time to get rid of that old option and always use the self-extractor. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11video/fbdev/stifb: Implement the stifb_fillrect() functionHelge Deller
The stifb driver (for Artist/HCRX graphics on PA-RISC) was missing the fillrect function. Tested on a 715/64 PA-RISC machine and in qemu. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Add vDSO supportHelge Deller
Add minimal vDSO support, which provides the signal trampoline helpers, but none of the userspace syscall helpers like time wrappers. The big benefit of this vDSO implementation is, that we now don't need an executeable stack any longer. PA-RISC is one of the last architectures where an executeable stack was needed in oder to implement the signal trampolines by putting assembly instructions on the stack which then gets executed. Instead the kernel will provide the relevant code in the vDSO page and only put the pointers to the signal information on the stack. By dropping the need for executable stacks we avoid running into issues with applications which want non executable stacks for security reasons. Additionally, alternative stacks on memory areas without exec permissions are supported too. This code is based on an initial implementation by Randolph Chung from 2006: https://lore.kernel.org/linux-parisc/4544A34A.6080700@tausq.org/ I did the porting and lifted the code to current code base. Dave fixed the unwind code so that gdb and glibc are able to backtrace through the code. An additional patch to gdb will be pushed upstream by Dave. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Dave Anglin <dave.anglin@bell.net> Cc: Randolph Chung <randolph@tausq.org> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Simplify fast path for non-access data TLB faultsJohn David Anglin
With the latest cache fix for non-access faults and the support for non-access faults (code 17) in handle_interruption, we can remove the fast path emulation for fdc, fic, pdc, lpa, probe and probei instructions. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Fix handling off probe non-access faultsJohn David Anglin
Currently, the parisc kernel does not fully support non-access TLB fault handling for probe instructions. In the fast path, we set the target register to zero if it is not a shadowed register. The slow path is not implemented, so we call do_page_fault. The architecture indicates that non-access faults should not cause a page fault from disk. This change adds to code to provide non-access fault support for probe instructions. It also modifies the handling of faults on userspace so that if the address lies in a valid VMA and the access type matches that for the VMA, the probe target register is set to one. Otherwise, the target register is set to zero. This was done to make probe instructions more useful for userspace. Probe instructions are not very useful if they set the target register to zero whenever a page is not present in memory. Nominally, the purpose of the probe instruction is determine whether read or write access to a given address is allowed. This fixes a problem in function pointer comparison noticed in the glibc testsuite (stdio-common/tst-vfprintf-user-type). The same problem is likely in glibc (_dl_lookup_address). V2 adds flush and lpa instruction support to handle_nadtlb_fault. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Fix non-access data TLB cache flush faultsJohn David Anglin
When a page is not present, we get non-access data TLB faults from the fdc and fic instructions in flush_user_dcache_range_asm and flush_user_icache_range_asm. When these occur, the cache line is not invalidated and potentially we get memory corruption. The problem was hidden by the nullification of the flush instructions. These faults also affect performance. With pa8800/pa8900 processors, there will be 32 faults per 4 KB page since the cache line is 128 bytes. There will be more faults with earlier processors. The problem is fixed by using flush_cache_pages(). It does the flush using a tmp alias mapping. The flush_cache_pages() call in flush_cache_range() flushed too large a range. V2: Remove unnecessary preempt_disable() and preempt_enable() calls. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11x86/sgx: Free backing memory after faulting the enclave pageJarkko Sakkinen
There is a limited amount of SGX memory (EPC) on each system. When that memory is used up, SGX has its own swapping mechanism which is similar in concept but totally separate from the core mm/* code. Instead of swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM comes from a shared memory pseudo-file and can itself be swapped by the core mm code. There is a hierarchy like this: EPC <-> shmem <-> disk After data is swapped back in from shmem to EPC, the shmem backing storage needs to be freed. Currently, the backing shmem is not freed. This effectively wastes the shmem while the enclave is running. The memory is recovered when the enclave is destroyed and the backing storage freed. Sort this out by freeing memory with shmem_truncate_range(), as soon as a page is faulted back to the EPC. In addition, free the memory for PCMD pages as soon as all PCMD's in a page have been marked as unused by zeroing its contents. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
2022-03-11Merge branch 'davidh' (fixes from David Howells)Linus Torvalds
Merge misc fixes from David Howells: "A set of patches for watch_queue filter issues noted by Jann. I've added in a cleanup patch from Christophe Jaillet to convert to using formal bitmap specifiers for the note allocation bitmap. Also two filesystem fixes (afs and cachefiles)" * emailed patches from David Howells <dhowells@redhat.com>: cachefiles: Fix volume coherency attribute afs: Fix potential thrashing in afs writeback watch_queue: Make comment about setting ->defunct more accurate watch_queue: Fix lack of barrier/sync/lock between post and read watch_queue: Free the alloc bitmap when the watch_queue is torn down watch_queue: Fix the alloc bitmap size to reflect notes allocated watch_queue: Use the bitmap API when applicable watch_queue: Fix to always request a pow-of-2 pipe ring size watch_queue: Fix to release page in ->release() watch_queue, pipe: Free watchqueue state after clearing pipe ring watch_queue: Fix filter limit check
2022-03-11cachefiles: Fix volume coherency attributeDavid Howells
A network filesystem may set coherency data on a volume cookie, and if given, cachefiles will store this in an xattr on the directory in the cache corresponding to the volume. The function that sets the xattr just stores the contents of the volume coherency buffer directly into the xattr, with nothing added; the checking function, on the other hand, has a cut'n'paste error whereby it tries to interpret the xattr contents as would be the xattr on an ordinary file (using the cachefiles_xattr struct). This results in a failure to match the coherency data because the buffer ends up being shifted by 18 bytes. Fix this by defining a structure specifically for the volume xattr and making both the setting and checking functions use it. Since the volume coherency doesn't work if used, take the opportunity to insert a reserved field for future use, set it to 0 and check that it is 0. Log mismatch through the appropriate tracepoint. Note that this only affects cifs; 9p, afs, ceph and nfs don't use the volume coherency data at the moment. Fixes: 32e150037dce ("fscache, cachefiles: Store the volume coherency data") Reported-by: Rohith Surabattula <rohiths.msft@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Steve French <smfrench@gmail.com> cc: linux-cifs@vger.kernel.org cc: linux-cachefs@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11afs: Fix potential thrashing in afs writebackDavid Howells
In afs_writepages_region(), if the dirty page we find is undergoing writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go round the loop trying the same page again and again with no pausing or waiting unless and until another thread manages to clear the writeback and fscache flags. Fix this with three measures: (1) Advance start to after the page we found. (2) Break out of the loop and return if rescheduling is requested. (3) Arbitrarily give up after a maximum of 5 skips. Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11x86/traps: Mark do_int3() NOKPROBE_SYMBOLLi Huafei
Since kprobe_int3_handler() is called in do_int3(), probing do_int3() can cause a breakpoint recursion and crash the kernel. Therefore, do_int3() should be marked as NOKPROBE_SYMBOL. Fixes: 21e28290b317 ("x86/traps: Split int3 handler up") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com
2022-03-11watch_queue: Make comment about setting ->defunct more accurateDavid Howells
watch_queue_clear() has a comment stating that setting ->defunct to true preventing new additions as well as preventing notifications. Whilst the latter is true, the first bit is superfluous since at the time this function is called, the pipe cannot be accessed to add new event sources. Remove the "new additions" bit from the comment. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix lack of barrier/sync/lock between post and readDavid Howells
There's nothing to synchronise post_one_notification() versus pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the reader only takes pipe->mutex which cannot bar notification posting as that may need to be made from contexts that cannot sleep. Fix this by setting pipe->head with a barrier in post_one_notification() and reading pipe->head with a barrier in pipe_read(). If that's not sufficient, the rd_wait.lock will need to be taken, possibly in a ->confirm() op so that it only applies to notifications. The lock would, however, have to be dropped before copy_page_to_iter() is invoked. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Free the alloc bitmap when the watch_queue is torn downDavid Howells
Free the watch_queue note allocation bitmap when the watch_queue is destroyed. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix the alloc bitmap size to reflect notes allocatedDavid Howells
Currently, watch_queue_set_size() sets the number of notes available in wqueue->nr_notes according to the number of notes allocated, but sets the size of the bitmap to the unrounded number of notes originally asked for. Fix this by setting the bitmap size to the number of notes we're actually going to make available (ie. the number allocated). Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Use the bitmap API when applicableChristophe JAILLET
Use bitmap_alloc() to simplify code, improve the semantic and reduce some open-coded arithmetic in allocator arguments. Also change a memset(0xff) into an equivalent bitmap_fill() to keep consistency. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix to always request a pow-of-2 pipe ring sizeDavid Howells
The pipe ring size must always be a power of 2 as the head and tail pointers are masked off by AND'ing with the size of the ring - 1. watch_queue_set_size(), however, lets you specify any number of notes between 1 and 511. This number is passed through to pipe_resize_ring() without checking/forcing its alignment. Fix this by rounding the number of slots required up to the nearest power of two. The request is meant to guarantee that at least that many notifications can be generated before the queue is full, so rounding down isn't an option, but, alternatively, it may be better to give an error if we aren't allowed to allocate that much ring space. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix to release page in ->release()David Howells
When a pipe ring descriptor points to a notification message, the refcount on the backing page is incremented by the generic get function, but the release function, which marks the bitmap, doesn't drop the page ref. Fix this by calling generic_pipe_buf_release() at the end of watch_queue_pipe_buf_release(). Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue, pipe: Free watchqueue state after clearing pipe ringDavid Howells
In free_pipe_info(), free the watchqueue state after clearing the pipe ring as each pipe ring descriptor has a release function, and in the case of a notification message, this is watch_queue_pipe_buf_release() which tries to mark the allocation bitmap that was previously released. Fix this by moving the put of the pipe's ref on the watch queue to after the ring has been cleared. We still need to call watch_queue_clear() before doing that to make sure that the pipe is disconnected from any notification sources first. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix filter limit checkDavid Howells
In watch_queue_set_filter(), there are a couple of places where we check that the filter type value does not exceed what the type_filter bitmap can hold. One place calculates the number of bits by: if (tf[i].type >= sizeof(wfilter->type_filter) * 8) which is fine, but the second does: if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG) which is not. This can lead to a couple of out-of-bounds writes due to a too-large type: (1) __set_bit() on wfilter->type_filter (2) Writing more elements in wfilter->filters[] than we allocated. Fix this by just using the proper WATCH_TYPE__NR instead, which is the number of types we actually know about. The bug may cause an oops looking something like: BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740 Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611 ... Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 ... kasan_report.cold+0x7f/0x11b ... watch_queue_set_filter+0x659/0x740 ... __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Allocated by task 611: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 watch_queue_set_filter+0x23a/0x740 __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88800d2c66a0 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 28 bytes inside of 32-byte region [ffff88800d2c66a0, ffff88800d2c66c0) Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11block: flush plug based on hardware and software queue orderJens Axboe
We used to sort the plug list if we had multiple queues before dispatching requests to the IO scheduler. This usually isn't needed, but for certain workloads that interleave requests to disks, it's a less efficient to process the plug list one-by-one if everything is interleaved. Don't sort the list, but skip through it and flush out entries that have the same target at the same time. Fixes: df87eb0fce8f ("block: get rid of plug list sorting") Reported-and-tested-by: Song Liu <song@kernel.org> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-11block: ensure plug merging checks the correct queue at least onceJens Axboe
Song reports that a RAID rebuild workload runs much slower recently, and it is seeing a lot less merging than it did previously. The reason is that a previous commit reduced the amount of work we do for plug merging. RAID rebuild interleaves requests between disks, so a last-entry check in plug merging always misses a merge opportunity since we always find a different disk than what we are looking for. Modify the logic such that it's still a one-hit cache, but ensure that we check enough to find the right target before giving up. Fixes: d38a9c04c0d5 ("block: only check previous entry for plug merge attempt") Reported-and-tested-by: Song Liu <song@kernel.org> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-11f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fsJaegeuk Kim
Let's purge inode cache in order to avoid the below deadlock. [freeze test] shrinkder freeze_super - pwercpu_down_write(SB_FREEZE_FS) - super_cache_scan - down_read(&sb->s_umount) - prune_icache_sb - dispose_list - evict - f2fs_evict_inode thaw_super - down_write(&sb->s_umount); - __percpu_down_read(SB_FREEZE_FS) Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-03-11fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.Dai Ngo
Update lock usage of lock_manager_operations' functions to reflect the changes in commit 6109c85037e5 ("locks: add a dedicated spinlock to protect i_flctx lists"). Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-03-11NFSD: Fix nfsd_breaker_owns_lease() return valuesChuck Lever
These have been incorrect since the function was introduced. A proper kerneldoc comment is added since this function, though static, is part of an external interface. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11NFSD: Clean up _lm_ operation namesChuck Lever
The common practice is to name function instances the same as the method names, but with a uniquifying prefix. Commit aef9583b234a ("NFSD: Get reference of lockowner when coping file_lock") missed this -- the new function names should both have been of the form "nfsd4_lm_*". Before more lock manager operations are added in NFSD, rename these two functions for consistency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11arch: Remove references to CONFIG_NFSD_V3 in the default configsChuck Lever
CONFIG_NFSD_V3 has been removed. NFSD support for NFSv3 can no longer be disabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11NFSD: Remove CONFIG_NFSD_V3Chuck Lever
Eventually support for NFSv2 in the Linux NFS server is to be deprecated and then removed. However, NFSv2 is the "always supported" version that is available as soon as CONFIG_NFSD is set. Before NFSv2 support can be removed, we need to choose a different "always supported" version. This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported, as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become the only "always supported" NFS version. The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headersFrederic Weisbecker
Displaying "PREEMPT" on kernel headers when CONFIG_PREEMPT_DYNAMIC=y can be misleading for anybody involved in remote debugging because it is then not guaranteed that there is an actual preemption behaviour. It depends on default Kconfig or boot defined choices. Therefore, tell about PREEMPT_DYNAMIC on static kernel headers and leave the search for the actual preemption behaviour to browsing dmesg. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220217111240.GA742892@lothringen
2022-03-11spi: Update clock-names property for arm pl022Kuldeep Singh
PL022 has two input clocks named sspclk and apb_pclk. Current schema refers to two notations of sspclk which are indeed same and thus one can be dropped. Update clock-names property to reflect the same. Signed-off-by: Kuldeep Singh <singh.kuldeep87k@gmail.com> Link: https://lore.kernel.org/r/20220309171847.5345-1-singh.kuldeep87k@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-11xen/blkfront: speed up purge_persistent_grants()Juergen Gross
purge_persistent_grants() is scanning the grants list for persistent grants being no longer in use by the backend. When having found such a grant, it will be set to "invalid" and pushed to the tail of the list. Instead of pushing it directly to the end of the list, add it first to a temporary list, avoiding to scan those entries again in the main list traversal. After having finished the scan, append the temporary list to the grant list. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Link: https://lore.kernel.org/r/20220311103527.12931-1-jgross@suse.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-11Merge branch irq/aic-v2 into irq/irqchip-nextMarc Zyngier
* irq/aic-v2: : . : Add support for the interrupt controller found is the latest : incarnation of Apple M1 systems, courtesy of Hector Martin. : . irqchip/apple-aic: Add support for AICv2 irqchip/apple-aic: Support multiple dies irqchip/apple-aic: Dynamically compute register offsets irqchip/apple-aic: Switch to irq_domain_create_tree and sparse hwirqs irqchip/apple-aic: Add Fast IPI support dt-bindings: interrupt-controller: apple,aic2: New binding for AICv2 PCI: apple: Change MSI handling to handle 4-cell AIC fwspec form Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-11irqchip/apple-aic: Add support for AICv2Hector Martin
Introduce support for the new AICv2 hardware block in t6000/t6001 SoCs. It seems these blocks are missing the information required to compute the event register offset in the capability registers, so we specify that in the DT as a second reg entry. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309192123.152028-8-marcan@marcan.st
2022-03-11irqchip/apple-aic: Support multiple diesHector Martin
Multi-die support in AICv2 uses several sets of IRQ registers. Introduce a die count and compute the register group offset based on the die ID field of the hwirq number, as reported by the hardware. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309192123.152028-7-marcan@marcan.st
2022-03-11irqchip/apple-aic: Dynamically compute register offsetsHector Martin
This allows us to support AIC variants with different numbers of IRQs based on capability registers. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309192123.152028-6-marcan@marcan.st
2022-03-11irqchip/apple-aic: Switch to irq_domain_create_tree and sparse hwirqsHector Martin
This allows us to directly use the hardware event number as the hwirq number. Since IRQ events have bit 16 set (type=1), FIQs now move to starting at hwirq number 0. This will become more important once multi-die support is introduced in a later commit. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309192123.152028-5-marcan@marcan.st
2022-03-11irqchip/apple-aic: Add Fast IPI supportHector Martin
The newer AICv2 present in t600x SoCs does not have legacy IPI support at all. Since t8103 also supports Fast IPIs, implement support for this first. The legacy IPI code is left as a fallback, so it can be potentially used by older SoCs in the future. The vIPI code is shared; only the IPI firing/acking bits change for Fast IPIs. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309192123.152028-4-marcan@marcan.st
2022-03-11dt-bindings: interrupt-controller: apple,aic2: New binding for AICv2Hector Martin
This new incompatible revision of the AIC peripheral introduces multi-die support. This binding is based on apple,aic, but changes interrupt-cells to add a new die argument. Also adds a second reg entry to specify the offset of the event register. Inexplicably, the capability registers allow us to compute other register offsets, but not this one. This allows us to keep forward-compatibility with future SoCs that will likely implement different die counts, thus shifting the event register. Apple also specify the offset explicitly in their device tree... Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220309192123.152028-3-marcan@marcan.st
2022-03-10Merge tag 'drm-fixes-2022-03-11' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "As expected at this stage its pretty quiet, one sun4i mixer fix and one i915 display flicker fix: i915: - fix psr screen flicker sun4i: - mixer format fix" * tag 'drm-fixes-2022-03-11' of git://anongit.freedesktop.org/drm/drm: drm/sun4i: mixer: Fix P010 and P210 format numbers drm/i915/psr: Set "SF Partial Frame Enable" also on full update
2022-03-10riscv: Fix auipc+jalr relocation range checksEmil Renner Berthing
RISC-V can do PC-relative jumps with a 32bit range using the following two instructions: auipc t0, imm20 ; t0 = PC + imm20 * 2^12 jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12 Crucially both the 20bit immediate imm20 and the 12bit immediate imm12 are treated as two's-complement signed values. For this reason the immediates are usually calculated like this: imm20 = (offset + 0x800) >> 12 imm12 = offset & 0xfff ..where offset is the signed offset from the auipc instruction. When the 11th bit of offset is 0 the addition of 0x800 doesn't change the top 20 bits and imm12 considered positive. When the 11th bit is 1 the carry of the addition by 0x800 means imm20 is one higher, but since imm12 is then considered negative the two's complement representation means it all cancels out nicely. However, this addition by 0x800 (2^11) means an offset greater than or equal to 2^31 - 2^11 would overflow so imm20 is considered negative and result in a backwards jump. Similarly the lower range of offset is also moved down by 2^11 and hence the true 32bit range is [-2^31 - 2^11, 2^31 - 2^11) Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-11Merge tag 'drm-intel-fixes-2022-03-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix PSR2 when selective fetch is enabled and cursor at (-1, -1) (Jouni Högander) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YinTFSFg++HvuFpZ@tursulin-mobl2
2022-03-11Merge tag 'drm-misc-fixes-2022-03-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes * drm/sun4i: Fix P010 and P210 format numbers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YipS65Iuu7RMMlAa@linux-uq9g
2022-03-10Merge tag 'trace-v5.17-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Minor tracing fixes: - Fix unregistering the same event twice. A user could disable the same event that osnoise will disable on unregistering. - Inform RCU of a quiescent state in the osnoise testing thread. - Fix some kerneldoc comments" * tag 'trace-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix some W=1 warnings in kernel doc comments tracing/osnoise: Force quiescent states while tracing tracing/osnoise: Do not unregister events twice
2022-03-10Merge tag 'net-5.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, and ipsec. Current release - regressions: - Bluetooth: fix unbalanced unlock in set_device_flags() - Bluetooth: fix not processing all entries on cmd_sync_work, make connect with qualcomm and intel adapters reliable - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" - xdp: xdp_mem_allocator can be NULL in trace_mem_connect() - eth: ice: fix race condition and deadlock during interface enslave Current release - new code bugs: - tipc: fix incorrect order of state message data sanity check Previous releases - regressions: - esp: fix possible buffer overflow in ESP transformation - dsa: unlock the rtnl_mutex when dsa_master_setup() fails - phy: meson-gxl: fix interrupt handling in forced mode - smsc95xx: ignore -ENODEV errors when device is unplugged Previous releases - always broken: - xfrm: fix tunnel mode fragmentation behavior - esp: fix inter address family tunneling on GSO - tipc: fix null-deref due to race when enabling bearer - sctp: fix kernel-infoleak for SCTP sockets - eth: macb: fix lost RX packet wakeup race in NAPI receive - eth: intel stop disabling VFs due to PF error responses - eth: bcmgenet: don't claim WOL when its not available" * tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) xdp: xdp_mem_allocator can be NULL in trace_mem_connect(). ice: Fix race condition during interface enslave net: phy: meson-gxl: improve link-up behavior net: bcmgenet: Don't claim WOL when its not available net: arc_emac: Fix use after free in arc_mdio_probe() sctp: fix kernel-infoleak for SCTP sockets net: phy: correct spelling error of media in documentation net: phy: DP83822: clear MISR2 register to disable interrupts gianfar: ethtool: Fix refcount leak in gfar_get_ts_info selftests: pmtu.sh: Kill nettest processes launched in subshell. selftests: pmtu.sh: Kill tcpdump processes launched by subshell. NFC: port100: fix use-after-free in port100_send_complete net/mlx5e: SHAMPO, reduce TIR indication net/mlx5e: Lag, Only handle events from highest priority multipath entry net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE net/mlx5: Fix a race on command flush flow net/mlx5: Fix size field in bufferx_reg struct ax25: Fix NULL pointer dereference in ax25_kill_by_device net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr net: ethernet: lpc_eth: Handle error for clk_enable ...
2022-03-10xdp: xdp_mem_allocator can be NULL in trace_mem_connect().Sebastian Andrzej Siewior
Since the commit mentioned below __xdp_reg_mem_model() can return a NULL pointer. This pointer is dereferenced in trace_mem_connect() which leads to segfault. The trace points (mem_connect + mem_disconnect) were put in place to pair connect/disconnect using the IDs. The ID is only assigned if __xdp_reg_mem_model() does not return NULL. That connect trace point is of no use if there is no ID. Skip that connect trace point if xdp_alloc is NULL. [ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace point ] Fixes: 4a48ef70b93b8 ("xdp: Allow registering memory model without rxq reference") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10ice: Fix race condition during interface enslaveIvan Vecera
Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge branch 'md-next' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.18/drivers Pull MD updates from Song: "This set contains raid5 bio handling cleanups for raid5." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: raid5: initialize the stripe_head embeeded bios as needed raid5-cache: statically allocate the recovery ra bio raid5-cache: fully initialize flush_bio when needed raid5-ppl: fully initialize the bio in ppl_new_iounit
2022-03-10net: phy: meson-gxl: improve link-up behaviorHeiner Kallweit
Sometimes the link comes up but no data flows. This patch fixes this behavior. It's not clear what's the root cause of the issue. According to the tests one other link-up issue remains. In very rare cases the link isn't even reported as up. Fixes: 84c8f773d2dc ("net: phy: meson-gxl: remove the use of .ack_callback()") Tested-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/e3473452-a1f9-efcf-5fdd-02b6f44c3fcd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10net: bcmgenet: Don't claim WOL when its not availableJeremy Linton
Some of the bcmgenet platforms don't correctly support WOL, yet ethtool returns: "Supports Wake-on: gsf" which is false. Ideally if there isn't a wol_irq, or there is something else that keeps the device from being able to wakeup it should display: "Supports Wake-on: d" This patch checks whether the device can wakup, before using the hard-coded supported flags. This corrects the ethtool reporting, as well as the WOL configuration because ethtool verifies that the mode is supported before attempting it. Fixes: c51de7f3976b ("net: bcmgenet: add Wake-on-LAN support code") Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Peter Robinson <pbrobinson@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220310045535.224450-1-jeremy.linton@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10net: arc_emac: Fix use after free in arc_mdio_probe()Jianglei Nie
If bus->state is equal to MDIOBUS_ALLOCATED, mdiobus_free(bus) will free the "bus". But bus->name is still used in the next line, which will lead to a use after free. We can fix it by putting the name in a local variable and make the bus->name point to the rodata section "name",then use the name in the error message without referring to bus to avoid the uaf. Fixes: 95b5fc03c189 ("net: arc_emac: Make use of the helper function dev_err_probe()") Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Link: https://lore.kernel.org/r/20220309121824.36529-1-niejianglei2021@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>