linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-08-30	Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull support for struct bio recycling from Jens Axboe: "This adds bio recycling support for polled IO, allowing quick reuse of a bio for high IOPS scenarios via a percpu bio_set list. It's good for almost a 10% improvement in performance, bumping our per-core IO limit from ~3.2M IOPS to ~3.5M IOPS" * tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block: bio: improve kerneldoc documentation for bio_alloc_kiocb() block: provide bio_clear_hipri() helper block: use the percpu bio cache in __blkdev_direct_IO io_uring: enable use of bio alloc cache block: clear BIO_PERCPU_CACHE flag if polling isn't supported bio: add allocation cache abstraction fs: add kiocb alloc cache flag bio: optimize initialization of a bio
2021-08-30	Merge tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull io_uring updates from Jens Axboe: - cancellation cleanups (Hao, Pavel) - io-wq accounting cleanup (Hao) - io_uring submit locking fix (Hao) - io_uring link handling fixes (Hao) - fixed file improvements (wangyangbo, Pavel) - allow updates of linked timeouts like regular timeouts (Pavel) - IOPOLL fix (Pavel) - remove batched file get optimization (Pavel) - improve reference handling (Pavel) - IRQ task_work batching (Pavel) - allow pure fixed file, and add support for open/accept (Pavel) - GFP_ATOMIC RT kernel fix - multiple CQ ring waiter improvement - funnel IRQ completions through task_work - add support for limiting async workers explicitly - add different clocksource support for timeouts - io-wq wakeup race fix - lots of cleanups and improvement (Pavel et al) * tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block: (87 commits) io-wq: fix wakeup race when adding new work io-wq: wqe and worker locks no longer need to be IRQ safe io-wq: check max_worker limits if a worker transitions bound state io_uring: allow updating linked timeouts io_uring: keep ltimeouts in a list io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts io-wq: provide a way to limit max number of workers io_uring: add build check for buf_index overflows io_uring: clarify io_req_task_cancel() locking io_uring: add task-refs-get helper io_uring: fix failed linkchain code logic io_uring: remove redundant req_set_fail() io_uring: don't free request to slab io_uring: accept directly into fixed file table io_uring: hand code io_accept() fd installing io_uring: openat directly into fixed fd table net: add accept helper not installing fd io_uring: fix io_try_cancel_userdata race for iowq io_uring: IRQ rw completion batching io_uring: batch task work locking ...
2021-08-30	Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull block updates from Jens Axboe: "Nothing major in here - lots of good cleanups and tech debt handling, which is also evident in the diffstats. In particular: - Add disk sequence numbers (Matteo) - Discard merge fix (Ming) - Relax disk zoned reporting restrictions (Niklas) - Bio error handling zoned leak fix (Pavel) - Start of proper add_disk() error handling (Luis, Christoph) - blk crypto fix (Eric) - Non-standard GPT location support (Dmitry) - IO priority improvements and cleanups (Damien)o - blk-throtl improvements (Chunguang) - diskstats_show() stack reduction (Abd-Alrhman) - Loop scheduler selection (Bart) - Switch block layer to use kmap_local_page() (Christoph) - Remove obsolete disk_name helper (Christoph) - block_device refcounting improvements (Christoph) - Ensure gendisk always has a request queue reference (Christoph) - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)" * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits) sg: pass the device name to blk_trace_setup block, bfq: cleanup the repeated declaration blk-crypto: fix check for too-large dun_bytes blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN blk-zoned: allow zone management send operations without CAP_SYS_ADMIN block: mark blkdev_fsync static block: refine the disk_live check in del_gendisk mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA mmc: block: Support alternative_gpt_sector() operation partitions/efi: Support non-standard GPT location block: Add alternative_gpt_sector() operation bio: fix page leak bio_add_hw_page failure block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT block: remove a pointless call to MINOR() in device_add_disk null_blk: add error handling support for add_disk() virtio_blk: add error handling support for add_disk() block: add error handling for device_add_disk / add_disk block: return errors from disk_alloc_events block: return errors from blk_integrity_add block: call blk_register_queue earlier in device_add_disk ...
2021-08-30	Merge tag 'timers-core-2021-08-30' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timekeeping, timers and related drivers: Core code: - Cure a couple of correctness issues in the posix CPU timer code to prevent that the tick dependency for NOHZ full is kept alive for no reason. - Avoid expensive double reprogramming of the clockevent device in hrtimer_start_range_ns(). - Avoid pointless SMP function calls when the clock was set to avoid disturbing CPUs which do not have any affected timers queued. - Make the clocksource watchdog test work correctly when CONFIG_HZ is less than 100. Drivers: - Prefer the ARM architected timer over the Exynos timer which is way more expensive to access. - Add device tree bindings for new Ingenic SoCs - The usual improvements and cleanups all over the place" * tag 'timers-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) clocksource: Make clocksource watchdog test safe for slow-HZ systems dt-bindings: timer: Add ABIs for new Ingenic SoCs clocksource/drivers/fttmr010: Pass around less pointers clocksource/drivers/mediatek: Optimize systimer irq clear flow on shutdown clocksource/drivers/ingenic: Use bitfield macro helpers clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel dt-bindings: timer: convert rockchip,rk-timer.txt to YAML clocksource/drivers/exynos_mct: Mark MCT device as CLOCK_EVT_FEAT_PERCPU clocksource/drivers/exynos_mct: Prioritise Arm arch timer on arm64 hrtimer: Unbreak hrtimer_force_reprogram() hrtimer: Use raw_cpu_ptr() in clock_was_set() hrtimer: Avoid more SMP function calls in clock_was_set() hrtimer: Avoid unnecessary SMP function calls in clock_was_set() hrtimer: Add bases argument to clock_was_set() time/timekeeping: Avoid invoking clock_was_set() twice timekeeping: Distangle resume and clock-was-set events timerfd: Provide timerfd_resume() hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case hrtimer: Ensure timerfd notification for HIGHRES=n hrtimer: Consolidate reprogramming code ...
2021-08-30	Merge tag 'sched-core-2021-08-30' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - The biggest change in this cycle is scheduler support for asymmetric scheduling affinity, to support the execution of legacy 32-bit tasks on AArch32 systems that also have 64-bit-only CPUs. Architectures can fill in this functionality by defining their own task_cpu_possible_mask(p). When this is done, the scheduler will make sure the task will only be scheduled on CPUs that support it. (The actual arm64 specific changes are not part of this tree.) For other architectures there will be no change in functionality. - Add cgroup SCHED_IDLE support - Increase node-distance flexibility & delay determining it until a CPU is brought online. (This enables platforms where node distance isn't final until the CPU is only.) - Deadline scheduler enhancements & fixes - Misc fixes & cleanups. * tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) eventfd: Make signal recursion protection a task bit sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case sched: Introduce dl_task_check_affinity() to check proposed affinity sched: Allow task CPU affinity to be restricted on asymmetric systems sched: Split the guts of sched_setaffinity() into a helper function sched: Introduce task_struct::user_cpus_ptr to track requested affinity sched: Reject CPU affinity changes based on task_cpu_possible_mask() cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 sched: Introduce task_cpu_possible_mask() to limit fallback rq selection sched: Cgroup SCHED_IDLE support sched/topology: Skip updating masks for non-online nodes sched: Replace deprecated CPU-hotplug functions. sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS sched: Fix UCLAMP_FLAG_IDLE setting sched/deadline: Fix missing clock update in migrate_task_rq_dl() sched/fair: Avoid a second scan of target in select_idle_cpu sched/fair: Use prev instead of new target as recent_used_cpu sched: Don't report SCHED_FLAG_SUGOV in sched_getattr() ...
2021-08-30	Merge tag 'locks-v5.15' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "This starts with a couple of fixes for potential deadlocks in the fowner/fasync handling. The next patch removes the old mandatory locking code from the kernel altogether. The last patch cleans up rw_verify_area a bit more after the mandatory locking removal" * tag 'locks-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: clean up after mandatory file locking support removal fs: remove mandatory file locking support fcntl: fix potential deadlock for &fasync_struct.fa_lock fcntl: fix potential deadlocks for &fown_struct.lock
2021-08-30	Merge tag 'hole_punch_for_v5.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fs hole punching vs cache filling race fixes from Jan Kara: "Fix races leading to possible data corruption or stale data exposure in multiple filesystems when hole punching races with operations such as readahead. This is the series I was sending for the last merge window but with your objection fixed - now filemap_fault() has been modified to take invalidate_lock only when we need to create new page in the page cache and / or bring it uptodate" * tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: filesystems/locking: fix Malformed table warning cifs: Fix race between hole punch and page fault ceph: Fix race between hole punch and page fault fuse: Convert to using invalidate_lock f2fs: Convert to using invalidate_lock zonefs: Convert to using invalidate_lock xfs: Convert double locking of MMAPLOCK to use VFS helpers xfs: Convert to use invalidate_lock xfs: Refactor xfs_isilocked() ext2: Convert to using invalidate_lock ext4: Convert to use mapping->invalidate_lock mm: Add functions to lock invalidate_lock for two mappings mm: Protect operations adding pages to page cache with invalidate_lock documentation: Sync file_operations members with reality mm: Fix comments mentioning i_mutex
2021-08-30	NFS: Always provide aligned buffers to the RPC read layers	Trond Myklebust
	Instead of messing around with XDR padding in the RDMA layer, we should just give the RPC layer an aligned buffer. Try to avoid creating extra RPC calls by aligning to the smaller value of ALIGN(len, rsize) and PAGE_SIZE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-30	Merge tag 'fs_for_v5.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF and isofs updates from Jan Kara: "Several smaller fixes and cleanups in UDF and isofs" * tag 'fs_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf_get_extendedattr() had no boundary checks. isofs: joliet: Fix iocharset=utf8 mount option udf: Fix iocharset=utf8 mount option udf: Get rid of 0-length arrays in struct fileIdentDesc udf: Get rid of 0-length arrays udf: Remove unused declaration udf: Check LVID earlier
2021-08-30	Merge tag 'fiemap_for_v5.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull FIEMAP cleanups from Jan Kara: "FIEMAP cleanups from Christoph transitioning all remaining filesystems supporting FIEMAP (ext2, hpfs) to iomap API and removing the old helper" * tag 'fiemap_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs: remove generic_block_fiemap hpfs: use iomap_fiemap to implement ->fiemap ext2: use iomap_fiemap to implement ->fiemap ext2: make ext2_iomap_ops available unconditionally
2021-08-30	f2fs: enable realtime discard iff device supports discard	Chao Yu
	Let's only enable realtime discard if and only if device supports discard functionality. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	f2fs: guarantee to write dirty data when enabling checkpoint back	Jaegeuk Kim
	We must flush all the dirty data when enabling checkpoint back. Let's guarantee that first by adding a retry logic on sync_inodes_sb(). In addition to that, this patch adds to flush data in fsync when checkpoint is disabled, which can mitigate the sync_inodes_sb() failures in advance. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	f2fs: fix to unmap pages from userspace process in punch_hole()	Chao Yu
	We need to unmap pages from userspace process before removing pagecache in punch_hole() like we did in f2fs_setattr(). Similar change: commit 5e44f8c374dc ("ext4: hole-punch use truncate_pagecache_range") Fixes: fbfa2cc58d53 ("f2fs: add file operations") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()	Chao Yu
	In below path, it will return ENOENT if filesystem is shutdown: - f2fs_map_blocks - f2fs_get_dnode_of_data - f2fs_get_node_page - __get_node_page - read_node_page - is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) return -ENOENT - force return value from ENOENT to 0 It should be fine for read case, since it indicates a hole condition, and caller could use .m_next_pgofs to skip the hole and continue the lookup. However it may cause confusing for write case, since leaving a hole there, and said nothing was wrong doesn't help. There is at least one case from dax_iomap_actor() will complain that, so fix this in prior to supporting dax in f2fs. xfstest generic/388 reports below warning: ubuntu godown: xfstests-induced forced shutdown of /mnt/scratch_f2fs: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 485833 at fs/dax.c:1127 dax_iomap_actor+0x339/0x370 Call Trace: iomap_apply+0x1c4/0x7b0 ? dax_iomap_rw+0x1c0/0x1c0 dax_iomap_rw+0xad/0x1c0 ? dax_iomap_rw+0x1c0/0x1c0 f2fs_file_write_iter+0x5ab/0x970 [f2fs] do_iter_readv_writev+0x273/0x2e0 do_iter_write+0xab/0x1f0 vfs_iter_write+0x21/0x40 iter_file_splice_write+0x287/0x540 do_splice+0x37c/0xa60 __x64_sys_splice+0x15f/0x3a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae ubuntu godown: xfstests-induced forced shutdown of /mnt/scratch_f2fs: ------------[ cut here ]------------ RIP: 0010:dax_iomap_pte_fault.isra.0+0x72e/0x14a0 Call Trace: dax_iomap_fault+0x44/0x70 f2fs_dax_huge_fault+0x155/0x400 [f2fs] f2fs_dax_fault+0x18/0x30 [f2fs] __do_fault+0x4e/0x120 do_fault+0x3cf/0x7a0 __handle_mm_fault+0xa8c/0xf20 ? find_held_lock+0x39/0xd0 handle_mm_fault+0x1b6/0x480 do_user_addr_fault+0x320/0xcd0 ? rcu_read_lock_sched_held+0x67/0xc0 exc_page_fault+0x77/0x3f0 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	f2fs: fix to account missing .skipped_gc_rwsem	Chao Yu
	There is a missing place we forgot to account .skipped_gc_rwsem, fix it. Fixes: 6f8d4455060d ("f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	f2fs: adjust unlock order for cleanup	Chao Yu
	This patch adjusts unlock order of .i_mmap_sem and .i_gc_rwsem for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	f2fs: Don't create discard thread when device doesn't support realtime discard	Fengnan Chang
	Don't create discard thread when device doesn't support realtime discard or user specifies nodiscard mount option. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-08-30	Merge tag 'fsnotify_for_v5.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify speedups when notification actually isn't used and support for identifying processes which caused fanotify events through pidfd instead of normal pid" * tag 'fsnotify_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize the case of no marks of any type fsnotify: count all objects with attached connectors fsnotify: count s_fsnotify_inode_refs for attached connectors fsnotify: replace igrab() with ihold() on attach connector fanotify: add pidfd support to the fanotify API fanotify: introduce a generic info record copying helper fanotify: minor cosmetic adjustments to fid labels kernel/pid.c: implement additional checks upon pidfd_create() parameters kernel/pid.c: remove static qualifier from pidfd_create()
2021-08-30	fs/ntfs3: Restyle comments to better align with kernel-doc	Kari Argillander
	Capitalize comments and end with period for better reading. Also function comments are now little more kernel-doc style. This way we can easily convert them to kernel-doc style if we want. Note that these are not yet complete with this style. Example function comments start with /* and in kernel-doc style they start /**. Use imperative mood in function descriptions. Change words like ntfs -> NTFS, linux -> Linux. Use "we" not "I" when commenting code. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-30	io-wq: fix wakeup race when adding new work	Jens Axboe
	When new work is added, io_wqe_enqueue() checks if we need to wake or create a new worker. But that check is done outside the lock that otherwise synchronizes us with a worker going to sleep, so we can end up in the following situation: CPU0 CPU1 lock insert work unlock atomic_read(nr_running) != 0 lock atomic_dec(nr_running) no wakeup needed Hold the wqe lock around the "need to wakeup" check. Then we can also get rid of the temporary work_flags variable, as we know the work will remain valid as long as we hold the lock. Cc: stable@vger.kernel.org Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-30	io-wq: wqe and worker locks no longer need to be IRQ safe	Jens Axboe
	io_uring no longer queues async work off completion handlers that run in hard or soft interrupt context, and that use case was the only reason that io-wq had to use IRQ safe locks for wqe and worker locks. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-30	io-wq: check max_worker limits if a worker transitions bound state	Jens Axboe
	For the two places where new workers are created, we diligently check if we are allowed to create a new worker. If we're currently at the limit of how many workers of a given type we can have, then we don't create any new ones. If you have a mixed workload with various types of bound and unbounded work, then it can happen that a worker finishes one type of work and is then transitioned to the other type. For this case, we don't check if we are actually allowed to do so. This can cause io-wq to temporarily exceed the allowed number of workers for a given type. When retrieving work, check that the types match. If they don't, check if we are allowed to transition to the other type. If not, then don't handle the new work. Cc: stable@vger.kernel.org Reported-by: Johannes Lundberg <johalun0@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-29	io_uring: allow updating linked timeouts	Pavel Begunkov
	We allow updating normal timeouts, add support for adjusting timings of linked timeouts as well. Reported-by: Victor Stewart <v@nametag.social> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-29	io_uring: keep ltimeouts in a list	Pavel Begunkov
	A preparation patch. Keep all queued linked timeout in a list, so they may be found and updated. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-29	io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts	Jens Axboe
	Certain use cases want to use CLOCK_BOOTTIME or CLOCK_REALTIME rather than CLOCK_MONOTONIC, instead of the default CLOCK_MONOTONIC. Add an IORING_TIMEOUT_BOOTTIME and IORING_TIMEOUT_REALTIME flag that allows timeouts and linked timeouts to use the selected clock source. Only one clock source may be selected, and we -EINVAL the request if more than one is given. If neither BOOTIME nor REALTIME are selected, the previous default of MONOTONIC is used. Link: https://github.com/axboe/liburing/issues/369 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-29	io-wq: provide a way to limit max number of workers	Jens Axboe
	io-wq divides work into two categories: 1) Work that completes in a bounded time, like reading from a regular file or a block device. This type of work is limited based on the size of the SQ ring. 2) Work that may never complete, we call this unbounded work. The amount of workers here is just limited by RLIMIT_NPROC. For various uses cases, it's handy to have the kernel limit the maximum amount of pending workers for both categories. Provide a way to do with with a new IORING_REGISTER_IOWQ_MAX_WORKERS operation. IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets the max worker count to what is being passed in for each category. The old values are returned into that same array. If 0 is being passed in for either category, it simply returns the current value. The value is capped at RLIMIT_NPROC. This actually isn't that important as it's more of a hint, if we're exceeding the value then our attempt to fork a new worker will fail. This happens naturally already if more than one node is in the system, as these values are per-node internally for io-wq. Reported-by: Johannes Lundberg <johalun0@gmail.com> Link: https://github.com/axboe/liburing/issues/420 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-28	eventfd: Make signal recursion protection a task bit	Thomas Gleixner
	The recursion protection for eventfd_signal() is based on a per CPU variable and relies on the !RT semantics of spin_lock_irqsave() for protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither disables preemption nor interrupts which allows the spin lock held section to be preempted. If the preempting task invokes eventfd_signal() as well, then the recursion warning triggers. Paolo suggested to protect the per CPU variable with a local lock, but that's heavyweight and actually not necessary. The goal of this protection is to prevent the task stack from overflowing, which can be achieved with a per task recursion protection as well. Replace the per CPU variable with a per task bit similar to other recursion protection bits like task_struct::in_page_owner. This works on both !RT and RT kernels and removes as a side effect the extra per CPU storage. No functional change for !RT kernels. Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
2021-08-27	NFSv4.1 add network transport when session trunking is detected	Olga Kornievskaia
	After trunking is discovered in nfs4_discover_server_trunking(), add the transport to the old client structure if the allowed limit of transports has not been reached. An example: there exists a multi-homed server and client mounts one server address and some volume and then doest another mount to a different address of the same server and perhaps a different volume. Previously, the client checks that this is a session trunkable servers (same server), and removes the newly created client structure along with its transport. Now, the client adds the connection from the 2nd mount into the xprt switch of the existing client (it leads to having 2 available connections). Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27	SUNRPC enforce creation of no more than max_connect xprts	Olga Kornievskaia
	If we are adding new transports via rpc_clnt_test_and_add_xprt() then check if we've reached the limit. Currently only pnfs path adds transports via that function but this is done in preparation when the client would add new transports when session trunking is detected. A warning is logged if the limit is reached. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27	NFSv4 introduce max_connect mount options	Olga Kornievskaia
	This option will control up to how many xprts can the client establish to the server with a distinct address (that means nconnect connections are not counted towards this new limit). This patch is setting up nfs structures to keeep track of the max_connect limit (does not enforce it). The default value is kept at 1 so that no current mounts that don't want any additional connections would be effected. The maximum value is set at 16. Mounts to DS are not limited to default value of 1 but instead set to the maximum default value of 16 (NFS_MAX_TRANSPORTS). Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27	NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox	Ye Bin
	As eb96d5c97b08 ("SUNRPC handle EKEYEXPIRED in call_refreshresult") commit handle EKEYEXPIRED in call_refreshresult, so there is only handle when "task->tk_status" is equal "-EJUKEBOX" in nfs3_async_handle_jukebox. Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27	ksmbd: fix __write_overflow warning in ndr_read_string	Namjae Jeon
	Dan reported __write_overflow warning in ndr_read_string. CC [M] fs/ksmbd/ndr.o In file included from ./include/linux/string.h:253, from ./include/linux/bitmap.h:11, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/cpumask.h:5, from ./arch/x86/include/asm/msr.h:11, from ./arch/x86/include/asm/processor.h:22, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:60, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:78, from ./include/linux/spinlock.h:55, from ./include/linux/wait.h:9, from ./include/linux/wait_bit.h:8, from ./include/linux/fs.h:6, from fs/ksmbd/ndr.c:7: In function memcpy, inlined from ndr_read_string at fs/ksmbd/ndr.c:86:2, inlined from ndr_decode_dos_attr at fs/ksmbd/ndr.c:167:2: ./include/linux/fortify-string.h:219:4: error: call to __write_overflow declared with attribute error: detected write beyond size of object __write_overflow(); ^~~~~~~~~~~~~~~~~~ This seems to be a false alarm because hex_attr size is always smaller than n->length. This patch fix this warning by allocation hex_attr with n->length. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-27	io_uring: add build check for buf_index overflows	Pavel Begunkov
	req->buf_index is u16 and so we rely on registered buffers indexes fitting into it. Add a build check, so when the upper limit for the number of buffers is lifted we get a compliation fail but not lurking problems. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/787e8e1a17cea51ca6301426b1c4c4887b8bd676.1629920396.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-27	io_uring: clarify io_req_task_cancel() locking	Pavel Begunkov
	It's too easy to forget and misjudge about synchronisation in io_req_task_cancel(), add a comment clarifying it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/71099083835f983a1fd73d5a3da6391924da8300.1629920396.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-27	fs/ntfs3: Fix error handling in indx_insert_into_root()	Dan Carpenter
	There are three bugs in this code: 1) If indx_get_root() fails, then return -EINVAL instead of success. 2) On the "/* make root external */" -EOPNOTSUPP; error path it should free "re" but it has a memory leak. 3) If indx_new() fails then it will lead to an error pointer dereference when we call put_indx_node(). I've re-written the error handling to be more clear. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Potential NULL dereference in hdr_find_split()	Dan Carpenter
	The "e" pointer is dereferenced before it has been checked for NULL. Move the dereference after the NULL check to prevent an Oops. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Fix error code in indx_add_allocate()	Dan Carpenter
	Return -EINVAL if ni_find_attr() fails. Don't return success. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: fix an error code in ntfs_get_acl_ex()	Dan Carpenter
	The ntfs_get_ea() function returns negative error codes or on success it returns the length. In the original code a zero length return was treated as -ENODATA and results in a NULL return. But it should be treated as an invalid length and result in an PTR_ERR(-EINVAL) return. Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: add checks for allocation failure	Dan Carpenter
	Add a check for when the kzalloc() in init_rsttbl() fails. Some of the callers checked for NULL and some did not. I went down the call tree and added NULL checks where ever they were missing. Fixes: b46acd6a6a62 ("fs/ntfs3: Add NTFS journal") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc	Kari Argillander
	Use kcalloc/kmalloc_array over kzalloc/kmalloc when we allocate array. Checkpatch found these after we did not use our own defined allocation wrappers. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Do not use driver own alloc wrappers	Kari Argillander
	Problem with these wrapper is that we cannot take off example GFP_NOFS flag. It is not recomended use those in all places. Also if we change one driver specific wrapper to kernel wrapper then it would look really weird. People should be most familiar with kernel wrappers so let's just use those ones. Driver specific alloc wrapper also confuse some static analyzing tools, good example is example kernels checkpatch tool. After we converter these to kernel specific then warnings is showed. Following Coccinelle script was used to automate changing. virtual patch @alloc depends on patch@ expression x; expression y; @@ ( - ntfs_malloc(x) + kmalloc(x, GFP_NOFS) \| - ntfs_zalloc(x) + kzalloc(x, GFP_NOFS) \| - ntfs_vmalloc(x) + kvmalloc(x, GFP_NOFS) \| - ntfs_free(x) + kfree(x) \| - ntfs_vfree(x) + kvfree(x) \| - ntfs_memdup(x, y) + kmemdup(x, y, GFP_NOFS) ) Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Use kernel ALIGN macros over driver specific	Kari Argillander
	The static checkers (Smatch) were complaining because QuadAlign() was buggy. If you try to align something higher than UINT_MAX it got truncated to a u32. Smatch warning was: fs/ntfs3/attrib.c:383 attr_set_size_res() warn: was expecting a 64 bit value instead of '~7' So that this will not happen again we will change all these macros to kernel made ones. This can also help some other static analyzing tools to give us better warnings. Patch was generated with Coccinelle script and after that some style issue was hand fixed. Coccinelle script: virtual patch @alloc depends on patch@ expression x; @@ ( - #define QuadAlign(n) (((n) + 7u) & (~7u)) \| - QuadAlign(x) + ALIGN(x, 8) \| - #define IsQuadAligned(n) (!((size_t)(n)&7u)) \| - IsQuadAligned(x) + IS_ALIGNED(x, 8) \| - #define Quad2Align(n) (((n) + 15u) & (~15u)) \| - Quad2Align(x) + ALIGN(x, 16) \| - #define IsQuad2Aligned(n) (!((size_t)(n)&15u)) \| - IsQuad2Aligned(x) + IS_ALIGNED(x, 16) \| - #define Quad4Align(n) (((n) + 31u) & (~31u)) \| - Quad4Align(x) + ALIGN(x, 32) \| - #define IsSizeTAligned(n) (!((size_t)(n) & (sizeof(size_t) - 1))) \| - IsSizeTAligned(x) + IS_ALIGNED(x, sizeof(size_t)) \| - #define DwordAlign(n) (((n) + 3u) & (~3u)) \| - DwordAlign(x) + ALIGN(x, 4) \| - #define IsDwordAligned(n) (!((size_t)(n)&3u)) \| - IsDwordAligned(x) + IS_ALIGNED(x, 4) \| - #define WordAlign(n) (((n) + 1u) & (~1u)) \| - WordAlign(x) + ALIGN(x, 2) \| - #define IsWordAligned(n) (!((size_t)(n)&1u)) \| - IsWordAligned(x) + IS_ALIGNED(x, 2) \| ) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Restyle comment block in ni_parse_reparse()	Kari Argillander
	First of this fix one none utf8 char in this comment block. Maybe this happened because error in filesystem ;) Also this block was hard to read because long lines so make it max 80 long. And while we doing this stuff make little better grammer. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Remove unused including <linux/version.h>	Jiapeng Chong
	Eliminate the follow versioncheck warning: ./fs/ntfs3/inode.c: 16 linux/version.h not needed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Fix fall-through warnings for Clang	Gustavo A. R. Silva
	Fix the following fallthrough warnings: fs/ntfs3/inode.c:1792:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/ntfs3/index.c:178:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] This helps with the ongoing efforts to globally enable -Wimplicit-fallthrough for Clang. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Fix one none utf8 char in source file	Kari Argillander
	In one source file there is for some reason non utf8 char. But hey this is fs development so this kind of thing might happen. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Remove unused variable cnt in ntfs_security_init()	Nathan Chancellor
	Clang warns: fs/ntfs3/fsntfs.c:1874:9: warning: variable 'cnt' set but not used [-Wunused-but-set-variable] size_t cnt, off; ^ 1 warning generated. It is indeed unused so remove it. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Fix integer overflow in multiplication	Colin Ian King
	The multiplication of the u32 data_size with a int is being performed using 32 bit arithmetic however the results is being assigned to the variable nbits that is a size_t (64 bit) value. Fix a potential integer overflow by casting the u32 value to a size_t before the multiply to use a size_t sized bit multiply operation. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Add ifndef + define to all header files	Kari Argillander
	Add guards so that compiler will only include header files once. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2021-08-27	fs/ntfs3: Use linux/log2 is_power_of_2 function	Kari Argillander
	We do not need our own implementation for this function in this driver. It is much better to use generic one. Signed-off-by: Kari Argillander <kari.argillander@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>