summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-07-10f2fs: allow all the users to pin a fileJaegeuk Kim
This patch allows users to pin files. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-10xfs: chain bios the right way around in xfs_rw_bdevChristoph Hellwig
We need to chain the earlier bios to the later ones, so that submit_bio_wait waits on the bio that all the completions are dispatched to. Fixes: 6ad5b3255b9e ("xfs: use bios directly to read and write the log recovery buffers") Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-10udf: Fix incorrect final NOT_ALLOCATED (hole) extent lengthSteven J. Magnani
In some cases, using the 'truncate' command to extend a UDF file results in a mismatch between the length of the file's extents (specifically, due to incorrect length of the final NOT_ALLOCATED extent) and the information (file) length. The discrepancy can prevent other operating systems (i.e., Windows 10) from opening the file. Two particular errors have been observed when extending a file: 1. The final extent is larger than it should be, having been rounded up to a multiple of the block size. B. The final extent is not shorter than it should be, due to not having been updated when the file's information length was increased. [JK: simplified udf_do_extend_final_block(), fixed up some types] Fixes: 2c948b3f86e5 ("udf: Avoid IO in udf_clear_inode") CC: stable@vger.kernel.org Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Link: https://lore.kernel.org/r/1561948775-5878-1-git-send-email-steve@digidescorp.com Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-09nfsd: Make __get_nfsdfs_client() staticYueHaibing
Fix sparse warning: fs/nfsd/nfsctl.c:1221:22: warning: symbol '__get_nfsdfs_client' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-09nfsd: Make two functions staticYueHaibing
Fix sparse warnings: fs/nfsd/nfs4state.c:1908:6: warning: symbol 'drop_client' was not declared. Should it be static? fs/nfsd/nfs4state.c:2518:6: warning: symbol 'force_expire_client' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-09Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x865 kdump updates from Thomas Gleixner: "Yet more kexec/kdump updates: - Properly support kexec when AMD's memory encryption (SME) is enabled - Pass reserved e820 ranges to the kexec kernel so both PCI and SME can work" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active x86/kexec: Set the C-bit in the identity map page table when SEV is active x86/kexec: Do not map kexec area as decrypted when SEV is active x86/crash: Add e820 reserved ranges to kdump kernel's e820 table x86/mm: Rework ioremap resource mapping determination x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED x86/mm: Create a workarea in the kernel for SME early encryption x86/mm: Identify the end of the kernel area to be reserved
2019-07-09Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle on the kernel side were: - CPU PMU and uncore driver updates to Intel Snow Ridge, IceLake, KabyLake, AmberLake and WhiskeyLake CPUs. - Rework the MSR probing infrastructure to make it more robust, make it work better on virtualized systems and to better expose it on sysfs. - Rework PMU attributes group support based on the feedback from Greg. The core sysfs patch that adds sysfs_update_groups() was acked by Greg. There's a lot of perf tooling changes as well, all around the place: - vendor updates to Intel, cs-etm (ARM), ARM64, s390, - various enhancements to Intel PT tooling support: - Improve CBR (Core to Bus Ratio) packets support. - Export power and ptwrite events to sqlite and postgresql. - Add support for decoding PEBS via PT packets. - Add support for samples to contain IPC ratio, collecting cycles information from CYC packets, showing the IPC info periodically - Allow using time ranges - lots of updates to perf pmu, perf stat, perf trace, eBPF support, perf record, perf diff, etc. - please see the shortlog and Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (252 commits) tools arch x86: Sync asm/cpufeatures.h with the with the kernel tools build: Check if gettid() is available before providing helper perf jvmti: Address gcc string overflow warning for strncpy() perf python: Remove -fstack-protector-strong if clang doesn't have it perf annotate TUI browser: Do not use member from variable within its own initialization perf tests: Fix record+probe_libc_inet_pton.sh for powerpc64 perf evsel: Do not rely on errno values for precise_ip fallback perf thread: Allow references to thread objects after machine__exit() perf header: Assign proper ff->ph in perf_event__synthesize_features() tools arch kvm: Sync kvm headers with the kernel sources perf script: Allow specifying the files to process guest samples perf tools metric: Don't include duration_time in group perf list: Avoid extra : for --raw metrics perf vendor events intel: Metric fixes for SKX/CLX perf tools: Fix typos / broken sentences perf jevents: Add support for Hisi hip08 L3C PMU aliasing perf jevents: Add support for Hisi hip08 HHA PMU aliasing perf jevents: Add support for Hisi hip08 DDRC PMU aliasing perf pmu: Support more complex PMU event aliasing perf diff: Documentation -c cycles option ...
2019-07-09Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "This is the main block updates for 5.3. Nothing earth shattering or major in here, just fixes, additions, and improvements all over the map. This contains: - Series of documentation fixes (Bart) - Optimization of the blk-mq ctx get/put (Bart) - null_blk removal race condition fix (Bob) - req/bio_op() cleanups (Chaitanya) - Series cleaning up the segment accounting, and request/bio mapping (Christoph) - Series cleaning up the page getting/putting for bios (Christoph) - block cgroup cleanups and moving it to where it is used (Christoph) - block cgroup fixes (Tejun) - Series of fixes and improvements to bcache, most notably a write deadlock fix (Coly) - blk-iolatency STS_AGAIN and accounting fixes (Dennis) - Series of improvements and fixes to BFQ (Douglas, Paolo) - debugfs_create() return value check removal for drbd (Greg) - Use struct_size(), where appropriate (Gustavo) - Two lighnvm fixes (Heiner, Geert) - MD fixes, including a read balance and corruption fix (Guoqing, Marcos, Xiao, Yufen) - block opal shadow mbr additions (Jonas, Revanth) - sbitmap compare-and-exhange improvemnts (Pavel) - Fix for potential bio->bi_size overflow (Ming) - NVMe pull requests: - improved PCIe suspent support (Keith Busch) - error injection support for the admin queue (Akinobu Mita) - Fibre Channel discovery improvements (James Smart) - tracing improvements including nvmetc tracing support (Minwoo Im) - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya Kulkarni)" - Various little fixes and improvements to drivers and core" * tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits) blk-iolatency: fix STS_AGAIN handling block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES blk-mq: simplify blk_mq_make_request() blk-mq: remove blk_mq_put_ctx() sbitmap: Replace cmpxchg with xchg block: fix .bi_size overflow block: sed-opal: check size of shadow mbr block: sed-opal: ioctl for writing to shadow mbr block: sed-opal: add ioctl for done-mark of shadow mbr block: never take page references for ITER_BVEC direct-io: use bio_release_pages in dio_bio_complete block_dev: use bio_release_pages in bio_unmap_user block_dev: use bio_release_pages in blkdev_bio_end_io iomap: use bio_release_pages in iomap_dio_bio_end_io block: use bio_release_pages in bio_map_user_iov block: use bio_release_pages in bio_unmap_user block: optionally mark pages dirty in bio_release_pages block: move the BIO_NO_PAGE_REF check into bio_release_pages block: skd_main.c: Remove call to memset after dma_alloc_coherent block: mtip32xx: Remove call to memset after dma_alloc_coherent ...
2019-07-09xfs: bump INUMBERS cursor correctly in xfs_inumbers_walkDarrick J. Wong
There's a subtle unit conversion error when we increment the INUMBERS cursor at the end of xfs_inumbers_walk. If there's an inode chunk at the very end of the AG /and/ the AG size is a perfect power of two, the startino of that last chunk (which is in units of AG inodes) will be 63 less than (1 << agino_log). If we add XFS_INODES_PER_CHUNK to the startino, we end up with a startino that's larger than (1 << agino_log) and when we convert that back to fs inode units we'll rip off that upper bit and wind up back at the start of the AG. Fix this by converting to units of fs inodes before adding XFS_INODES_PER_CHUNK so that we'll harmlessly end up pointing to the next AG. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-08Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-08pstore: Fix double-free in pstore_mkfile() failure pathNorbert Manthey
The pstore_mkfile() function is passed a pointer to a struct pstore_record. On success it consumes this 'record' pointer and references it from the created inode. On failure, however, it may or may not free the record. There are even two different code paths which return -ENOMEM -- one of which does and the other doesn't free the record. Make the behaviour deterministic by never consuming and freeing the record when returning failure, allowing the caller to do the cleanup consistently. Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Link: https://lore.kernel.org/r/1562331960-26198-1-git-send-email-nmanthey@amazon.de Fixes: 83f70f0769ddd ("pstore: Do not duplicate record metadata") Fixes: 1dfff7dd67d1a ("pstore: Pass record contents instead of copying") Cc: stable@vger.kernel.org [kees: also move "private" allocation location, rename inode cleanup label] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-07-08pstore: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2019-07-08pstore/ram: Improve backward compatibility with older ChromebooksDouglas Anderson
When you try to run an upstream kernel on an old ARM-based Chromebook you'll find that console-ramoops doesn't work. Old ARM-based Chromebooks, before <https://crrev.com/c/439792> ("ramoops: support upstream {console,pmsg,ftrace}-size properties") used to create a "ramoops" node at the top level that looked like: / { ramoops { compatible = "ramoops"; reg = <...>; record-size = <...>; dump-oops; }; }; ...and these Chromebooks assumed that the downstream kernel would make console_size / pmsg_size match the record size. The above ramoops node was added by the firmware so it's not easy to make any changes. Let's match the expected behavior, but only for those using the old backward-compatible way of working where ramoops is right under the root node. NOTE: if there are some out-of-tree devices that had ramoops at the top level, left everything but the record size as 0, and somehow doesn't want this behavior, we can try to add more conditions here. Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2019-07-08Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 5.3: API: - Test shash interface directly in testmgr - cra_driver_name is now mandatory Algorithms: - Replace arc4 crypto_cipher with library helper - Implement 5 way interleave for ECB, CBC and CTR on arm64 - Add xxhash - Add continuous self-test on noise source to drbg - Update jitter RNG Drivers: - Add support for SHA204A random number generator - Add support for 7211 in iproc-rng200 - Fix fuzz test failures in inside-secure - Fix fuzz test failures in talitos - Fix fuzz test failures in qat" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits) crypto: stm32/hash - remove interruptible condition for dma crypto: stm32/hash - Fix hmac issue more than 256 bytes crypto: stm32/crc32 - rename driver file crypto: amcc - remove memset after dma_alloc_coherent crypto: ccp - Switch to SPDX license identifiers crypto: ccp - Validate the the error value used to index error messages crypto: doc - Fix formatting of new crypto engine content crypto: doc - Add parameter documentation crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR crypto: arm64/aes-ce - add 5 way interleave routines crypto: talitos - drop icv_ool crypto: talitos - fix hash on SEC1. crypto: talitos - move struct talitos_edesc into talitos.h lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE crypto/NX: Set receive window credits to max number of CRBs in RxFIFO crypto: asymmetric_keys - select CRYPTO_HASH where needed crypto: serpent - mark __serpent_setkey_sbox noinline crypto: testmgr - dynamically allocate crypto_shash crypto: testmgr - dynamically allocate testvec_config crypto: talitos - eliminate unneeded 'done' functions at build time ...
2019-07-08nfsd: Fix misuse of strlcpyJoe Perches
Probable cut&paste typo - use the correct field size. (Not currently a practical problem since these two fields have the same size, but we should fix it anyway.) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-08Merge tag 'keys-acl-20190703' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring ACL support from David Howells: "This changes the permissions model used by keys and keyrings to be based on an internal ACL by the following means: - Replace the permissions mask internally with an ACL that contains a list of ACEs, each with a specific subject with a permissions mask. Potted default ACLs are available for new keys and keyrings. ACE subjects can be macroised to indicate the UID and GID specified on the key (which remain). Future commits will be able to add additional subject types, such as specific UIDs or domain tags/namespaces. Also split a number of permissions to give finer control. Examples include splitting the revocation permit from the change-attributes permit, thereby allowing someone to be granted permission to revoke a key without allowing them to change the owner; also the ability to join a keyring is split from the ability to link to it, thereby stopping a process accessing a keyring by joining it and thus acquiring use of possessor permits. - Provide a keyctl to allow the granting or denial of one or more permits to a specific subject. Direct access to the ACL is not granted, and the ACL cannot be viewed" * tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Provide KEYCTL_GRANT_PERMISSION keys: Replace uid/gid/perm permissions checking with an ACL
2019-07-08Merge tag 'keys-namespace-20190627' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring namespacing from David Howells: "These patches help make keys and keyrings more namespace aware. Firstly some miscellaneous patches to make the process easier: - Simplify key index_key handling so that the word-sized chunks assoc_array requires don't have to be shifted about, making it easier to add more bits into the key. - Cache the hash value in the key so that we don't have to calculate on every key we examine during a search (it involves a bunch of multiplications). - Allow keying_search() to search non-recursively. Then the main patches: - Make it so that keyring names are per-user_namespace from the point of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not accessible cross-user_namespace. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this. - Move the user and user-session keyrings to the user_namespace rather than the user_struct. This prevents them propagating directly across user_namespaces boundaries (ie. the KEY_SPEC_* flags will only pick from the current user_namespace). - Make it possible to include the target namespace in which the key shall operate in the index_key. This will allow the possibility of multiple keys with the same description, but different target domains to be held in the same keyring. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this. - Make it so that keys are implicitly invalidated by removal of a domain tag, causing them to be garbage collected. - Institute a network namespace domain tag that allows keys to be differentiated by the network namespace in which they operate. New keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned the network domain in force when they are created. - Make it so that the desired network namespace can be handed down into the request_key() mechanism. This allows AFS, NFS, etc. to request keys specific to the network namespace of the superblock. This also means that the keys in the DNS record cache are thenceforth namespaced, provided network filesystems pass the appropriate network namespace down into dns_query(). For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other cache keyrings, such as idmapper keyrings, also need to set the domain tag - for which they need access to the network namespace of the superblock" * tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Pass the network namespace into request_key mechanism keys: Network namespace domain tag keys: Garbage collect keys for which the domain has been removed keys: Include target namespace in match criteria keys: Move the user and user-session keyrings to the user_namespace keys: Namespace keyring names keys: Add a 'recurse' flag for keyring searches keys: Cache the hash value to avoid lots of recalculation keys: Simplify key description management
2019-07-08Merge branch 'x86-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 AVX512 status update from Ingo Molnar: "This adds a new ABI that the main scheduler probably doesn't want to deal with but HPC job schedulers might want to use: the AVX512_elapsed_ms field in the new /proc/<pid>/arch_status task status file, which allows the user-space job scheduler to cluster such tasks, to avoid turbo frequency drops" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/filesystems/proc.txt: Add arch_status file x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status proc: Add /proc/<pid>/arch_status
2019-07-08Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Remove the unused per rq load array and all its infrastructure, by Dietmar Eggemann. - Add utilization clamping support by Patrick Bellasi. This is a refinement of the energy aware scheduling framework with support for boosting of interactive and capping of background workloads: to make sure critical GUI threads get maximum frequency ASAP, and to make sure background processing doesn't unnecessarily move to cpufreq governor to higher frequencies and less energy efficient CPU modes. - Add the bare minimum of tracepoints required for LISA EAS regression testing, by Qais Yousef - which allows automated testing of various power management features, including energy aware scheduling. - Restructure the former tsk_nr_cpus_allowed() facility that the -rt kernel used to modify the scheduler's CPU affinity logic such as migrate_disable() - introduce the task->cpus_ptr value instead of taking the address of &task->cpus_allowed directly - by Sebastian Andrzej Siewior. - Misc optimizations, fixes, cleanups and small enhancements - see the Git log for details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) sched/uclamp: Add uclamp support to energy_compute() sched/uclamp: Add uclamp_util_with() sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks sched/uclamp: Set default clamps for RT tasks sched/uclamp: Reset uclamp values on RESET_ON_FORK sched/uclamp: Extend sched_setattr() to support utilization clamping sched/core: Allow sched_setattr() to use the current policy sched/uclamp: Add system default clamps sched/uclamp: Enforce last task's UCLAMP_MAX sched/uclamp: Add bucket local max tracking sched/uclamp: Add CPU's clamp buckets refcounting sched/fair: Rename weighted_cpuload() to cpu_runnable_load() sched/debug: Export the newly added tracepoints sched/debug: Add sched_overutilized tracepoint sched/debug: Add new tracepoint to track PELT at se level sched/debug: Add new tracepoints to track PELT at rq level sched/debug: Add a new sched_trace_*() helper functions sched/autogroup: Make autogroup_path() always available sched/wait: Deduplicate code with do-while sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() ...
2019-07-08Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-07-08Merge tag 'v5.2' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-08debugfs: make error message a bit more verboseGreg Kroah-Hartman
When a file/directory is already present in debugfs, and it is attempted to be created again, be more specific about what file/directory is being created and where it is trying to be created to give a bit more help to developers to figure out the problem. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mark Brown <broonie@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20190706154256.GA2683@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-06xfs: don't update lastino for FSBULKSTAT_SINGLEDarrick J. Wong
The kernel test robot found a regression of xfs/054 in the conversion of bulkstat to use the new iwalk infrastructure -- if a caller set *lastip = 128 and invoked FSBULKSTAT_SINGLE, the bstat info would be for inode 128, but *lastip would be increased by the kernel to 129. FSBULKSTAT_SINGLE never incremented lastip before, so it's incorrect to make such an update to the internal lastino value now. Fixes: 2810bd6840e463 ("xfs: convert bulkstat to new iwalk infrastructure") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-06Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixlet from Al Viro: "Fix bogus default y in Kconfig (VALIDATE_FS_PARSER) That thing should not be turned on by default, especially since it's not quiet in case it finds no problems. Geert has sent the obvious fix quite a few times, but it fell through the cracks" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: VALIDATE_FS_PARSER should default to n
2019-07-05Merge tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Two more quick bugfixes for nfsd: fixing a regression causing mount failures on high-memory machines and fixing the DRC over RDMA" * tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linux: nfsd: Fix overflow causing non-working mounts on 1 TB machines svcrdma: Ignore source port when computing DRC hash
2019-07-05xfs: online scrub needn't bother zeroing its temporary bufferDarrick J. Wong
The xattr scrubber functions use the temporary memory buffer either for storing bitmaps or for testing if attribute value extraction works. The bitmap code always zeroes what it needs and the value extraction sets the buffer contents, so it's not necessary to waste CPU time zeroing on allocation. Note that while we never read the contents that the attr value extraction function sets, we do need to call it to check the remote attribute header and CRCs to check for corruption. A flame graph analysis showed that we were spending 7% of a xfs_scrub run (the whole program, not just the attr scrubber itself) allocating and zeroing 64k segments needlessly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: only allocate memory for scrubbing attributes when we need itDarrick J. Wong
In examining a flame graph of time spent running xfs_scrub on various filesystems, I noticed that we spent nearly 7% of the total runtime on allocating a zeroed 65k buffer for every SCRUB_TYPE_XATTR invocation. We do this even if none of the attribute values were anywhere near 64k in size, even if there were no attribute blocks to check space on, and even if it just turns out there are no attributes at all. Therefore, rearrange the xattr buffer setup code to support reallocating with a bigger buffer and redistribute the callers of that function so that we only allocate memory just prior to needing it, and only allocate as much as we need. If we can't get memory with the ILOCK held we'll bail out with EDEADLOCK which will allocate the maximum memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: refactor attr scrub memory allocation functionDarrick J. Wong
Move the code that allocates memory buffers for the extended attribute scrub code into a separate function so we can reduce memory allocations in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: refactor extended attribute buffer pointer functionsDarrick J. Wong
Replace the open-coded attribute buffer pointer calculations with helper functions to make it more obvious what we're doing with our freeform memory allocation w.r.t. either storing xattr values or computing btree block free space. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: attribute scrub should use seen_enough to pass error valuesDarrick J. Wong
When we're iterating all the attributes using the built-in xattr iterator, we can use the seen_enough variable to pass error codes back to the main scrub function instead of flattening them into 0/1. This will be used in a more exciting fashion in upcoming patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05fs: VALIDATE_FS_PARSER should default to nGeert Uytterhoeven
CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser tables are vaguely sane. It was set to default to 'Y' for the moment to catch errors in upcoming fs conversion development. Make sure it is not enabled by default in the final release of v5.1. Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: swap_readpage(): avoid blk_wake_io_task() if !synchronous devres: allow const resource arguments mm/vmscan.c: prevent useless kswapd loops fs/userfaultfd.c: disable irqs for fault_pending and event locks mm/page_alloc.c: fix regression with deferred struct page init
2019-07-05Merge tag 'dax-fix-5.2-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fix from Dan Williams: "A single dax fix that has been soaking awaiting other fixes under discussion to join it. As it is getting late in the cycle lets proceed with this fix and save follow-on changes for post-v5.3-rc1. - Fix xarray entry association for mixed mappings" * tag 'dax-fix-5.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix xarray entry association for mixed mappings
2019-07-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull do_move_mount() fix from Al Viro: "Regression fix" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: move_mount: reject moving kernel internal mounts
2019-07-05fs/userfaultfd.c: disable irqs for fault_pending and event locksEric Biggers
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock") didn't fix this because it only accounted for the fd_wqh lock, not the other locks nested inside it. Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-04gfs2: Remove unused gfs2_iomap_alloc argumentAndreas Gruenbacher
Remove the unused flags argument of gfs2_iomap_alloc. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-04xfs: allow single bulkstat of special inodesDarrick J. Wong
Create a new bulk ireq flag that enables userspace to ask us for a special inode number instead of interpreting @ino as a literal inode number. This enables us to query the root inode easily. The reason for adding the ability to query specifically the root directory inode is that certain programs (xfsdump and xfsrestore) want to confirm when they've been pointed to the root directory. The userspace code assumes the root directory is always the first result from calling bulkstat with lastino == 0, but this isn't true if the (initial btree roots + initial AGFL + inode alignment padding) is itself long enough to be allocated to new inodes if all of those blocks should happen to be free at the same time. Rather than make userspace guess at internal filesystem state, we provide a direct query. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-04xfs: specify AG in bulk reqDarrick J. Wong
Add a new xfs_bulk_ireq flag to constrain the iteration to a single AG. If the passed-in startino value is zero then we start with the first inode in the AG that the user passes in; otherwise, we iterate only within the same AG as the passed-in inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-04orangefs: fix build warning from debugfs cleanup patchGreg Kroah-Hartman
Stephen writes: After merging the driver-core tree, today's linux-next build (x86_64 allmodconfig) produced this warning: fs/orangefs/orangefs-debugfs.c: In function 'orangefs_debugfs_init': fs/orangefs/orangefs-debugfs.c:193:1: warning: label 'out' defined but not used [-Wunused-label] out: ^~~ fs/orangefs/orangefs-debugfs.c: In function 'orangefs_kernel_debug_init': fs/orangefs/orangefs-debugfs.c:204:17: warning: unused variable 'ret' [-Wunused-variable] struct dentry *ret; ^~~ Fix this up and change the return type of the function to void as it can not fail, which cleans up some more code and variables as well. Cc: Mike Marshall <hubcap@omnibond.com> Cc: Martin Brandenburg <martin@omnibond.com> Cc: devel@lists.orangefs.org Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: f095adba36bb ("orangefs: no need to check return value of debugfs_create functions") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-04ubifs: fix build warning after debugfs cleanup patchGreg Kroah-Hartman
Stephen writes: After merging the driver-core tree, today's linux-next build (arm multi_v7_defconfig) produced this warning: fs/ubifs/debug.c: In function 'dbg_debugfs_init_fs': fs/ubifs/debug.c:2812:6: warning: unused variable 'err' [-Wunused-variable] int err, n; ^~~ So fix this up properly. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Richard Weinberger <richard@nod.at> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: linux-mtd@lists.infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03xfs: wire up the v5 inumbers ioctlDarrick J. Wong
Wire up the v5 INUMBERS ioctl. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: wire up new v5 bulkstat ioctlsDarrick J. Wong
Wire up the new v5 BULKSTAT ioctl. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: introduce v5 inode group structureDarrick J. Wong
Introduce a new "v5" inode group structure that fixes the alignment and padding problems of the existing structure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: introduce new v5 bulkstat structureDarrick J. Wong
Introduce a new version of the in-core bulkstat structure that supports our new v5 format features. This structure also fills the gaps in the previous structure. We leave wiring up the ioctls for the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: rename bulkstat functionsDarrick J. Wong
Rename the bulkstat functions to 'fsbulkstat' so that they match the ioctl names. We will be introducing a new set of bulkstat/inumbers ioctls soon, and it will be important to keep the names straight. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: remove various bulk request typedef usageDarrick J. Wong
Remove xfs_bstat_t, xfs_fsop_bulkreq_t, xfs_inogrp_t, and similarly named compat typedefs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03nfsd: decode implementation idJ. Bruce Fields
Decode the implementation ID and display in nfsd/clients/#/info. It may be help identify the client. It won't be used otherwise. (When this went into the protocol, I thought the implementation ID would be a slippery slope towards implementation-specific workarounds as with the http user-agent. But I guess I was wrong, the risk seems pretty low now.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: create xdr_netobj_dup helperJ. Bruce Fields
Move some repeated code to a common helper. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: allow forced expiration of NFSv4 clientsJ. Bruce Fields
NFSv4 clients are automatically expired and all their locks removed if they don't contact the server for a certain amount of time (the lease period, 90 seconds by default). There can still be situations where that's not enough, so allow userspace to force expiry by writing "expire\n" to the new nfsd/client/#/ctl file. (The generic "ctl" name is because I expect we may want to allow other operations on clients in the future.) The write will not return until the client is expired and all of its locks and other state removed. The fault injection code also provides a way of expiring clients, but it fails if there are any in-progress RPC's referencing the client. Also, its method of selecting a client to expire is a little more primitive--it uses an IP address, which can't always uniquely specify an NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: create get_nfsdfs_clp helperJ. Bruce Fields
Factor our some common code. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>