summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-01net: rose: fix netdev reference changesEric Dumazet
Bernard reported that trying to unload rose module would lead to infamous messages: unregistered_netdevice: waiting for rose0 to become free. Usage count = xx This patch solves the issue, by making sure each socket referring to a netdevice holds a reference count on it, and properly releases it in rose_release(). rose_dev_first() is also fixed to take a device reference before leaving the rcu_read_locked section. Following patch will add ref_tracker annotations to ease future bug hunting. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01Merge tag 'sched-core-2022-08-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Load-balancing improvements: - Improve NUMA balancing on AMD Zen systems for affine workloads. - Improve the handling of reduced-capacity CPUs in load-balancing. - Energy Model improvements: fix & refine all the energy fairness metrics (PELT), and remove the conservative threshold requiring 6% energy savings to migrate a task. Doing this improves power efficiency for most workloads, and also increases the reliability of energy-efficiency scheduling. - Optimize/tweak select_idle_cpu() to spend (much) less time searching for an idle CPU on overloaded systems. There's reports of several milliseconds spent there on large systems with large workloads ... [ Since the search logic changed, there might be behavioral side effects. ] - Improve NUMA imbalance behavior. On certain systems with spare capacity, initial placement of tasks is non-deterministic, and such an artificial placement imbalance can persist for a long time, hurting (and sometimes helping) performance. The fix is to make fork-time task placement consistent with runtime NUMA balancing placement. Note that some performance regressions were reported against this, caused by workloads that are not memory bandwith limited, which benefit from the artificial locality of the placement bug(s). Mel Gorman's conclusion, with which we concur, was that consistency is better than random workload benefits from non-deterministic bugs: "Given there is no crystal ball and it's a tradeoff, I think it's better to be consistent and use similar logic at both fork time and runtime even if it doesn't have universal benefit." - Improve core scheduling by fixing a bug in sched_core_update_cookie() that caused unnecessary forced idling. - Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs for newly woken tasks. - Fix a newidle balancing bug that introduced unnecessary wakeup latencies. ABI improvements/fixes: - Do not check capabilities and do not issue capability check denial messages when a scheduler syscall doesn't require privileges. (Such as increasing niceness.) - Add forced-idle accounting to cgroups too. - Fix/improve the RSEQ ABI to not just silently accept unknown flags. (No existing tooling is known to have learned to rely on the previous behavior.) - Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags. Optimizations: - Optimize & simplify leaf_cfs_rq_list() - Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg(). Misc fixes & cleanups: - Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems. - Fix a full-NOHZ bug that can in some cases result in the tick not being re-enabled when the last SCHED_RT task is gone from a runqueue but there's still SCHED_OTHER tasks around. - Various PREEMPT_RT related fixes. - Misc cleanups & smaller fixes" * tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rseq: Kill process when unknown flags are encountered in ABI structures rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags sched/core: Fix the bug that task won't enqueue into core tree when update cookie nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt() sched/core: Always flush pending blk_plug sched/fair: fix case with reduced capacity CPU sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling sched/core: add forced idle accounting for cgroups sched/fair: Remove the energy margin in feec() sched/fair: Remove task_util from effective utilization in feec() sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu() sched/fair: Rename select_idle_mask to select_rq_mask sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() sched/fair: Decay task PELT values during wakeup migration sched/fair: Provide u64 read for 32-bits arch helper sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg sched: only perform capability check on privileged operation sched: Remove unused function group_first_cpu() sched/fair: Remove redundant word " *" selftests/rseq: check if libc rseq support is registered ...
2022-08-01Merge tag 'slab-for-5.20_or_6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - An addition of 'accounted' flag to slab allocation tracepoints to indicate memcg_kmem accounting, by Vasily - An optimization of memcg handling in freeing paths, by Muchun - Various smaller fixes and cleanups * tag 'slab-for-5.20_or_6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: move generic bulk alloc/free functions to SLOB mm/sl[au]b: use own bulk free function when bulk alloc failed mm: slab: optimize memcg_slab_free_hook() mm/tracing: add 'accounted' entry into output of allocation tracepoints tools/vm/slabinfo: Handle files in debugfs mm/slub: Simplify __kmem_cache_alias() mm, slab: fix bad alignments
2022-08-01smack: Remove the redundant lsm_inode_allocXiu Jianfeng
It's not possible for inode->i_security to be NULL here because every inode will call inode_init_always and then lsm_inode_alloc to alloc memory for inode->security, this is what LSM infrastructure management do, so remove this redundant code. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2022-08-01smack: Replace kzalloc + strncpy with kstrndupGONG, Ruiqi
Simplify the code by using kstrndup instead of kzalloc and strncpy in smk_parse_smack(), which meanwhile remove strncpy as [1] suggests. [1]: https://github.com/KSPP/linux/issues/90 Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2022-08-01Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-07-28 Jacob Keller says: Convert all of the Intel drivers with PTP support to the newer .adjfine implementation which uses scaled parts per million. This improves the precision of the frequency adjustments by taking advantage of the full scaled parts per million input coming from user space. In addition, all implementations are converted to using the mul_u64_u64_div_u64 function which better handles the intermediate value. This function supports architecture specific instructions where possible to avoid loss of precision if the normal 64-bit multiplication would overflow. Of note, the i40e implementation is now able to avoid loss of precision on slower link speeds by taking advantage of this to multiply by the link speed factor first. This results in a significantly more precise adjustment by allowing the calculation to impact the lower bits. This also gets us a step closer to being able to remove the .adjfreq entirely by removing its use from many drivers. I plan to follow this up with a series to update the drivers from other vendors and drop the .adjfreq implementation entirely. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igb: convert .adjfreq to .adjfine ixgbe: convert .adjfreq to .adjfine i40e: convert .adjfreq to .adjfine i40e: use mul_u64_u64_div_u64 for PTP frequency calculation e1000e: convert .adjfreq to .adjfine e1000e: remove unnecessary range check in e1000e_phc_adjfreq ice: implement adjfine with mul_u64_u64_div_u64 ==================== Link: https://lore.kernel.org/r/20220728181836.3387862-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01dt-bindings: net: fsl,fec: Add i.MX8ULP FEC itemsWei Fang
Add fsl,imx8ulp-fec for i.MX8ULP platform. Signed-off-by: Wei Fang <wei.fang@nxp.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/all/20220726143853.23709-2-wei.fang@nxp.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01affs: use memcpy_to_page and remove replace kmap_atomic()David Sterba
The use of kmap() is being deprecated in favor of kmap_local_page() where it is feasible. For kmap around a memcpy there's a convenience helper memcpy_to_page that also makes the flush_dcache_page() redundant. CC: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-01Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Highlights include a major rework of our kPTI page-table rewriting code (which makes it both more maintainable and considerably faster in the cases where it is required) as well as significant changes to our early boot code to reduce the need for data cache maintenance and greatly simplify the KASLR relocation dance. Summary: - Remove unused generic cpuidle support (replaced by PSCI version) - Fix documentation describing the kernel virtual address space - Handling of some new CPU errata in Arm implementations - Rework of our exception table code in preparation for handling machine checks (i.e. RAS errors) more gracefully - Switch over to the generic implementation of ioremap() - Fix lockdep tracking in NMI context - Instrument our memory barrier macros for KCSAN - Rework of the kPTI G->nG page-table repainting so that the MMU remains enabled and the boot time is no longer slowed to a crawl for systems which require the late remapping - Enable support for direct swapping of 2MiB transparent huge-pages on systems without MTE - Fix handling of MTE tags with allocating new pages with HW KASAN - Expose the SMIDR register to userspace via sysfs - Continued rework of the stack unwinder, particularly improving the behaviour under KASAN - More repainting of our system register definitions to match the architectural terminology - Improvements to the layout of the vDSO objects - Support for allocating additional bits of HWCAP2 and exposing FEAT_EBF16 to userspace on CPUs that support it - Considerable rework and optimisation of our early boot code to reduce the need for cache maintenance and avoid jumping in and out of the kernel when handling relocation under KASLR - Support for disabling SVE and SME support on the kernel command-line - Support for the Hisilicon HNS3 PMU - Miscellanous cleanups, trivial updates and minor fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits) arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr} arm64: fix KASAN_INLINE arm64/hwcap: Support FEAT_EBF16 arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long arm64/hwcap: Document allocation of upper bits of AT_HWCAP arm64: enable THP_SWAP for arm64 arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52 arm64: errata: Remove AES hwcap for COMPAT tasks arm64: numa: Don't check node against MAX_NUMNODES drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node() docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON mm: kasan: Skip unpoisoning of user pages mm: kasan: Ensure the tags are visible before the tag in page->flags drivers/perf: hisi: add driver for HNS3 PMU drivers/perf: hisi: Add description for HNS3 PMU driver drivers/perf: riscv_pmu_sbi: perf format perf/arm-cci: Use the bitmap API to allocate bitmaps ...
2022-08-01Merge tag 'm68k-for-v5.20-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Use RNG seed from bootinfo block on virt platform - defconfig updates - Minor fixes and improvements * tag 'm68k-for-v5.20-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v5.19-rc1 m68k: Add common forward declaration for show_registers() m68k: mac: Remove forward declaration for mac_nmi_handler() m68k: virt: Fix missing platform_device_unregister() on error in virt_platform_init() m68k: virt: Use RNG seed from bootinfo block m68k: bitops: Change __fls to return and accept unsigned long m68k: Kconfig.machine: Add endif comment m68k: Kconfig.debug: Replace single quotes m68k: Kconfig.cpu: Fix indentation and add endif comments m68k: q40: Align '*' in comments m68k: sun3: Use __func__ to get function's name in an output message m68k: mac: Fix typos in comments m68k: virt: Kconfig minor fixes
2022-08-01Merge tag 'x86_kdump_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kdump updates from Borislav Petkov: - Add the ability to pass early an RNG seed to the kernel from the boot loader - Add the ability to pass the IMA measurement of kernel and bootloader to the kexec-ed kernel * tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Use rng seeds from setup_data x86/kexec: Carry forward IMA measurement log on kexec
2022-08-01Merge tag 'x86_build_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Borislav Petkov: - Fix stack protector builds when cross compiling with Clang - Other Kbuild improvements and fixes * tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Omit use of bin2c x86/purgatory: Hard-code obj-y in Makefile x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.o x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang
2022-08-01Merge tag 'x86_core_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Have invalid MSR accesses warnings appear only once after a pr_warn_once() change broke that - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching infra take care of them instead of having unreadable alternative macros there * tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/extable: Fix ex_handler_msr() print condition x86,nospec: Simplify {JMP,CALL}_NOSPEC
2022-08-01Merge tag 'x86_misc_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - Add a bunch of PCI IDs for new AMD CPUs and use them in k10temp - Free the pmem platform device on the registration error path * tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hwmon: (k10temp): Add support for new family 17h and 19h models x86/amd_nb: Add AMD PCI IDs for SMN communication x86/pmem: Fix platform-device leak in error path
2022-08-01Merge tag 'x86_cpu_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Remove the vendor check when selecting MWAIT as the default idle state - Respect idle=nomwait when supplied on the kernel cmdline - Two small cleanups * tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Use MSR_IA32_MISC_ENABLE constants x86: Fix comment for X86_FEATURE_ZEN x86: Remove vendor checks from prefer_mwait_c1_over_halt x86: Handle idle=nomwait cmdline properly for x86_idle
2022-08-01Merge tag 'x86_fpu_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu update from Borislav Petkov: - Add machinery to initialize AMX register state in order for AMX-capable CPUs to be able to enter deeper low-power state * tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Add a new flag to initialize the AMX state x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
2022-08-01Merge tag 'x86_mm_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Borislav Petkov: - Rename a PKRU macro to make more sense when reading the code - Update pkeys documentation - Avoid reading contended mm's TLB generation var if not absolutely necessary along with fixing a case where arch_tlbbatch_flush() doesn't adhere to the generation scheme and thus violates the conditions for the above avoidance. * tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/tlb: Ignore f->new_tlb_gen when zero x86/pkeys: Clarify PKRU_AD_KEY macro Documentation/protection-keys: Clean up documentation for User Space pkeys x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-08-01Merge tag 'x86_cleanups_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanup from Borislav Petkov: - A single CONFIG_ symbol correction in a comment * tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Refer to the intended config STRICT_DEVMEM in a comment
2022-08-01Merge tag 'x86_vmware_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vmware cleanup from Borislav Petkov: - A single statement simplification by using the BIT() macro * tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Use BIT() macro for shifting
2022-08-01Merge tag 'ras_core_for_v6.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS update from Borislav Petkov: "A single RAS change: - Probe whether hardware error injection (direct MSR writes) is possible when injecting errors on AMD platforms. In some cases, the platform could prohibit those" * tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Check whether writes to MCA_STATUS are getting ignored
2022-08-01Merge tag 'fs.idmapped.overlay.acl.v5.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull acl updates from Christian Brauner: "Last cycle we introduced support for mounting overlayfs on top of idmapped mounts. While looking into additional testing we realized that posix acls don't really work correctly with stacking filesystems on top of idmapped layers. We already knew what the fix were but it would require work that is more suitable for the merge window so we turned off posix acls for v5.19 for overlayfs on top of idmapped layers with Miklos routing my patch upstream in 72a8e05d4f66 ("Merge tag 'ovl-fixes-5.19-rc7' [..]"). This contains the work to support posix acls for overlayfs on top of idmapped layers. Since the posix acl fixes should use the new vfs{g,u}id_t work the associated branch has been merged in. (We sent a pull request for this earlier.) We've also pulled in Miklos pull request containing my patch to turn of posix acls on top of idmapped layers. This allowed us to avoid rebasing the branch which we didn't like because we were already at rc7 by then. Merging it in allows this branch to first fix posix acls and then to cleanly revert the temporary fix it brought in by commit 4a47c6385bb4 ("ovl: turn of SB_POSIXACL with idmapped layers temporarily"). The last patch in this series adds Seth Forshee as a co-maintainer for idmapped mounts. Seth has been integral to all of this work and is also the main architect behind the filesystem idmapping work which ultimately made filesystems such as FUSE and overlayfs available in containers. He continues to be active in both development and review. I'm very happy he decided to help and he has my full trust. This increases the bus factor which is always great for work like this. I'm honestly very excited about this because I think in general we don't do great in the bringing on new maintainers department" For more explanations of the ACL issues, see https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ * tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: Add Seth Forshee as co-maintainer for idmapped mounts Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily" ovl: handle idmappings in ovl_get_acl() acl: make posix_acl_clone() available to overlayfs acl: port to vfs{g,u}id_t acl: move idmapped mount fixup into vfs_{g,s}etxattr() mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()
2022-08-01Merge tag 'fs.idmapped.vfsuid.v5.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs idmapping updates from Christian Brauner: "This introduces the new vfs{g,u}id_t types we agreed on. Similar to k{g,u}id_t the new types are just simple wrapper structs around regular {g,u}id_t types. They allow to establish a type safety boundary in the VFS for idmapped mounts preventing confusion betwen {g,u}ids mapped into an idmapped mount and {g,u}ids mapped into the caller's or the filesystem's idmapping. An initial set of helpers is introduced that allows to operate on vfs{g,u}id_t types. We will remove all references to non-type safe idmapped mounts helpers in the very near future. The patches do already exist. This converts the core attribute changing codepaths which become significantly easier to reason about because of this change. Just a few highlights here as the patches give detailed overviews of what is happening in the commit messages: - The kernel internal struct iattr contains type safe vfs{g,u}id_t values clearly communicating that these values have to take a given mount's idmapping into account. - The ownership values placed in struct iattr to change ownership are identical for idmapped and non-idmapped mounts going forward. This also allows to simplify stacking filesystems such as overlayfs that change attributes In other words, they always represent the values. - Instead of open coding checks for whether ownership changes have been requested and an actual update of the inode is required we now have small static inline wrappers that abstract this logic away removing a lot of code duplication from individual filesystems that all open-coded the same checks" * tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: mnt_idmapping: align kernel doc and parameter order mnt_idmapping: use new helpers in mapped_fs{g,u}id() fs: port HAS_UNMAPPED_ID() to vfs{g,u}id_t mnt_idmapping: return false when comparing two invalid ids attr: fix kernel doc attr: port attribute changes to new types security: pass down mount idmapping to setattr hook quota: port quota helpers mount ids fs: port to iattr ownership update helpers fs: introduce tiny iattr ownership update helpers fs: use mount types in iattr fs: add two type safe mapping helpers mnt_idmapping: add vfs{g,u}id_t
2022-08-01Merge tag 'filelock-v6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Just a couple of flock() patches from Kuniyuki Iwashima. The main change is that this moves a file_lock allocation from the slab to the stack" * tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs/lock: Rearrange ops in flock syscall. fs/lock: Don't allocate file_lock in flock_make_lock().
2022-08-01Merge tag 'erofs-for-5.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "First of all, we'd like to add Yue Hu and Jeffle Xu as two new reviewers. Thank them for spending time working on EROFS! There is no major feature outstanding in this cycle, mainly a patchset I worked on to prepare for rolling hash deduplication and folios for compressed data as the next big features. It kills the unneeded PG_error flag dependency as well. Apart from that, there are bugfixes and cleanups as always. Details are listed below: - Add Yue Hu and Jeffle Xu as reviewers - Add the missing wake_up when updating lzma streams - Avoid consecutive detection for Highmem memory - Prepare for multi-reference pclusters and get rid of PG_error - Fix ctx->pos update for NFS export - minor cleanups" * tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (23 commits) erofs: update ctx->pos for every emitted dirent erofs: get rid of the leftover PAGE_SIZE in dir.c erofs: get rid of erofs_prepare_dio() helper erofs: introduce multi-reference pclusters (fully-referenced) erofs: record the longest decompressed size in this round erofs: introduce z_erofs_do_decompressed_bvec() erofs: try to leave (de)compressed_pages on stack if possible erofs: introduce struct z_erofs_decompress_backend erofs: get rid of `z_pagemap_global' erofs: clean up `enum z_erofs_collectmode' erofs: get rid of `enum z_erofs_page_type' erofs: rework online page handling erofs: switch compressed_pages[] to bufvec erofs: introduce `z_erofs_parse_in_bvecs' erofs: drop the old pagevec approach erofs: introduce bufvec to store decompressed buffers erofs: introduce `z_erofs_parse_out_bvecs()' erofs: clean up z_erofs_collector_begin() erofs: get rid of unneeded `inode', `map' and `sb' erofs: avoid consecutive detection for Highmem memory ...
2022-08-01Merge tag 'fsnotify_for_v5.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - support for FAN_MARK_IGNORE which untangles some of the not well defined corner cases with fanotify ignore masks - small cleanups * tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Fix comment typo fanotify: introduce FAN_MARK_IGNORE fanotify: cleanups for fanotify_mark() input validations fanotify: prepare for setting event flags in ignore mask fs: inotify: Fix typo in inotify comment
2022-08-01Merge tag 'fs_for_v5.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 and reiserfs updates from Jan Kara: "A fix for ext2 handling of a corrupted fs image and cleanups in ext2 and reiserfs" * tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Add more validity checks for inode counts fs/reiserfs/inode: remove dead code in _get_block_create_0() fs/ext2: replace ternary operator with min_t()
2022-08-01Merge tag 'dlm-6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: - Delay the cleanup of interrupted posix lock requests until the user space result arrives. Previously, the immediate cleanup would lead to extraneous warnings when the result arrived. - Tracepoint improvements, e.g. adding the lock resource name. - Delay the completion of lockspace creation until one full recovery cycle has completed. This allows more error cases to be returned to the caller. - Remove warnings from the locking layer about delayed network replies. The recently added midcomms warnings are much more useful. - Begin the process of deprecating two unused lock-timeout-related features. These features now require enabling via a Kconfig option, and enabling them triggers deprecation warnings. We expect to remove the code in v6.2. * tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: move kref_put assert for lkb structs fs: dlm: don't use deprecated timeout features by default fs: dlm: add deprecation Kconfig and warnings for timeouts fs: dlm: remove timeout from dlm_user_adopt_orphan fs: dlm: remove waiter warnings fs: dlm: fix grammar in lowcomms output fs: dlm: add comment about lkb IFL flags fs: dlm: handle recovery result outside of ls_recover fs: dlm: make new_lockspace() wait until recovery completes fs: dlm: call dlm_lsop_recover_prep once fs: dlm: update comments about recovery and membership handling fs: dlm: add resource name to tracepoints fs: dlm: remove additional dereference of lksb fs: dlm: change ast and bast trace order fs: dlm: change posix lock sigint handling fs: dlm: use dlm_plock_info for do_unlock_close fs: dlm: change plock interrupted message to debug again fs: dlm: add pid to debug log fs: dlm: plock use list_first_entry
2022-08-01fs: dlm: move kref_put assert for lkb structsAlexander Aring
The unhold_lkb() function decrements the lock's kref, and asserts that the ref count was not the final one. Use the kref_put release function (which should not be called) to call the assert, rather than doing the assert based on the kref_put return value. Using kill_lkb() as the release function doesn't make sense if we only want to assert. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01fs: dlm: don't use deprecated timeout features by defaultAlexander Aring
This patch will disable use of deprecated timeout features if CONFIG_DLM_DEPRECATED_API is not set. The deprecated features will be removed in upcoming kernel release v6.2. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01fs: dlm: add deprecation Kconfig and warnings for timeoutsAlexander Aring
This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option that must be enabled to use two timeout-related features that we intend to remove in kernel v6.2. Warnings are printed if either is enabled and used. Neither has ever been used as far as we know. . The DLM_LSFL_TIMEWARN lockspace creation flag will be removed, along with the associated configfs entry for setting the timeout. Setting the flag and configfs file would cause dlm to track how long locks were waiting for reply messages. After a timeout, a kernel message would be logged, and a netlink message would be sent to userspace. Recently, midcomms messages have been added that produce much better logging about actual problems with messages. No use has ever been found for the netlink messages. . The userspace libdlm API has allowed the DLM_LKF_TIMEOUT flag with a timeout value to be set in lock requests. The lock request would be cancelled after the timeout. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01rseq: Kill process when unknown flags are encountered in ABI structuresMathieu Desnoyers
rseq_abi()->flags and rseq_abi()->rseq_cs->flags 29 upper bits are currently unused. The current behavior when those bits are set is to ignore them. This is not an ideal behavior, because when future features will start using those flags, if user-space fails to correctly validate that the kernel indeed supports those flags (e.g. with a new sys_rseq flags bit) before using them, it may incorrectly assume that the kernel will handle those flags way when in fact those will be silently ignored on older kernels. Validating that unused flags bits are cleared will allow a smoother transition when those flags will start to be used by allowing applications to fail early, and obviously, when they attempt to use the new flags on an older kernel that does not support them. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20220622194617.1155957-2-mathieu.desnoyers@efficios.com
2022-08-01rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flagsMathieu Desnoyers
The pretty much unused RSEQ_CS_FLAG_NO_RESTART_ON_* flags introduce complexity in rseq, and are subtly buggy [1]. Solving those issues requires introducing additional complexity in the rseq implementation for each supported architecture. Considering that it complexifies the rseq ABI, I am proposing that we deprecate those flags. [2] So far there appears to be consensus from maintainers of user-space projects impacted by this feature that its removal would be a welcome simplification. [3] The deprecation approach proposed here is to issue WARN_ON_ONCE() when encountering those flags and kill the offending process with sigsegv. This should allow us to quickly identify whether anyone yells at us for removing this. Link: https://lore.kernel.org/lkml/20220618182515.95831-1-minhquangbui99@gmail.com/ [1] Link: https://lore.kernel.org/lkml/258546133.12151.1655739550814.JavaMail.zimbra@efficios.com/ [2] Link: https://lore.kernel.org/lkml/87pmj1enjh.fsf@email.froward.int.ebiederm.org/ [3] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/lkml/20220622194617.1155957-1-mathieu.desnoyers@efficios.com
2022-08-01selftests: kvm: set rax before vmcallAndrei Vagin
kvm_hypercall has to place the hypercall number in rax. Trace events show that kvm_pv_test doesn't work properly: kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0 With this change, it starts working as expected: kvm_pv_test-54285: kvm_hypercall: nr 0x5 a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-54285: kvm_hypercall: nr 0xa a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-54285: kvm_hypercall: nr 0xb a0 0x0 a1 0x0 a2 0x0 a3 0x0 Signed-off-by: Andrei Vagin <avagin@google.com> Message-Id: <20220722230241.1944655-5-avagin@google.com> Fixes: ac4a4d6de22e ("selftests: kvm: test enforcement of paravirtual cpuid features") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01selftests: KVM: Add exponent check for boolean statsOliver Upton
The only sensible exponent for a boolean stat is 0. Add a test assertion requiring all boolean statistics to have an exponent of 0. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Message-Id: <20220719143134.3246798-4-oliver.upton@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01selftests: KVM: Provide descriptive assertions in kvm_binary_stats_testOliver Upton
As it turns out, tests sometimes fail. When that is the case, packing the test assertion with as much relevant information helps track down the problem more quickly. Sharpen up the stat descriptor assertions in kvm_binary_stats_test to more precisely describe the reason for the test assertion and which stat is to blame. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Message-Id: <20220719143134.3246798-3-oliver.upton@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01selftests: KVM: Check stat name before other fieldsOliver Upton
In order to provide more useful test assertions that describe the broken stats descriptor, perform sanity check on the stat name before any other descriptor field. While at it, avoid dereferencing the name field if the sanity check fails as it is more likely to contain garbage. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Message-Id: <20220719143134.3246798-2-oliver.upton@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01Merge branch 'funeth-tx-xdp-frags'David S. Miller
Dimitris Michailidis says: ==================== net/funeth: Tx support for XDP with frags Support XDP with fragments for XDP_TX and ndo_xdp_xmit. The first three patches rework existing code used by the skb path to make it suitable also for XDP. With these all the callees of the main Tx XDP function, fun_xdp_tx(), are fragment-capable. The last patch updates fun_xdp_tx() to handle fragments. ==================== Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net/funeth: Tx handling of XDP with fragments.Dimitris Michailidis
By now all the functions fun_xdp_tx() calls are shared with the skb path and thus are fragment-capable. Update fun_xdp_tx(), that up to now has been passing just one buffer, to check for fragments and call accordingly. This makes XDP_TX and ndo_xdp_xmit fragment-capable. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net/funeth: Unify skb/XDP packet mapping.Dimitris Michailidis
Instead of passing an skb to the mapping function pass an skb_shared_info plus an additional address/length pair. This makes it usable for both skbs and XDP. Call it from the XDP path and adjust the skb path. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net/funeth: Unify skb/XDP gather list writing.Dimitris Michailidis
Extract the Tx gather list writing code that skbs use into a utility function and use it also for XDP. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net/funeth: Unify skb/XDP Tx packet unmapping.Dimitris Michailidis
Current XDP unmapping is a subset of its skb analog, dealing with only one buffer. In preparation for multi-frag XDP rename the skb function and use it also for XDP. The XDP version is removed. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01KVM: x86/mmu: remove unused variablePaolo Bonzini
The last use of 'pfn' went away with the same-named argument to host_pfn_mapping_level; now that the hugepage level is obtained exclusively from the host page tables, kvm_mmu_zap_collapsible_spte does not need to know host pfns at all. Fixes: a8ac499bb6ab ("KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01Merge branch 'devlink-parallel-commands'David S. Miller
Jiri Pirko says: ==================== net: devlink: allow parallel commands on multiple devlinks Aim of this patchset is to remove devlink_mutex and eventually to enable parallel ops on devlink netlink interface. ==================== Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net: devlink: enable parallel ops on netlink interfaceJiri Pirko
As the devlink_mutex was removed and all devlink instances are protected individually by devlink->lock mutex, allow the netlink ops to run in parallel and therefore allow user to execute commands on multiple devlink instances simultaneously. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net: devlink: remove devlink_mutexJiri Pirko
All accesses to devlink structure from userspace and drivers are locked with devlink->lock instance mutex. Also, devlinks xa_array iteration is taken care of by iteration helpers taking devlink reference. Therefore, remove devlink_mutex as it is no longer needed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net: devlink: convert reload command to take implicit devlink->lockJiri Pirko
Convert reload command to behave the same way as the rest of the commands and let if be called with devlink->lock held. Remove the temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK flag is no longer used, remove it alongside. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net: devlink: introduce "unregistering" mark and use it during devlinks ↵Jiri Pirko
iteration Add new mark called "unregistering" to be set at the beginning of devlink_unregister() function. Check this mark during devlinks iteration in order to prevent getting a reference of devlink which is being currently unregistered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01udp: Remove redundant __udp_sysctl_init() call from udp_init().Kuniyuki Iwashima
__udp_sysctl_init() is called for init_net via udp_sysctl_ops. While at it, we can rename __udp_sysctl_init() to udp_sysctl_init(). Fixes: 1e8029515816 ("udp: Move the udp sysctl to namespace.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-29 This series contains updates to iavf driver only. Przemyslaw prevents setting of TC max rate below minimum supported values and reports updated queue values when setting up TCs. --- v2: Dropped patch 3 (hw-tc-offload check) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01net/rds: Use PTR_ERR instead of IS_ERR for rdsdebug()Li Qiong
If 'local_odp_mr->r_trans_private' is a error code, it is better to print the error code than to print the value of IS_ERR(). Signed-off-by: Li Qiong <liqiong@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>