summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2022-05-23Merge tag 'rcu.2022.05.19a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU update from Paul McKenney: - Documentation updates - Miscellaneous fixes - Callback-offloading updates, mainly simplifications - RCU-tasks updates, including some -rt fixups, handling of systems with sparse CPU numbering, and a fix for a boot-time race-condition failure - Put SRCU on a memory diet in order to reduce the size of the srcu_struct structure - Torture-test updates fixing some bugs in tests and closing some testing holes - Torture-test updates for the RCU tasks flavors, most notably ensuring that building rcutorture and friends does not change the RCU-tasks-related Kconfig options - Torture-test scripting updates - Expedited grace-period updates, most notably providing milliseconds-scale (not all that) soft real-time response from synchronize_rcu_expedited(). This is also the first time in almost 30 years of RCU that someone other than me has pushed for a reduction in the RCU CPU stall-warning timeout, in this case by more than three orders of magnitude from 21 seconds to 20 milliseconds. This tighter timeout applies only to expedited grace periods * tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu: Move expedited grace period (GP) work to RT kthread_worker rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT srcu: Drop needless initialization of sdp in srcu_gp_start() srcu: Prevent expedited GPs and blocking readers from consuming CPU srcu: Add contention check to call_srcu() srcu_data ->lock acquisition srcu: Automatically determine size-transition strategy at boot rcutorture: Make torture.sh allow for --kasan rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU rcutorture: Make kvm.sh allow more memory for --kasan runs torture: Save "make allmodconfig" .config file scftorture: Remove extraneous "scf" from per_version_boot_params rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC torture: Enable CSD-lock stall reports for scftorture torture: Skip vmlinux check for kvm-again.sh runs scftorture: Adjust for TASKS_RCU Kconfig option being selected rcuscale: Allow rcuscale without RCU Tasks Rude/Trace rcuscale: Allow rcuscale without RCU Tasks refscale: Allow refscale without RCU Tasks Rude/Trace refscale: Allow refscale without RCU Tasks rcutorture: Allow specifying per-scenario stat_interval ...
2022-05-23Merge back earlier thermal control updates for 5.19-rc1.Rafael J. Wysocki
2022-05-23Merge tag 'efi-next-for-v5.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Allow runtime services to be re-enabled at boot on RT kernels. - Provide access to secrets injected into the boot image by CoCo hypervisors (COnfidential COmputing) - Use DXE services on x86 to make the boot image executable after relocation, if needed. - Prefer mirrored memory for randomized allocations. - Only randomize the placement of the kernel image on arm64 if the loader has not already done so. - Add support for obtaining the boot hartid from EFI on RISC-V. * tag 'efi-next-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: riscv/efi_stub: Add support for RISCV_EFI_BOOT_PROTOCOL efi: stub: prefer mirrored memory for randomized allocations efi/arm64: libstub: run image in place if randomized by the loader efi: libstub: pass image handle to handle_kernel_image() efi: x86: Set the NX-compatibility flag in the PE header efi: libstub: ensure allocated memory to be executable efi: libstub: declare DXE services table efi: Add missing prototype for efi_capsule_setup_info docs: security: Add secrets/coco documentation efi: Register efi_secret platform device if EFI secret area is declared virt: Add efi_secret module to expose confidential computing secrets efi: Save location of EFI confidential computing area efi: Allow to enable EFI runtime services by default on RT
2022-05-23Merge branch 'pm-domains'Rafael J. Wysocki
Merge generlic power domains update for 5.19-rc1: - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski). - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf Hansson). - Improve the way genpd deals with its governors (Ulf Hansson). * pm-domains: PM: domains: Trust domain-idle-states from DT to be correct by genpd PM: domains: Measure power-on/off latencies in genpd based on a governor PM: domains: Allocate governor data dynamically based on a genpd governor PM: domains: Clean up some code in pm_genpd_init() and genpd_remove() PM: domains: Fix initialization of genpd's next_wakeup PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd PM: domains: Measure suspend/resume latencies in genpd based on governor PM: domains: Move the next_wakeup variable into the struct gpd_timing_data PM: domains: Allocate gpd_timing_data dynamically based on governor PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain() PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd PM: domains: Drop redundant code for genpd always-on governor PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor PM: domains: Move genpd's time-accounting to ktime_get_mono_fast_ns() PM: domains: Extend dev_pm_domain_detach() doc
2022-05-23Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq updates for 5.19-rc1: - Fix cpufreq governor clean up code to avoid using kfree() directly to free kobject-based items (Kevin Hao). - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe Leroy). - Make intel_pstate notify frequency invariance code when no_turbo is turned on and off (Chen Yu). - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas Pandruvada). - Make cpufreq avoid unnecessary frequency updates due to mismatch between hardware and the frequency table (Viresh Kumar). - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify code (Viresh Kumar). - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the calling convention for some driver callbacks consistent (Rafael Wysocki). - Avoid accessing half-initialized cpufreq policies from the show() and store() sysfs functions (Schspa Shi). - Rearrange cpufreq_offline() to make the calling convention for some driver callbacks consistent (Schspa Shi). - Update CPPC handling in cpufreq (Pierre Gondois): * Add per_cpu efficiency_class to the CPPC driver. * Make the CPPC driver Register EM based on efficiency class information. * Adjust _OSC for flexible address space in the ACPI platform initialization code and always set CPPC _OSC bits if CPPC_LIB is supported. * Assume no transition latency if no PCCT in the CPPC driver. * Add fast_switch and dvfs_possible_from_any_cpu support to the CPPC driver. * pm-cpufreq: cpufreq: CPPC: Enable dvfs_possible_from_any_cpu cpufreq: CPPC: Enable fast_switch ACPI: CPPC: Assume no transition latency if no PCCT ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported ACPI: CPPC: Check _OSC for flexible address space cpufreq: make interface functions and lock holding state clear cpufreq: Abort show()/store() for half-initialized policies cpufreq: Rearrange locking in cpufreq_remove_dev() cpufreq: Split cpufreq_offline() cpufreq: Reorganize checks in cpufreq_offline() cpufreq: Clear real_cpus mask from remove_cpu_dev_symlink() cpufreq: intel_pstate: Support Sapphire Rapids OOB mode Revert "cpufreq: Fix possible race in cpufreq online error path" cpufreq: CPPC: Register EM based on efficiency class information cpufreq: CPPC: Add per_cpu efficiency_class cpufreq: Avoid unnecessary frequency updates due to mismatch cpufreq: Fix possible race in cpufreq online error path cpufreq: intel_pstate: Handle no_turbo in frequency invariance cpufreq: Prepare cleanup of powerpc's asm/prom.h cpufreq: governor: Use kobject release() method to free dbs_data
2022-05-23Merge branches 'pm-em' and 'pm-cpuidle'Rafael J. Wysocki
Marge Energy Model support updates and cpuidle updates for 5.19-rc1: - Update the Energy Model support code to allow the Energy Model to be artificial, which means that the power values may not be on a uniform scale with other devices providing power information, and update the cpufreq_cooling and devfreq_cooling thermal drivers to support artificial Energy Models (Lukasz Luba). - Make DTPM check the Energy Model type (Lukasz Luba). - Fix policy counter decrementation in cpufreq if Energy Model is in use (Pierre Gondois). - Add AlderLake processor support to the intel_idle driver (Zhang Rui). - Fix regression leading to no genpd governor in the PSCI cpuidle driver and fix the riscv-sbi cpuidle driver to allow a genpd governor to be used (Ulf Hansson). * pm-em: PM: EM: Decrement policy counter powercap: DTPM: Check for Energy Model type thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Documentation: EM: Add artificial EM registration description PM: EM: Remove old debugfs files and print all 'flags' PM: EM: Change the order of arguments in the .active_power() callback PM: EM: Use the new .get_cost() callback while registering EM PM: EM: Add artificial EM flag PM: EM: Add .get_cost() callback * pm-cpuidle: cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used cpuidle: psci: Fix regression leading to no genpd governor intel_idle: Add AlderLake support
2022-05-23Merge branches 'pm-core', 'pm-sleep' and 'powercap'Rafael J. Wysocki
Merge PM core changes, updates related to system sleep and power capping updates for 5.19-rc1: - Export dev_pm_ops instead of suspend() and resume() in the IIO chemical scd30 driver (Jonathan Cameron). - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and PM-runtime counterparts (Jonathan Cameron). - Move symbol exports in the IIO chemical scd30 driver into the IIO_SCD30 namespace (Jonathan Cameron). - Avoid device PM-runtime usage count underflows (Rafael Wysocki). - Allow dynamic debug to control printing of PM messages (David Cohen). - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen Bai). - Preserve ACPI-table override during hibernation (Amadeusz Sławiński). - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson). - Make Intel RAPL power capping driver support the RaptorLake and AlderLake N processors (Zhang Rui, Sumeet Pawnikar). - Remove redundant store to value after multiply in the RAPL power capping driver (Colin Ian King). * pm-core: PM: runtime: Avoid device usage count underflows iio: chemical: scd30: Move symbol exports into IIO_SCD30 namespace PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv iio: chemical: scd30: Export dev_pm_ops instead of suspend() and resume() * pm-sleep: cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode PM: runtime: Allow to call __pm_runtime_set_status() from atomic context PM: hibernate: Don't mark comment as kernel-doc x86/ACPI: Preserve ACPI-table override during hibernation PM: hibernate: Fix some kernel-doc comments PM: sleep: enable dynamic debug support within pm_pr_dbg() PM: sleep: Narrow down -DDEBUG on kernel/power/ files * powercap: powercap: intel_rapl: remove redundant store to value after multiply powercap: intel_rapl: add support for ALDERLAKE_N powercap: RAPL: Add Power Limit4 support for RaptorLake powercap: intel_rapl: add support for RaptorLake
2022-05-23NFSD: Instantiate a struct file when creating a regular NFSv4 fileChuck Lever
There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case. To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked. Two new APIs are added: + Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file. + Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: JianHong Yin <jiyin@redhat.com>
2022-05-23Merge tag 'asoc-v5.19' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v5.19 This is quite a big update, partly due to the addition of some larger drivers (more of which is to follow since at least the AVS driver is still a work in progress) and partly due to Charles' work sorting out our handling of endianness. As has been the case recently it's much more about drivers than the core. - Overhaul of endianness specification for data formats, avoiding needless restrictions due to CODECs. - Initial stages of Intel AVS driver merge. - Introduction of v4 IPC mechanism for SOF. - TDM mode support for AK4613. - Support for Analog Devices ADAU1361, Cirrus Logic CS35L45, Maxim MAX98396, MediaTek MT8186, NXP i.MX8 micfil and SAI interfaces, nVidia Tegra186 ASRC, and Texas Instruments TAS2764 and TAS2780
2022-05-23LSM: Remove double path_rename hook calls for RENAME_EXCHANGEMickaël Salaün
In order to be able to identify a file exchange with renameat2(2) and RENAME_EXCHANGE, which will be useful for Landlock [1], propagate the rename flags to LSMs. This may also improve performance because of the switch from two set of LSM hook calls to only one, and because LSMs using this hook may optimize the double check (e.g. only one lock, reduce the number of path walks). AppArmor, Landlock and Tomoyo are updated to leverage this change. This should not change the current behavior (same check order), except (different level of) speed boosts. [1] https://lore.kernel.org/r/20220221212522.320243-1-mic@digikod.net Cc: James Morris <jmorris@namei.org> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Serge E. Hallyn <serge@hallyn.com> Acked-by: John Johansen <john.johansen@canonical.com> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20220506161102.525323-7-mic@digikod.net
2022-05-23Merge branches 'slab/for-5.19/stackdepot' and 'slab/for-5.19/refactor' into ↵Vlastimil Babka
slab/for-linus
2022-05-23block: add sync_blockdev_range()Yuezhang Mo
sync_blockdev_range() is to support syncing multiple sectors with as few block device requests as possible, it is helpful to make the block device to give full play to its performance. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-05-22net: wrap the wireless pointers in struct net_device in an ifdefJakub Kicinski
Most protocol-specific pointers in struct net_device are under a respective ifdef. Wireless is the notable exception. Since there's a sizable number of custom-built kernels for datacenter workloads which don't build wireless it seems reasonable to ifdefy those pointers as well. While at it move IPv4 and IPv6 pointers up, those are special for obvious reasons. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154 Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix locking issueDavid Howells
There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: qed: fix typos in commentsJulia Lawall
Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22hwmon: Introduce hwmon_device_register_for_thermalGuenter Roeck
The thermal subsystem registers a hwmon driver without providing chip or sysfs group information. This is for legacy reasons and would be difficult to change. At the same time, we want to enforce that chip information is provided when registering a hwmon device using hwmon_device_register_with_info(). To enable this, introduce a special API for use only by the thermal subsystem. Acked-by: Rafael J . Wysocki <rafael@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-05-22lib: add generic polynomial calculationMichael Walle
Some temperature and voltage sensors use a polynomial to convert between raw data points and actual temperature or voltage. The polynomial is usually the result of a curve fitting of the diode characteristic. The BT1 PVT hwmon driver already uses such a polynonmial calculation which is rather generic. Move it to lib/ so other drivers can reuse it. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20220401214032.3738095-2-michael@walle.cc Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-05-22powerpc/powermac: constify device_node in of_irq_parse_oldworld()Krzysztof Kozlowski
The of_irq_parse_oldworld() does not modify passed device_node so make it a pointer to const for safety. Drop the extern while modifying the line. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210924105653.46963-2-krzysztof.kozlowski@canonical.com
2022-05-21x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale DataPawan Gupta
Add the sysfs reporting file for Processor MMIO Stale Data vulnerability. It exposes the vulnerability and mitigation state similar to the existing files for the other hardware vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-20selftests/bpf: Add missing trampoline program type to trampoline_count testYuntao Wang
Currently the trampoline_count test doesn't include any fmod_ret bpf programs, fix it to make the test cover all possible trampoline program types. Since fmod_ret bpf programs can't be attached to __set_task_comm function, as it's neither whitelisted for error injection nor a security hook, change it to bpf_modify_return_test. This patch also does some other cleanups such as removing duplicate code, dropping inconsistent comments, etc. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220519150610.601313-1-ytcoode@gmail.com
2022-05-20bpf: Add bpf_skc_to_mptcp_sock_protoGeliang Tang
This patch implements a new struct bpf_func_proto, named bpf_skc_to_mptcp_sock_proto. Define a new bpf_id BTF_SOCK_TYPE_MPTCP, and a new helper bpf_skc_to_mptcp_sock(), which invokes another new helper bpf_mptcp_sock_from_subflow() in net/mptcp/bpf.c to get struct mptcp_sock from a given subflow socket. v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c, remove build check for CONFIG_BPF_JIT v5: Drop EXPORT_SYMBOL (Martin) Co-developed-by: Nicolas Rybowski <nicolas.rybowski@tessares.net> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220519233016.105670-2-mathew.j.martineau@linux.intel.com
2022-05-20Merge tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A fix for a nasty use-after-free, marked for stable" * tag 'ceph-for-5.18-rc8' of https://github.com/ceph/ceph-client: libceph: fix misleading ceph_osdc_cancel_request() comment libceph: fix potential use-after-free on linger ping and resends
2022-05-20Merge branches 'for-next/sme', 'for-next/stacktrace', ↵Catalin Marinas
'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf/arm-cmn: Decode CAL devices properly in debugfs perf/arm-cmn: Fix filter_sel lookup perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first drivers/perf: hisi: Add Support for CPA PMU drivers/perf: hisi: Associate PMUs in SICL with CPUs online drivers/perf: arm_spe: Expose saturating counter to 16-bit perf/arm-cmn: Add CMN-700 support perf/arm-cmn: Refactor occupancy filter selector perf/arm-cmn: Add CMN-650 support dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 perf: check return value of armpmu_request_irq() perf: RISC-V: Remove non-kernel-doc ** comments * for-next/sme: (30 commits) : Scalable Matrix Extensions support. arm64/sve: Move sve_free() into SVE code section arm64/sve: Make kernel FPU protection RT friendly arm64/sve: Delay freeing memory in fpsimd_flush_thread() arm64/sme: More sensibly define the size for the ZA register set arm64/sme: Fix NULL check after kzalloc arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding() arm64/sme: Provide Kconfig for SME KVM: arm64: Handle SME host state when running guests KVM: arm64: Trap SME usage in guest KVM: arm64: Hide SME system registers from guests arm64/sme: Save and restore streaming mode over EFI runtime calls arm64/sme: Disable streaming mode and ZA when flushing CPU state arm64/sme: Add ptrace support for ZA arm64/sme: Implement ptrace support for streaming mode SVE registers arm64/sme: Implement ZA signal handling arm64/sme: Implement streaming SVE signal handling arm64/sme: Disable ZA and streaming mode when handling signals arm64/sme: Implement traps and syscall handling for SME arm64/sme: Implement ZA context switching arm64/sme: Implement streaming SVE context switching ... * for-next/stacktrace: : Stacktrace cleanups. arm64: stacktrace: align with common naming arm64: stacktrace: rename stackframe to unwind_state arm64: stacktrace: rename unwinder functions arm64: stacktrace: make struct stackframe private to stacktrace.c arm64: stacktrace: delete PCS comment arm64: stacktrace: remove NULL task check from unwind_frame() * for-next/fault-in-subpage: : btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable(). btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults arm64: Add support for user sub-page fault probing mm: Add fault_in_subpage_writeable() to probe at sub-page granularity * for-next/misc: : Miscellaneous patches. arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: mm: Make arch_faults_on_old_pte() check for migratability arm64: mte: Clean up user tag accessors arm64/hugetlb: Drop TLB flush from get_clear_flush() arm64: Declare non global symbols as static arm64: mm: Cleanup useless parameters in zone_sizes_init() arm64: fix types in copy_highpage() arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK arm64: document the boot requirements for MTE arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE * for-next/ftrace: : ftrace cleanups. arm64/ftrace: Make function graph use ftrace directly ftrace: cleanup ftrace_graph_caller enable and disable * for-next/crashkernel: : Support for crashkernel reservations above ZONE_DMA. arm64: kdump: Do not allocate crash low memory if not needed docs: kdump: Update the crashkernel description for arm64 of: Support more than one crash kernel regions for kexec -s of: fdt: Add memory for devices by DT property "linux,usable-memory-range" arm64: kdump: Reimplement crashkernel=X arm64: Use insert_resource() to simplify code kdump: return -ENOENT if required cmdline option does not exist
2022-05-20Merge tag 'nand/for-5.19' into mtd/nextMiquel Raynal
NAND core: * Print offset instead of page number for bad blocks Raw NAND controller drivers: * Cadence: Fix possible null-ptr-deref in cadence_nand_dt_probe() * CS553X: simplify the return expression of cs553x_write_ctrl_byte() * Davinci: Remove redundant unsigned comparison to zero * Denali: Use managed device resources * GPMI: - Add large oob bch setting support - Rename the variable ecc_chunk_size - Uninline the gpmi_check_ecc function - Add strict ecc strength check - Refactor BCH geometry settings function * Intel: Fix possible null-ptr-deref in ebu_nand_probe() * MPC5121: Check before clk_disable_unprepare() not needed * Mtk: - MTD_NAND_ECC_MEDIATEK should depend on ARCH_MEDIATEK - Also parse the default nand-ecc-engine property if available - Make mtk_ecc.c a separated module * OMAP ELM: - Convert the bindings to yaml - Describe the bindings for AM64 ELM - Add support for its compatible * Renesas: Use runtime PM instead of the raw clock API and update the bindings accordingly * Rockchip: Check before clk_disable_unprepare() not needed * TMIO: Check return value after calling platform_get_resource() Raw NAND chip driver: * Kioxia: Add support for TH58NVG3S0HBAI4 and TC58NVG0S3HTA00 SPI-NAND chip drivers: * Gigadevice: - Add support for: - GD5FxGM7xExxG - GD5F{2,4}GQ5xExxG - GD5F1GQ5RExxG - GD5FxGQ4xExxG - Fix Quad IO for GD5F1GQ5UExxG * XTX: Add support for XT26G0xA Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2022-05-20Merge tag 'spi-nor/for-5.19' into mtd/nextMiquel Raynal
SPI NOR core changes: - Read back written SR value to make sure the write was done correctly. - Introduce a common function for Read ID that manufacturer drivers can use to verify the Octal DTR switch worked correctly. - Add helpers for read/write any register commands so manufacturer drivers don't open code it every time. - Clarify rdsr dummy cycles documentation. - Add debugfs entry to expose internal flash parameters and state. SPI NOR manufacturer drivers changes: - Add support for Winbond W25Q512NW-IM, and Eon EN25QH256A. - Move spi_nor_write_ear() to Winbond module since only Winbond flashes use it. - Rework Micron and Cypress Octal DTR enable methods to improve readability. - Use the common Read ID function to verify switch to Octal DTR mode for Micron and Cypress flashes. - Skip polling status on volatile register writes for Micron and Cypress flashes since the operation is instant. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2022-05-20Merge branches 'apple/dart', 'arm/mediatek', 'arm/msm', 'arm/smmu', ↵Joerg Roedel
'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next
2022-05-19move mount-related externs from fs.h to mount.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-05-19linux/mount.h: trim includesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-05-19scsi: nvme-fc: Add new routine nvme_fc_io_getuuid()Muneendra Kumar
Add nvme_fc_io_getuuid() to the nvme-fc transport. The routine is invoked by the FC LLDD on a per-I/O request basis. The routine translates from the FC-specific request structure to the bio and the cgroup structure in order to obtain the FC appid stored in the cgroup structure. If a value is not set or a bio is not found, a NULL appid (aka uuid) will be returned to the LLDD. Link: https://lore.kernel.org/r/20220519123110.17361-2-jsmart2021@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19net: macb: Fix PTP one step sync supportHarini Katakam
PTP one step sync packets cannot have CSUM padding and insertion in SW since time stamp is inserted on the fly by HW. In addition, ptp4l version 3.0 and above report an error when skb timestamps are reported for packets that not processed for TX TS after transmission. Add a helper to identify PTP one step sync and fix the above two errors. Add a common mask for PTP header flag field "twoStepflag". Also reset ptp OSS bit when one step is not selected. Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support") Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation") Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20220518170756.7752-1-harini.katakam@xilinx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19riscv: kexec: add kexec_file_load() supportPalmer Dabbelt
This patch set implements kexec_file_load() for RISC-V, which is currently only allowed on rv64 due to some minor build issues on 32-bit platforms in the generic code. This allows users to kexec() using an FD as opposed to a buffer. Link: https://lore.kernel.org/all/20220408100914.150110-1-lizhengyu3@huawei.com/ * palmer/riscv-kexec_file: RISC-V: Load purgatory in kexec_file RISC-V: Add purgatory RISC-V: Support for kexec_file on panic RISC-V: Add kexec_file support RISC-V: use memcpy for kexec_file mode kexec_file: Fix kexec_file.c build error for riscv platform
2022-05-19topology: Remove unused cpu_cluster_mask()Dietmar Eggemann
default_topology[] uses cpu_clustergroup_mask() for the CLS level (guarded by CONFIG_SCHED_CLUSTER) which is currently provided by x86 (arch/x86/kernel/smpboot.c) and arm64 (drivers/base/arch_topology.c). Fixes: 778c558f49a2c ("sched: Add cluster scheduler level in core and related Kconfig for ARM64") Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Barry Song <baohua@kernel.org> Link: https://lore.kernel.org/r/20220513093433.425163-1-dietmar.eggemann@arm.com
2022-05-19nodemask.h: fix compilation error with GCC12Christophe de Dinechin
With gcc version 12.0.1 20220401 (Red Hat 12.0.1-0), building with defconfig results in the following compilation error: | CC mm/swapfile.o | mm/swapfile.c: In function `setup_swap_info': | mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds | of `struct plist_node[]' [-Werror=array-bounds] | 2291 | p->avail_lists[i].prio = 1; | | ~~~~~~~~~~~~~~^~~ | In file included from mm/swapfile.c:16: | ./include/linux/swap.h:292:27: note: while referencing `avail_lists' | 292 | struct plist_node avail_lists[]; /* | | ^~~~~~~~~~~ This is due to the compiler detecting that the mask in node_states[__state] could theoretically be zero, which would lead to first_node() returning -1 through find_first_bit. I believe that the warning/error is legitimate. I first tried adding a test to check that the node mask is not emtpy, since a similar test exists in the case where MAX_NUMNODES == 1. However, adding the if statement causes other warnings to appear in for_each_cpu_node_but, because it introduces a dangling else ambiguity. And unfortunately, GCC is not smart enough to detect that the added test makes the case where (node) == -1 impossible, so it still complains with the same message. This is why I settled on replacing that with a harmless, but relatively useless (node) >= 0 test. Based on the warning for the dangling else, I also decided to fix the case where MAX_NUMNODES == 1 by moving the condition inside the for loop. It will still only be tested once. This ensures that the meaning of an else following for_each_node_mask or derivatives would not silently have a different meaning depending on the configuration. Link: https://lkml.kernel.org/r/20220414150855.2407137-3-dinechin@redhat.com Signed-off-by: Christophe de Dinechin <christophe@dinechin.org> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ben Segall <bsegall@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: fix missing handler for __GFP_NOWARNQi Zheng
We expect no warnings to be issued when we specify __GFP_NOWARN, but currently in paths like alloc_pages() and kmalloc(), there are still some warnings printed, fix it. But for some warnings that report usage problems, we don't deal with them. If such warnings are printed, then we should fix the usage problems. Such as the following case: WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); [zhengqi.arch@bytedance.com: v2] Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: don't be stuck to rmap lock on reclaim pathMinchan Kim
The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended under memory pressure if processes keep working on their vmas(e.g., fork, mmap, munmap). It makes reclaim path stuck. In our real workload traces, we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it makes other processes entering direct reclaim, which were also stuck on the lock. This patch makes lru aging path try_lock mode like shink_page_list so the reclaim context will keep working with next lru pages without being stuck. if it found the rmap lock contended, it rotates the page back to head of lru in both active/inactive lrus to make them consistent behavior, which is basic starting point rather than adding more heristic. Since this patch introduces a new "contended" field as out-param along with try_lock in-param in rmap_walk_control, it's not immutable any longer if the try_lock is set so remove const keywords on rmap related functions. Since rmap walking is already expensive operation, I doubt the const would help sizable benefit( And we didn't have it until 5.17). In a heavy app workload in Android, trace shows following statistics. It almost removes rmap lock contention from reclaim path. Martin Liu reported: Before: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function 1632 0 1631 151.542173 31672 209 page_lock_anon_vma_read 601 0 601 145.544681 28817 198 rmap_walk_file After: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function NaN NaN NaN NaN NaN 0.0 NaN 0 0 0 0.127645 1 12 rmap_walk_file [minchan@kernel.org: add comment, per Matthew] Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: John Dias <joaodias@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Martin Liu <liumartin@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19zswap: memcg accountingJohannes Weiner
Applications can currently escape their cgroup memory containment when zswap is enabled. This patch adds per-cgroup tracking and limiting of zswap backend memory to rectify this. The existing cgroup2 memory.stat file is extended to show zswap statistics analogous to what's in meminfo and vmstat. Furthermore, two new control files, memory.zswap.current and memory.zswap.max, are added to allow tuning zswap usage on a per-workload basis. This is important since not all workloads benefit from zswap equally; some even suffer compared to disk swap when memory contents don't compress well. The optimal size of the zswap pool, and the threshold for writeback, also depends on the size of the workload's warm set. The implementation doesn't use a traditional page_counter transaction. zswap is unconventional as a memory consumer in that we only know the amount of memory to charge once expensive compression has occurred. If zwap is disabled or the limit is already exceeded we obviously don't want to compress page upon page only to reject them all. Instead, the limit is checked against current usage, then we compress and charge. This allows some limit overrun, but not enough to matter in practice. [hannes@cmpxchg.org: fix for CONFIG_SLOB builds] Link: https://lkml.kernel.org/r/YnwD14zxYjUJPc2w@cmpxchg.org [hannes@cmpxchg.org: opt out of cgroups v1] Link: https://lkml.kernel.org/r/Yn6it9mBYFA+/lTb@cmpxchg.org Link: https://lkml.kernel.org/r/20220510152847.230957-7-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: zswap: add basic meminfo and vmstat coverageJohannes Weiner
Currently it requires poking at debugfs to figure out the size and population of the zswap cache on a host. There are no counters for reads and writes against the cache. As a result, it's difficult to understand zswap behavior on production systems. Print zswap memory consumption and how many pages are zswapped out in /proc/meminfo. Count zswapouts and zswapins in /proc/vmstat. Link: https://lkml.kernel.org/r/20220510152847.230957-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm/swap: fix comment about swap extentMiaohe Lin
Since commit 4efaceb1c5f8 ("mm, swap: use rbtree for swap_extent"), rbtree is used for swap extent. Also curr_swap_extent is removed at that time. Update the corresponding comment. Link: https://lkml.kernel.org/r/20220509131416.17553-16-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm/swap: fix the obsolete comment for SWP_TYPE_SHIFTMiaohe Lin
Since commit 3159f943aafd ("xarray: Replace exceptional entries"), there is only one bit of 'type' can be shifted up. Update the corresponding comment. Link: https://lkml.kernel.org/r/20220509131416.17553-13-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm/swap: make page_swapcount and __lru_add_drain_all staticMiaohe Lin
Make page_swapcount and __lru_add_drain_all static. They are only used within the file now. Link: https://lkml.kernel.org/r/20220509131416.17553-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm/swap: remove unneeded return value of free_swap_slotMiaohe Lin
The return value of free_swap_slot is always 0 and also ignored now. Remove it to clean up the code. Link: https://lkml.kernel.org/r/20220509131416.17553-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: khugepaged: introduce khugepaged_enter_vma() helperYang Shi
The khugepaged_enter_vma_merge() actually does as the same thing as the khugepaged_enter() section called by shmem_mmap(), so consolidate them into one helper and rename it to khugepaged_enter_vma(). Link: https://lkml.kernel.org/r/20220510203222.24246-8-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: Vlastmil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <song@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: khugepaged: make hugepage_vma_check() non-staticYang Shi
The hugepage_vma_check() could be reused by khugepaged_enter() and khugepaged_enter_vma_merge(), but it is static in khugepaged.c. Make it non-static and declare it in khugepaged.h. Link: https://lkml.kernel.org/r/20220510203222.24246-7-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <song@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: khugepaged: make khugepaged_enter() void functionYang Shi
The most callers of khugepaged_enter() don't care about the return value. Only dup_mmap(), anonymous THP page fault and MADV_HUGEPAGE handle the error by returning -ENOMEM. Actually it is not harmful for them to ignore the error case either. It also sounds overkilling to fail fork() and page fault early due to khugepaged_enter() error, and MADV_HUGEPAGE does set VM_HUGEPAGE flag regardless of the error. Link: https://lkml.kernel.org/r/20220510203222.24246-6-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Vlastmil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <songliubraving@fb.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: thp: only regular file could be THP eligibleYang Shi
Since commit a4aeaa06d45e ("mm: khugepaged: skip huge page collapse for special files"), khugepaged just collapses THP for regular file which is the intended usecase for readonly fs THP. Only show regular file as THP eligible accordingly. And make file_thp_enabled() available for khugepaged too in order to remove duplicate code. Link: https://lkml.kernel.org/r/20220510203222.24246-5-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Vlastmil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <songliubraving@fb.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19sched: coredump.h: clarify the use of MMF_VM_HUGEPAGEYang Shi
Patch series "Make khugepaged collapse readonly FS THP more consistent", v4. The readonly FS THP relies on khugepaged to collapse THP for suitable vmas. But the behavior is inconsistent for "always" mode (https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/). The "always" mode means THP allocation should be tried all the time and khugepaged should try to collapse THP all the time. Of course the allocation and collapse may fail due to other factors and conditions. Currently file THP may not be collapsed by khugepaged even though all the conditions are met. That does break the semantics of "always" mode. So make sure readonly FS vmas are registered to khugepaged to fix the break. Register suitable vmas in common mmap path, that could cover both readonly FS vmas and shmem vmas, so remove the khugepaged calls in shmem.c. The patch 1-7 are minor bug fixes, clean up and preparation patches. Patch 8 is the real meat. Tested with khugepaged test in selftests and the testcase provided by Vlastimil Babka in https://lore.kernel.org/lkml/df3b5d1c-a36b-2c73-3e27-99e74983de3a@suse.cz/ by commenting out MADV_HUGEPAGE call. This patch (of 8): MMF_VM_HUGEPAGE is set as long as the mm is available for khugepaged by khugepaged_enter(), not only when VM_HUGEPAGE is set on vma. Correct the comment to avoid confusion. Link: https://lkml.kernel.org/r/20220510203222.24246-1-shy828301@gmail.com Link: https://lkml.kernel.org/r/20220510203222.24246-2-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Vlastmil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <songliubraving@fb.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19can: can-dev: remove obsolete CAN LED supportOliver Hartkopp
Since commit 30f3b42147ba6f ("can: mark led trigger as broken") the CAN specific LED support was disabled and marked as BROKEN. As the common LED support with CONFIG_LEDS_TRIGGER_NETDEV should do this work now the code can be removed as preparation for a CAN netdevice Kconfig rework. Link: https://lore.kernel.org/all/20220518154527.29046-1-socketcan@hartkopp.net Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> [mkl: remove led.h from MAINTAINERS] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-05-19kexec_file: Fix kexec_file.c build error for riscv platformLiao Chang
When CONFIG_KEXEC_FILE is set for riscv platform, the compilation of kernel/kexec_file.c generate build error: kernel/kexec_file.c: In function 'crash_prepare_elf64_headers': ./arch/riscv/include/asm/page.h:110:71: error: request for member 'virt_addr' in something not a structure or union 110 | ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr)) | ^ ./arch/riscv/include/asm/page.h:131:2: note: in expansion of macro 'is_linear_mapping' 131 | is_linear_mapping(_x) ? \ | ^~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:140:31: note: in expansion of macro '__va_to_pa_nodebug' 140 | #define __phys_addr_symbol(x) __va_to_pa_nodebug(x) | ^~~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:143:24: note: in expansion of macro '__phys_addr_symbol' 143 | #define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0)) | ^~~~~~~~~~~~~~~~~~ kernel/kexec_file.c:1327:36: note: in expansion of macro '__pa_symbol' 1327 | phdr->p_offset = phdr->p_paddr = __pa_symbol(_text); This occurs is because the "kernel_map" referenced in macro is_linear_mapping() is suppose to be the one of struct kernel_mapping defined in arch/riscv/mm/init.c, but the 2nd argument of crash_prepare_elf64_header() has same symbol name, in expansion of macro is_linear_mapping in function crash_prepare_elf64_header(), "kernel_map" actually is the local variable. Signed-off-by: Liao Chang <liaochang1@huawei.com> Link: https://lore.kernel.org/r/20220408100914.150110-2-lizhengyu3@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19PM: domains: Allocate governor data dynamically based on a genpd governorUlf Hansson
If a genpd doesn't have an associated governor assigned, several variables in the struct generic_pm_domain becomes superfluous. Rather than wasting memory in allocated genpds, let's move the variables from the struct generic_pm_domain into a new separate struct. In this way, we can instead dynamically decide when we need to allocate the corresponding data for it. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>