linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-10-07	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/	David S. Miller
	ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-10-07 1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default. From Pavel Skripkin. 2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING messages were accidentally not paced at the end. Fix by Eugene Syromiatnikov. 3) Fix the uapi for the default policy, use explicit field and macros and make it accessible to userland. From Nicolas Dichtel. 4) Fix a missing rcu lock in xfrm_notify_userpolicy(). From Nicolas Dichtel. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07	ALSA: hda: intel: Allow repeatedly probing on codec configuration errors	Takashi Iwai
	It seems that a few recent AMD systems show the codec configuration errors at the early boot, while loading the driver at a later stage works magically. Although the root cause of the error isn't clear, it's certainly not bad to allow retrying the codec probe in such a case if that helps. This patch adds the capability for retrying the probe upon codec probe errors on the certain AMD platforms. The probe_work is changed to a delayed work, and at the secondary call, it'll jump to the codec probing. Note that, not only adding the re-probing, this includes the behavior changes in the codec configuration function. Namely, snd_hda_codec_configure() won't unregister the codec at errors any longer. Instead, its caller, azx_codec_configure() unregisters the codecs with the probe failures if any codec has been successfully configured. If all codec probe failed, it doesn't unregister but let it re-probed -- which is the most case we're seeing and this patch tries to improve. Even if the driver doesn't re-probe or give up, it will go to the "free-all" error path, hence the leftover codecs shall be disabled / deleted in anyway. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1190801 Link: https://lore.kernel.org/r/20211006141940.2897-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-10-06	net: mdio: add mdiobus_modify_changed()	Russell King (Oracle)
	Add mdiobus_modify_changed() helper to reflect the phylib and similar equivalents. This will avoid this functionality being open-coded, as has already happened in phylink, and it looks like other users will be appearing soon. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06	ethtool: Add transceiver module extended state	Ido Schimmel
	Add an extended state and sub-state to describe link issues related to transceiver modules. The 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY' extended sub-state tells user space that port is unable to gain a carrier because the CMIS Module State Machine did not reach the ModuleReady (Fully Operational) state. For example, if the module is stuck at ModuleLowPwr or ModuleFault state. In case of the latter, user space can read the fault reason from the module's EEPROM and potentially reset it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06	ethtool: Add ability to control transceiver modules' power mode	Ido Schimmel
	Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver modules parameters and retrieve their status. The first parameter to control is the power mode of the module. It is only relevant for paged memory modules, as flat memory modules always operate in low power mode. When a paged memory module is in low power mode, its power consumption is reduced to the minimum, the management interface towards the host is available and the data path is deactivated. User space can choose to put modules that are not currently in use in low power mode and transition them to high power mode before putting the associated ports administratively up. This is useful for user space that favors reduced power consumption and lower temperatures over reduced link up times. In QSFP-DD modules the transition from low power mode to high power mode can take a few seconds and this transition is only expected to get longer with future / more complex modules. User space can control the power mode of the module via the power mode policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible values: * high: Module is always in high power mode. * auto: Module is transitioned by the host to high power mode when the first port using it is put administratively up and to low power mode when the last port using it is put administratively down. The operational power mode of the module is available to user space via the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not reported to user space when a module is not plugged-in. The user API is designed to be generic enough so that it could be used for modules with different memory maps (e.g., SFF-8636, CMIS). The only implementation of the device driver API in this series is for a MAC driver (mlxsw) where the module is controlled by the device's firmware, but it is designed to be generic enough so that it could also be used by implementations where the module is controlled by the CPU. CMIS testing ============ # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off The module is not in low power mode, as it is not forced by hardware (LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off). The power mode can be queried from the kernel. In case LowPwrAllowRequestHW was on, the kernel would need to take into account the state of the LowPwrRequestHW signal, which is not visible to user space. $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp11 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp11 up Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp11 down Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On SFF-8636 testing ================ # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm The module is not in low power mode, as it is not forced by hardware (Power override is on) or by software (Power set is off). The power mode can be queried from the kernel. In case Power override was off, the kernel would need to take into account the state of the LPMode signal, which is not visible to user space. $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp13 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp13 up Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp13 down Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06	kunit: fix kernel-doc warnings due to mismatched arg names	Daniel Latypov
	Commit 7122debb4367 ("kunit: introduce kunit_kmalloc_array/kunit_kcalloc() helpers") added new functions but called last arg `flags`, unlike the existing code that used `gfp`. This only is an issue in test.h, test.c still used `gfp`. But the documentation was copy-pasted with the old names, leading to kernel-doc warnings. Do s/flags/gfp to make the names consistent and fix the warnings. Fixes: 7122debb4367 ("kunit: introduce kunit_kmalloc_array/kunit_kcalloc() helpers") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-06	RDMA/efa: CQ notifications	Gal Pressman
	This patch adds support for CQ notifications through the standard verbs api. In order to achieve that, a new event queue (EQ) object is introduced, which is in charge of reporting completion events to the driver. On driver load, EQs are allocated and their affinity is set to a single cpu. When a user app creates a CQ with a completion channel, the completion vector number is converted to a EQ number, which is in charge of reporting the CQ events. In addition, the CQ creation admin command now returns an offset for the CQ doorbell, which is mapped to the userspace provider and is used to arm the CQ when requested by the user. The EQs use a single doorbell (located on the registers BAR), which encodes the EQ number and arm as part of the doorbell value. The EQs are polled by the driver on each new EQE, and arm it when the poll is completed. Link: https://lore.kernel.org/r/20211003105605.29222-1-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-07	Merge branch 'objtool/urgent'	Peter Zijlstra
	Fixup conflicts. # Conflicts: # tools/objtool/check.c
2021-10-06	coredump: Don't perform any cleanups before dumping core	Eric W. Biederman
	Rename coredump_exit_mm to coredump_task_exit and call it from do_exit before PTRACE_EVENT_EXIT, and before any cleanup work for a task happens. This ensures that an accurate copy of the process can be captured in the coredump as no cleanup for the process happens before the coredump completes. This also ensures that PTRACE_EVENT_EXIT will not be visited by any thread until the coredump is complete. Add a new flag PF_POSTCOREDUMP so that tasks that have passed through coredump_task_exit can be recognized and ignored in zap_process. Now that all of the coredumping happens before exit_mm remove code to test for a coredump in progress from mm_release. Replace "may_ptrace_stop()" with a simple test of "current->ptrace". The other tests in may_ptrace_stop all concern avoiding stopping during a coredump. These tests are no longer necessary as it is now guaranteed that fatal_signal_pending will be set if the code enters ptrace_stop during a coredump. The code in ptrace_stop is guaranteed not to stop if fatal_signal_pending returns true. Until this change "ptrace_event(PTRACE_EVENT_EXIT)" could call ptrace_stop without fatal_signal_pending being true, as signals are dequeued in get_signal before calling do_exit. This is no longer an issue as "ptrace_event(PTRACE_EVENT_EXIT)" is no longer reached until after the coredump completes. Link: https://lkml.kernel.org/r/874kaax26c.fsf@disp2133 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-06	ptrace: Remove the unnecessary arguments from arch_ptrace_stop	Eric W. Biederman
	Both arch_ptrace_stop_needed and arch_ptrace_stop are called with an exit_code and a siginfo structure. Neither argument is used by any of the implementations so just remove the unneeded arguments. The two arechitectures that implement arch_ptrace_stop are ia64 and sparc. Both architectures flush their register stacks before a ptrace_stack so that all of the register information can be accessed by debuggers. As the question of if a register stack needs to be flushed is independent of why ptrace is stopping not needing arguments make sense. Cc: David Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Link: https://lkml.kernel.org/r/87lf3mx290.fsf@disp2133 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-06	hyper-v: Replace uuid.h with types.h	Andy Shevchenko
	There is no user of anything in uuid.h in the hyperv.h. Replace it with more appropriate types.h. Fixes: f081bbb3fd03 ("hyper-v: Remove internal types from UAPI header") Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/20211001135544.1823-1-andriy.shevchenko@linux.intel.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-06	dma-buf: add dma_resv_for_each_fence_unlocked v8	Christian König
	Abstract the complexity of iterating over all the fences in a dma_resv object. The new loop handles the whole RCU and retry dance and returns only fences where we can be sure we grabbed the right one. v2: fix accessing the shared fences while they might be freed, improve kerneldoc, rename _cursor to _iter, add dma_resv_iter_is_exclusive, add dma_resv_iter_begin/end v3: restructor the code, move rcu_read_lock()/unlock() into the iterator, add dma_resv_iter_is_restarted() v4: fix NULL deref when no explicit fence exists, drop superflous rcu_read_lock()/unlock() calls. v5: fix typos in the documentation v6: fix coding error when excl fence is NULL v7: one more logic fix v8: fix index check in dma_resv_iter_is_exclusive() Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v7) Link: https://patchwork.freedesktop.org/patch/msgid/20211005113742.1101-2-christian.koenig@amd.com
2021-10-05	bpf: selftests: Add selftests for module kfunc support	Kumar Kartikeya Dwivedi
	This adds selftests that tests the success and failure path for modules kfuncs (in presence of invalid kfunc calls) for both libbpf and gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can add module BTF ID set from bpf_testmod. This also introduces a couple of test cases to verifier selftests for validating whether we get an error or not depending on if invalid kfunc call remains after elimination of unreachable instructions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com
2021-10-05	bpf: Enable TCP congestion control kfunc from modules	Kumar Kartikeya Dwivedi
	This commit moves BTF ID lookup into the newly added registration helper, in a way that the bbr, cubic, and dctcp implementation set up their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not dependent on modules are looked up from the wrapper function. This lifts the restriction for them to be compiled as built in objects, and can be loaded as modules if required. Also modify Makefile.modfinal to call resolve_btfids for each module. Note that since kernel kfunc_ids never overlap with module kfunc_ids, we only match the owner for module btf id sets. See following commits for background on use of: CONFIG_X86 ifdef: 569c484f9995 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86) CONFIG_DYNAMIC_FTRACE ifdef: 7aae231ac93b (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE) Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com
2021-10-05	bpf: btf: Introduce helpers for dynamic BTF set registration	Kumar Kartikeya Dwivedi
	This adds helpers for registering btf_id_set from modules and the bpf_check_mod_kfunc_call callback that can be used to look them up. With in kernel sets, the way this is supposed to work is, in kernel callback looks up within the in-kernel kfunc whitelist, and then defers to the dynamic BTF set lookup if it doesn't find the BTF id. If there is no in-kernel BTF id set, this callback can be used directly. Also fix includes for btf.h and bpfptr.h so that they can included in isolation. This is in preparation for their usage in tcp_bbr, tcp_cubic and tcp_dctcp modules in the next patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-4-memxor@gmail.com
2021-10-05	bpf: Introduce BPF support for kernel module function calls	Kumar Kartikeya Dwivedi
	This change adds support on the kernel side to allow for BPF programs to call kernel module functions. Userspace will prepare an array of module BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter. In the kernel, the module BTFs are placed in the auxilliary struct for bpf_prog, and loaded as needed. The verifier then uses insn->off to index into the fd_array. insn->off 0 is reserved for vmlinux BTF (for backwards compat), so userspace must use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is sorted based on offset in an array, and each offset corresponds to one descriptor, with a max limit up to 256 such module BTFs. We also change existing kfunc_tab to distinguish each element based on imm, off pair as each such call will now be distinct. Another change is to check_kfunc_call callback, which now include a struct module * pointer, this is to be used in later patch such that the kfunc_id and module pointer are matched for dynamically registered BTF sets from loadable modules, so that same kfunc_id in two modules doesn't lead to check_kfunc_call succeeding. For the duration of the check_kfunc_call, the reference to struct module exists, as it returns the pointer stored in kfunc_btf_tab. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
2021-10-05	Merge branch 'for-5.16/soc' into for-5.16/cpuidle	Thierry Reding

2021-10-05	clk: tegra: Add stubs needed for compile testing	Thierry Reding
	These stubs are needed to allow the tegra-cpuidle driver to be compile-tested. Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-10-05	Merge tag 'for-net-next-2021-10-01' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for MediaTek MT7922 and MT7921 - Enable support for AOSP extention in Qualcomm WCN399x and Realtek 8822C/8852A. - Add initial support for link quality and audio/codec offload. - Rework of sockets sendmsg to avoid locking issues. - Add vhci suspend/resume emulation. ==================== Link: https://lore.kernel.org/r/20211001230850.3635543-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-05	cpufreq: Use CPUFREQ_RELATION_E in DVFS governors	Vincent Donnefort
	Let the governors schedutil, conservative and ondemand to work, if possible on efficient frequencies only. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	cpufreq: Introducing CPUFREQ_RELATION_E	Vincent Donnefort
	This newly introduced flag can be applied by a governor to a CPUFreq relation, when looking for a frequency within the policy table. The resolution would then only walk through efficient frequencies. Even with the flag set, the policy max limit will still be honoured. If no efficient frequencies can be found within the limits of the policy, an inefficient one would be returned. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	cpufreq: Add an interface to mark inefficient frequencies	Vincent Donnefort
	Some SoCs such as the sd855 have OPPs within the same policy whose cost is higher than others with a higher frequency. Those OPPs are inefficients and it might be interesting for a governor to not use them. cpufreq_table_set_inefficient() allows the caller to identify a specified frequency as being inefficient. Inefficient frequencies are only supported on sorted tables. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	PM: EM: Allow skipping inefficient states	Vincent Donnefort
	The new performance domain flag EM_PERF_DOMAIN_SKIP_INEFFICIENCIES allows to not take into account inefficient states when estimating energy consumption. This intends to let the Energy Model know that CPUFreq itself will skip inefficiencies and such states don't need to be part of the estimation anymore. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	PM: EM: Extend em_perf_domain with a flag field	Vincent Donnefort
	Merge the current "milliwatts" option into a "flag" field. This intends to prepare the extension of this structure for inefficient states support in the Energy Model. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	PM: EM: Mark inefficient states	Vincent Donnefort
	Some SoCs, such as the sd855 have OPPs within the same performance domain, whose cost is higher than others with a higher frequency. Even though those OPPs are interesting from a cooling perspective, it makes no sense to use them when the device can run at full capacity. Those OPPs handicap the performance domain, when choosing the most energy-efficient CPU and are wasting energy. They are inefficient. Hence, add support for such OPPs to the Energy Model. The table can now be read skipping inefficient performance states (and by extension, inefficient OPPs). Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	virt: acrn: Introduce interfaces for virtual device creating/destroying	Shuo Liu
	The ACRN hypervisor can emulate a virtual device within hypervisor for a Guest VM. The emulated virtual device can work without the ACRN userspace after creation. The hypervisor do the emulation of that device. To support the virtual device creating/destroying, HSM provides the following ioctls: - ACRN_IOCTL_CREATE_VDEV Pass data struct acrn_vdev from userspace to the hypervisor, and inform the hypervisor to create a virtual device for a User VM. - ACRN_IOCTL_DESTROY_VDEV Pass data struct acrn_vdev from userspace to the hypervisor, and inform the hypervisor to destroy a virtual device of a User VM. These new APIs will be used by user space code vm_add_hv_vdev and vm_remove_hv_vdev in https://github.com/projectacrn/acrn-hypervisor/blob/master/devicemodel/core/vmmapi.c Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com> Link: https://lore.kernel.org/r/20210923084128.18902-3-fei1.li@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05	virt: acrn: Introduce interfaces for MMIO device passthrough	Shuo Liu
	MMIO device passthrough enables an OS in a virtual machine to directly access a MMIO device in the host. It promises almost the native performance, which is required in performance-critical scenarios of ACRN. HSM provides the following ioctls: - Assign - ACRN_IOCTL_ASSIGN_MMIODEV Pass data struct acrn_mmiodev from userspace to the hypervisor, and inform the hypervisor to assign a MMIO device to a User VM. - De-assign - ACRN_IOCTL_DEASSIGN_PCIDEV Pass data struct acrn_mmiodev from userspace to the hypervisor, and inform the hypervisor to de-assign a MMIO device from a User VM. These new APIs will be used by user space code vm_assign_mmiodev and vm_deassign_mmiodev in https://github.com/projectacrn/acrn-hypervisor/blob/master/devicemodel/core/vmmapi.c Signed-off-by: Shuo Liu <shuo.a.liu@intel.com> Signed-off-by: Fei Li <fei1.li@intel.com> Link: https://lore.kernel.org/r/20210923084128.18902-2-fei1.li@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05	ACPICA: Update version to 20210930	Bob Moore
	ACPICA commit e01cc6b3d12b5f73f44d46fa15a7f569c793b328 Version 20210930. Link: https://github.com/acpica/acpica/commit/e01cc6b3 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	ACPICA: iASL table disassembler: Added disassembly support for the NHLT ACPI ↵	Bob Moore
	table ACPICA commit 94abe858583de24a425b37cb8e62d56c65c4f3cf Note: support for Vendor-defined microphone arrays and SNR extensions are not supported at this time -- mostly due to a lack of example tables. Actual compiler support for NHLT is forthcoming. Link: https://github.com/acpica/acpica/commit/94abe858 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	ACPICA: ACPI 6.4 SRAT: add Generic Port Affinity type	Alison Schofield
	ACPICA commit 777e11b73e60f0eb606cf20142ef634702b09ba1 Add a new subtable type for SRAT Generic Port Affinity. It uses the same subtable structure as the existing Generic Initiator Affinity type. Link: https://github.com/acpica/acpica/commit/777e11b7 Signed-off-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	ACPICA: Add support for Windows 2020 _OSI string	Mario Limonciello
	ACPICA commit 2dc55de56d2deac30af0b484dd1d65607eb33a9c Link: https://github.com/microsoft_docs/windows-driver-docs/commit/5164e24985e78ef4870d7a5801a5336104f36366 Link: https://github.com/acpica/acpica/commit/2dc55de5 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05	sched: Move mmdrop to RCU on RT	Thomas Gleixner
	mmdrop() is invoked from finish_task_switch() by the incoming task to drop the mm which was handed over by the previous task. mmdrop() can be quite expensive which prevents an incoming real-time task from getting useful work done. Provide mmdrop_sched() which maps to mmdrop() on !RT kernels. On RT kernels it delagates the eventually required invocation of __mmdrop() to RCU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210928122411.648582026@linutronix.de
2021-10-05	sched: Introduce task block time in schedstats	Yafang Shao
	Currently in schedstats we have sum_sleep_runtime and iowait_sum, but there's no metric to show how long the task is in D state. Once a task in D state, it means the task is blocked in the kernel, for example the task may be waiting for a mutex. The D state is more frequent than iowait, and it is more critital than S state. So it is worth to add a metric to measure it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210905143547.4668-5-laoar.shao@gmail.com
2021-10-05	sched: Make struct sched_statistics independent of fair sched class	Yafang Shao
	If we want to use the schedstats facility to trace other sched classes, we should make it independent of fair sched class. The struct sched_statistics is the schedular statistics of a task_struct or a task_group. So we can move it into struct task_struct and struct task_group to achieve the goal. After the patch, schestats are orgnized as follows, struct task_struct { ... struct sched_entity se; struct sched_rt_entity rt; struct sched_dl_entity dl; ... struct sched_statistics stats; ... }; Regarding the task group, schedstats is only supported for fair group sched, and a new struct sched_entity_stats is introduced, suggested by Peter - struct sched_entity_stats { struct sched_entity se; struct sched_statistics stats; } __no_randomize_layout; Then with the se in a task_group, we can easily get the stats. The sched_statistics members may be frequently modified when schedstats is enabled, in order to avoid impacting on random data which may in the same cacheline with them, the struct sched_statistics is defined as cacheline aligned. As this patch changes the core struct of scheduler, so I verified the performance it may impact on the scheduler with 'perf bench sched pipe', suggested by Mel. Below is the result, in which all the values are in usecs/op. Before After kernel.sched_schedstats=0 5.2~5.4 5.2~5.4 kernel.sched_schedstats=1 5.3~5.5 5.3~5.5 [These data is a little difference with the earlier version, that is because my old test machine is destroyed so I have to use a new different test machine.] Almost no impact on the sched performance. No functional change. [lkp@intel.com: reported build failure in earlier version] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20210905143547.4668-3-laoar.shao@gmail.com
2021-10-05	fs/proc/uptime.c: Fix idle time reporting in /proc/uptime	Josh Don
	/proc/uptime reports idle time by reading the CPUTIME_IDLE field from the per-cpu kcpustats. However, on NO_HZ systems, idle time is not continually updated on idle cpus, leading this value to appear incorrectly small. /proc/stat performs an accounting update when reading idle time; we can use the same approach for uptime. With this patch, /proc/stat and /proc/uptime now agree on idle time. Additionally, the following shows idle time tick up consistently on an idle machine: (while true; do cat /proc/uptime; sleep 1; done) \| awk '{print $2-prev; prev=$2}' Reported-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lkml.kernel.org/r/20210827165438.3280779-1-joshdon@google.com
2021-10-05	ARM: omap1: move omap15xx local bus handling to usb.c	Arnd Bergmann
	Commit 38225f2ef2f4 ("ARM/omap1: switch to use dma_direct_set_offset for lbus DMA offsets") removed a lot of mach/memory.h, but left the USB offset handling split into arch/arm/mach-omap1/usb.c and drivers/usb/host/ohci-omap.c. This can cause a randconfig build warning that now fails the build with -Werror: arch/arm/mach-omap1/usb.c:561:30: error: 'omap_1510_usb_ohci_nb' defined but not used [-Werror=unused-variable] 561 \| static struct notifier_block omap_1510_usb_ohci_nb = { \| ^~~~~~~~~~~~~~~~~~~~~ Move it all into the platform file to get rid of the final location that relies on mach/memory.h. Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20210927144118.2464881-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-05	serial: core: Fix initializing and restoring termios speed	Pali Rohár
	Since commit edc6afc54968 ("tty: switch to ktermios and new framework") termios speed is no longer stored only in c_cflag member but also in new additional c_ispeed and c_ospeed members. If BOTHER flag is set in c_cflag then termios speed is stored only in these new members. Therefore to correctly restore termios speed it is required to store also ispeed and ospeed members, not only cflag member. In case only cflag member with BOTHER flag is restored then functions tty_termios_baud_rate() and tty_termios_input_baud_rate() returns baudrate stored in c_ospeed / c_ispeed member, which is zero as it was not restored too. If reported baudrate is invalid (e.g. zero) then serial core functions report fallback baudrate value 9600. So it means that in this case original baudrate is lost and kernel changes it to value 9600. Simple reproducer of this issue is to boot kernel with following command line argument: "console=ttyXXX,86400" (where ttyXXX is the device name). For speed 86400 there is no Bnnn constant and therefore kernel has to represent this speed via BOTHER c_cflag. Which means that speed is stored only in c_ospeed and c_ispeed members, not in c_cflag anymore. If bootloader correctly configures serial device to speed 86400 then kernel prints boot log to early console at speed speed 86400 without any issue. But after kernel starts initializing real console device ttyXXX then speed is changed to fallback value 9600 because information about speed was lost. This patch fixes above issue by storing and restoring also ispeed and ospeed members, which are required for BOTHER flag. Fixes: edc6afc54968 ("[PATCH] tty: switch to ktermios and new framework") Cc: stable@vger.kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20211002130900.9518-1-pali@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05	mlx4: constify args for const dev_addr	Jakub Kicinski
	netdev->dev_addr will become const soon. Make sure all functions which pass it around mark appropriate args as const. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05	mlx4: replace mlx4_u64_to_mac() with u64_to_ether_addr()	Jakub Kicinski
	mlx4_u64_to_mac() predates the common helper but doesn't make the argument constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05	mlx4: replace mlx4_mac_to_u64() with ether_addr_to_u64()	Jakub Kicinski
	mlx4_mac_to_u64() predates and opencodes ether_addr_to_u64(). It doesn't make the argument constant so it'll be problematic when dev->dev_addr becomes a const. Convert to the generic helper. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05	ASoC: SOF: dai-intel: add SOF_DAI_INTEL_SSP_CLKCTRL_MCLK/BCLK_ES bits	Bard Liao
	Add two clks_control bits. MCLK and/or BCLK will start during hw_params and stop during hw_free if the corresponding bit is set. While the kernel does not do anything with these bitfields, this is also tagged as part of the ABI 3.19 changes. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <bard.liao@intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Brent Lu <brent.lu@intel.com> Link: https://lore.kernel.org/r/20211004171430.103674-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-05	ASoC: SOF: dai: include new flags for DAI_CONFIG	Pierre-Louis Bossart
	Mirror changes done in SOF tree. The changes do not rely on BIT/GENMASK on purpose to keep the structure and flags common with the firmware tree. The DAI_CONFIG IPC is currently used in multiple ways. It is sent to the DSP firmware when enabling static or dynamic pipelines, in hw_params or prepare callbacks for Intel SSP, HDaudio and ALH, on trigger_stop and hw_free. This IPC has been abused a bit in the past, i.e. the values used for some of the DAI-specific fields are used to either allocate or free resources. Two typical examples are Intel HDaudio and SoundWire/ALH DAIs, where using a zero DMA channel number or stream tag signals to the firmware the DMA channels or tags allocated earlier can be freed. Rather than add a new IPC for 'hw_params' and 'hw_free', this patch suggests supporting a 2-bit value conveying the 'stage' information in an existing IPC structure. Only 3 possible values are used. The mapping between HW_PARAMS and HW_FREE flags and ALSA definitions is not strictly 1:1, e.g. in some cases the HW_PARAM flag might be set during the .prepare callback, while the HW_FREE might be sent during the ALSA .trigger for stop/suspend. The semantics of the flags is to reserve and start/stop all needed resources, typically hardware related such as DMAs or clocks, when the HW_PARAMS is set, while the HW_FREE flag allows the firmware to release the resources allocated. The data transfers are still controlled within the firmware through the propagation of the trigger command. The driver can then pass information that the DAI_CONFIG was invoked in e.g. a pipeline/DAI setup, hw_params or hw_free stage without having to use a special DAI-specific encoding. Unfortunately we can't remove old encodings due to backwards-compatibility requirements but for new cases, such as the SSP in follow-up patches, we can make the IPC less cryptic. This change is tagged as ABI 3.19 and is completely backwards compatible. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <bard.liao@intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Brent Lu <brent.lu@intel.com> Link: https://lore.kernel.org/r/20211004171430.103674-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-05	ASoC: SOF: dai: mirror group_id definition added in firmware	Pierre-Louis Bossart
	This was added in ABI 3.17 but never added to the kernel tree. The group_id is not currently used but this patch is required before additional changes. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <bard.liao@intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Brent Lu <brent.lu@intel.com> Link: https://lore.kernel.org/r/20211004171430.103674-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-05	netlink: remove netlink_broadcast_filtered	Florian Westphal
	No users in tree since commit a3498436b3a0 ("netns: restrict uevents"), so remove this functionality. Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05	ipmi: Add support for IPMB direct messages	Corey Minyard
	An application has come up that has a device sitting right on the IPMB that would like to communicate with the BMC on the IPMB using normal IPMI commands. Sending these commands and handling the responses is easy enough, no modifications are needed to the IPMI infrastructure. But if this is an application that also needs to receive IPMB commands and respond, some way is needed to handle these incoming commands and send the responses. Currently, the IPMI message handler only sends commands to the interface and only receives responses from interface. This change extends the interface to receive commands/responses and send commands/responses. These are formatted differently in support of receiving/sending IPMB messages directly. Signed-off-by: Corey Minyard <minyard@acm.org> Tested-by: Andrew Manley <andrew.manley@sealingtech.com> Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com>
2021-10-05	ipmi: Export ipmb_checksum()	Corey Minyard
	It will be needed by the upcoming ipmb direct addressing. Signed-off-by: Corey Minyard <minyard@acm.org> Tested-by: Andrew Manley <andrew.manley@sealingtech.com> Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com>
2021-10-05	ipmi: Fix a typo	Corey Minyard
	Spell "RESPONSE" correctly in a comment. Signed-off-by: Corey Minyard <cminyard@mvista.com>
2021-10-05	etherdevice: use __dev_addr_set()	Jakub Kicinski
	Andrew points out that eth_hw_addr_set() replaces memcpy() calls so we can't use ether_addr_copy() which assumes both arguments are 2-bytes aligned. Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05	usb: phy: tegra: Support OTG mode programming	Dmitry Osipenko
	Support programming USB PHY into OTG mode. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20210912181718.1328-5-digetx@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05	cachefiles: Fix oops with cachefiles_cull() due to NULL object	Dave Wysochanski
	When cachefiles_cull() calls cachefiles_bury_object(), it passes a NULL object. When this occurs, either trace_cachefiles_unlink() or trace_cachefiles_rename() may oops due to the NULL object. Check for NULL object in the tracepoint and if so, set debug_id to MAX_UINT as was done in 2908f5e101e3. The following oops was seen with xfstests generic/100. BUG: kernel NULL pointer dereference, address: 0000000000000010 ... RIP: 0010:trace_event_raw_event_cachefiles_unlink+0x4e/0xa0 [cachefiles] ... Call Trace: cachefiles_bury_object+0x242/0x430 [cachefiles] ? __vfs_removexattr_locked+0x10f/0x150 ? vfs_removexattr+0x51/0xd0 cachefiles_cull+0x84/0x120 [cachefiles] cachefiles_daemon_cull+0xd1/0x120 [cachefiles] cachefiles_daemon_write+0x158/0x190 [cachefiles] vfs_write+0xbc/0x260 ksys_write+0x4f/0xc0 do_syscall_64+0x3b/0x90 The following oops was seen with xfstests generic/290. BUG: kernel NULL pointer dereference, address: 0000000000000010 ... RIP: 0010:trace_event_raw_event_cachefiles_rename+0x54/0xa0 [cachefiles] ... Call Trace: cachefiles_bury_object+0x35c/0x430 [cachefiles] cachefiles_cull+0x84/0x120 [cachefiles] cachefiles_daemon_cull+0xd1/0x120 [cachefiles] cachefiles_daemon_write+0x158/0x190 [cachefiles] vfs_write+0xbc/0x260 ksys_write+0x4f/0xc0 do_syscall_64+0x3b/0x90 Fixes: 2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://listman.redhat.com/archives/linux-cachefs/2021-October/msg00009.html