linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-05-18	Merge branch 'ib/5.17-cros-ec-keyb' into next	Dmitry Torokhov
	Merge changes to ChromeOS EC Keyboard driver.
2022-05-18	Input: vmmouse - disable vmmouse before entering suspend mode	Zongmin Zhou
	Currently, when trying to suspend and resume with VirtualPS/2 VMMouse there is an error message after resuming: psmouse serio1: vmmouse: Unable to re-enable mouse when reconnecting, err: -6 and the mouse will no longer be operable, requiring full rescan to find a another driver to use for the port. This error is due to QEMU still generating PS2 events which the kernel is not consuming until resume time, where they interfere with mouse identification and ultimately resulting in an error getting VMMOUSE_VERSION_ID. Test scenario: 1) start virtual machine with qemu command "vmport=on" 2) click suspend botton to enter suspend mode 3) resume and observe the error message in the kernel logs Let's fix this by disabling the vmmouse in its reset handler. This will notify qemu to stop vmmouse and remove the handler. Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn> Reviewed-by: Zack Rusin <zackr@vmware.com> Link: https://lore.kernel.org/r/20220322021046.1087954-1-zhouzongmin@kylinos.cn Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18	Input: ili210x - use one common reset implementation	Marek Vasut
	Rename ili251x_hardware_reset() to ili210x_hardware_reset(), change its parameter from struct device * to struct gpio_desc *, and use it as one single consistent reset implementation all over the driver. Also increase the minimum reset duration to 12ms, to make sure the reset is really within the spec. Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20220518210423.106555-1-marex@denx.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18	Input: ili210x - fix reset timing	Marek Vasut
	According to Ilitek "231x & ILI251x Programming Guide" Version: 2.30 "2.1. Power Sequence", "T4 Chip Reset and discharge time" is minimum 10ms and "T2 Chip initial time" is maximum 150ms. Adjust the reset timings such that T4 is 12ms and T2 is 160ms to fit those figures. This prevents sporadic touch controller start up failures when some systems with at least ILI251x controller boot, without this patch the systems sometimes fail to communicate with the touch controller. Fixes: 201f3c803544c ("Input: ili210x - add reset GPIO support") Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20220518204901.93534-1-marex@denx.de Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18	clk: ingenic: Mark critical clocks in Ingenic SoCs	Aidan MacDonald
	Consider CPU, L2 cache, and memory clocks as critical to prevent them -- and the parent clocks -- from being automatically gated, since nothing calls clk_get() on these clocks. Gating the CPU clock hangs the processor, and gating memory makes external DRAM inaccessible. Normal kernel code can't hope to deal with either situation so those clocks have to be critical. The L2 cache is required only if caches are running, and could be gated if the kernel takes care to flush and disable caches before gating the clock. There's no mechanism to do this, and probably no reason to do it, so it's simpler to mark the L2 cache as critical. Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Link: https://lore.kernel.org/r/20220428164454.17908-3-aidanmacdonald.0x0@gmail.com Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> # On X1000 and X1830 Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18	clk: ingenic: Allow specifying common clock flags	Aidan MacDonald
	Provide a flags field for clocks under the ingenic-cgu driver, which can be used to set generic common clock framework flags on the created clocks. For example, the CLK_IS_CRITICAL flag is needed for some clocks (such as CPU or memory) to stop them being automatically disabled. Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Link: https://lore.kernel.org/r/20220428164454.17908-2-aidanmacdonald.0x0@gmail.com Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> # On X1000 and X1830 Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18	clk: ux500: fix a possible off-by-one in u8500_prcc_reset_base()	Hangyu Hua
	Off-by-one will happen when index == ARRAY_SIZE(ur->base). Fixes: b14cbdfd467d ("clk: ux500: Add driver for the reset portions of PRCC") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220518062537.17933-1-hbh25y@gmail.com Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18	drm/amd: Don't reset dGPUs if the system is going to s2idle	Mario Limonciello
	An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-18	drm/amd: Don't reset dGPUs if the system is going to s2idle	Mario Limonciello
	An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-18	drm/amdgpu: Unmap legacy queue when MES is enabled	Luben Tuikov
	This fixes a kernel oops when MES is not enabled. Reported-by: Kenny Ho <Kenny.Ho@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Fixes: 18ee4ce63e0f32 ("drm/amdgpu: add mes unmap legacy queue routine") Fixes: 3d879e81f0f9ed ("drm/amdgpu: add init support for GFX11 (v2)") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-18	Merge tag 'devfreq-next-for-5.19' of ↵	Rafael J. Wysocki
	git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux Pull devfreq changes for 5.19-rc1 from Chanwoo Choi: "1. Update devfreq core - Add cpu based scaling support to passive governor. Some device like cache might require the dynamic frequency scaling. But, it has very tightly to cpu frequency. So that use passive governor to scale the frequency according to current cpu frequency. To decide the frequency of the device, the governor does one of the following: : Derives the optimal devfreq device opp from required-opps property of the parent cpu opp_table. : Scales the device frequency in proportion to the CPU frequency. So, if the CPUs are running at their max frequency, the device runs at its max frequency. If the CPUs are running at their min frequency, the device runs at its min frequency. It is interpolated for frequencies in between. 2. Update devfreq drivers - Update rk3399_dmc.c as following: : Convert dt-binding document to YAML and deprecate unused properties. : Use Hz units for the device-tree properties of rk3399_dmc. : rk3399_dmc is able to set the idle time before changing the dmc clock. Specify idle time parameters by using nano-second unit on dt bidning. : Add new disable-freq properties to optimize the power-saving feature of rk3399_dmc. : Disable devfreq-event device on remove() to fix unbalanced enable-disable count. : Use devm_pm_opp_of_add_table() : Block PMU (Power-Management Unit) transitions when scaling frequency by ARM Trust Firmware in order to fix the conflict between PMU and DMC (Dynamic Memory Controller)." * tag 'devfreq-next-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: PM / devfreq: passive: Keep cpufreq_policy for possible cpus PM / devfreq: passive: Reduce duplicate code when passive_devfreq case PM / devfreq: Add cpu based scaling support to passive governor PM / devfreq: Export devfreq_get_freq_range symbol within devfreq PM / devfreq: rk3399_dmc: Block PMU during transitions soc: rockchip: power-domain: Manage resource conflicts with firmware PM / devfreq: rk3399_dmc: Avoid static (reused) profile PM / devfreq: rk3399_dmc: Use devm_pm_opp_of_add_table() PM / devfreq: rk3399_dmc: Disable edev on remove() PM / devfreq: rk3399_dmc: Support new *-ns properties PM / devfreq: rk3399_dmc: Support new disable-freq properties PM / devfreq: rk3399_dmc: Use bitfield macro definitions for ODT_PD PM / devfreq: rk3399_dmc: Drop excess timing properties PM / devfreq: rk3399_dmc: Drop undocumented ondemand DT props dt-bindings: devfreq: rk3399_dmc: Add more disable-freq properties dt-bindings: devfreq: rk3399_dmc: Specify idle params in nanoseconds dt-bindings: devfreq: rk3399_dmc: Fix Hz units dt-bindings: devfreq: rk3399_dmc: Deprecate unused/redundant properties dt-bindings: devfreq: rk3399_dmc: Convert to YAML
2022-05-18	thermal: intel: hfi: remove NULL check after container_of() call	Haowen Bai
	container_of() will never return NULL, so remove useless code. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18	powercap: intel_rapl: add support for ALDERLAKE_N	Zhang Rui
	Add ALDERLAKE_N to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18	drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()	Hangyu Hua
	drm_gem_object_lookup will call drm_gem_object_get inside. So cursor_bo needs to be put when msm_gem_get_and_pin_iova fails. Fixes: e172d10a9c4a ("drm/msm/mdp5: Add hardware cursor support") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220509061125.18585-1-hbh25y@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18	drm/msm: Fix fb plane offset calculation	Rob Clark
	The offset got dropped by accident. Fixes: d413e6f97134 ("drm/msm: Drop msm_gem_iova()") Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Tested-by: Stephen Boyd <swboyd@chromium.org> # CoachZ Link: https://lore.kernel.org/r/20220510165216.3577068-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18	drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init	Miaoqian Lin
	of_parse_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. a6xx_gmu_init() passes the node to of_find_device_by_node() and of_dma_configure(), of_find_device_by_node() will takes its reference, of_dma_configure() doesn't need the node after usage. Add missing of_node_put() to avoid refcount leak. Fixes: 4b565ca5a2cb ("drm/msm: Add A6XX device support") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Link: https://lore.kernel.org/r/20220512121955.56937-1-linmq006@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18	drm/msm/dsi: don't powerup at modeset time for parade-ps8640	Douglas Anderson
	Commit 7d8e9a90509f ("drm/msm/dsi: move DSI host powerup to modeset time") caused sc7180 Chromebooks that use the parade-ps8640 bridge chip to fail to turn the display back on after it turns off. Unfortunately, it doesn't look easy to fix the parade-ps8640 driver to handle the new power sequence. The Linux driver has almost nothing in it and most of the logic for this bridge chip is in black-box firmware that the bridge chip uses. Also unfortunately, reverting the patch will break "tc358762". The long term solution here is probably Dave Stevenson's series [1] that would give more flexibility. However, that is likely not a quick fix. For the short term, we'll look at the compatible of the next bridge in the chain and go back to the old way for the Parade PS8640 bridge chip. If it's found that other bridge chips also need this workaround then we can add them to the list or consider inverting the condition. However, the hope is that the framework will not take too much longer to land and we won't have to add anything other than ps8640 here. [1] https://lore.kernel.org/r/cover.1646406653.git.dave.stevenson@raspberrypi.com Fixes: 7d8e9a90509f ("drm/msm/dsi: move DSI host powerup to modeset time") Suggested-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Link: https://lore.kernel.org/r/20220513131504.v5.1.Ia196e35ad985059e77b038a41662faae9e26f411@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18	nvme: add support for TP4084 - Time-to-Ready Enhancements	Christoph Hellwig
	Add support for using longer timeouts during controller initialization and letting the controller come up with namespaces that are not ready for I/O yet. We skip these not ready namespaces during scanning and only bring them online once anoter scan is kicked off by the AEN that is set when the NRDY bit gets set in the I/O Command Set Independent Identify Namespace Data Structure. This asynchronous probing avoids blocking the kernel boot when controllers take a very long time to recover after unclean shutdowns (up to minutes). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-05-18	Fix double fget() in vhost_net_set_backend()	Al Viro
	Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Cc: stable@kernel.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-05-18	vdpa/mlx5: Use consistent RQT size	Eli Cohen
	The current code evaluates RQT size based on the configured number of virtqueues. This can raise an issue in the following scenario: Assume MQ was negotiated. 1. mlx5_vdpa_set_map() gets called. 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower than the configured max VQs. 3. A second set_map gets called, but now a smaller number of VQs is used to evaluate the size of the RQT. 4. handle_ctrl_mq() is called with a value larger than what the RQT can hold. This will emit errors and the driver state is compromised. To fix this, we use a new field in struct mlx5_vdpa_net to hold the required number of entries in the RQT. This value is evaluated in mlx5_vdpa_set_driver_features() where we have the negotiated features all set up. In addition to that, we take into consideration the max capability of RQT entries early when the device is added so we don't need to take consider it when creating the RQT. Last, we remove the use of mlx5_vdpa_max_qps() which just returns the max_vas / 2 and make the code clearer. Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-18	PCI: microchip: Fix potential race in interrupt handling	Daire McNamara
	Clear the MSI bit in ISTATUS_LOCAL register after reading it, but before reading and handling individual MSI bits from the ISTATUS_MSI register. This avoids a potential race where new MSI bits may be set on the ISTATUS_MSI register after it was read and be missed when the MSI bit in the ISTATUS_LOCAL register is cleared. ISTATUS_LOCAL is a read/write/clear register; the register's bits are set when the corresponding interrupt source is activated. Each source is independent and thus multiple sources may be active simultaneously. The processor can monitor and clear status bits. If one or more ISTATUS_LOCAL interrupt sources are active, the RootPort issues an interrupt towards the processor (on the AXI domain). Bit 28 of this register reports an MSI has been received by the RootPort. ISTATUS_MSI is a read/write/clear register. Bits 31-0 are asserted when an MSI with message number 31-0 is received by the RootPort. The processor must monitor and clear these bits. Effectively, Bit 28 of ISTATUS_LOCAL informs the processor that an MSI has arrived at the RootPort and ISTATUS_MSI informs the processor which MSI (in the range 0 - 31) needs handling. Reported by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ Link: https://lore.kernel.org/r/20220517141622.145581-1-daire.mcnamara@microchip.com Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-18	vfio/pci: Move the unused device into low power state with runtime PM	Abhishek Sahu
	Currently, there is very limited power management support available in the upstream vfio_pci_core based drivers. If there are no users of the device, then the PCI device will be moved into D3hot state by writing directly into PCI PM registers. This D3hot state help in saving power but we can achieve zero power consumption if we go into the D3cold state. The D3cold state cannot be possible with native PCI PM. It requires interaction with platform firmware which is system-specific. To go into low power states (including D3cold), the runtime PM framework can be used which internally interacts with PCI and platform firmware and puts the device into the lowest possible D-States. This patch registers vfio_pci_core based drivers with the runtime PM framework. 1. The PCI core framework takes care of most of the runtime PM related things. For enabling the runtime PM, the PCI driver needs to decrement the usage count and needs to provide 'struct dev_pm_ops' at least. The runtime suspend/resume callbacks are optional and needed only if we need to do any extra handling. Now there are multiple vfio_pci_core based drivers. Instead of assigning the 'struct dev_pm_ops' in individual parent driver, the vfio_pci_core itself assigns the 'struct dev_pm_ops'. There are other drivers where the 'struct dev_pm_ops' is being assigned inside core layer (For example, wlcore_probe() and some sound based driver, etc.). 2. This patch provides the stub implementation of 'struct dev_pm_ops'. The subsequent patch will provide the runtime suspend/resume callbacks. All the config state saving, and PCI power management related things will be done by PCI core framework itself inside its runtime suspend/resume callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()). 3. Inside pci_reset_bus(), all the devices in dev_set needs to be runtime resumed. vfio_pci_dev_set_pm_runtime_get() will take care of the runtime resume and its error handling. 4. Inside vfio_pci_core_disable(), the device usage count always needs to be decremented which was incremented in vfio_pci_core_enable(). 5. Since the runtime PM framework will provide the same functionality, so directly writing into PCI PM config register can be replaced with the use of runtime PM routines. Also, the use of runtime PM can help us in more power saving. In the systems which do not support D3cold, With the existing implementation: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3hot So, with runtime PM, the upstream bridge or root port will also go into lower power state which is not possible with existing implementation. In the systems which support D3cold, // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3cold // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3cold So, with runtime PM, both the PCI device and upstream bridge will go into D3cold state. 6. If 'disable_idle_d3' module parameter is set, then also the runtime PM will be enabled, but in this case, the usage count should not be decremented. 7. vfio_pci_dev_set_try_reset() return value is unused now, so this function return type can be changed to void. 8. Use the runtime PM API's in vfio_pci_core_sriov_configure(). The device can be in low power state either with runtime power management (when there is no user) or PCI_PM_CTRL register write by the user. In both the cases, the PF should be moved to D0 state. For preventing any runtime usage mismatch, pci_num_vf() has been called explicitly during disable. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18	vfio/pci: Virtualize PME related registers bits and initialize to zero	Abhishek Sahu
	If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18	vfio/pci: Change the PF power state to D0 before enabling VFs	Abhishek Sahu
	According to [PCIe v5 9.6.2] for PF Device Power Management States "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state." From the vfio driver side, user can enable SR-IOV when the PF is in D3hot state. If VF does not implement the Power Management Capability, then the VF will be actually in D3hot state and then the VF BAR access will fail. If VF implements the Power Management Capability, then VF will assume that its current power state is D0 when the PF is D3hot and in this case, the behavior is undefined. To support PF power management, we need to create power management dependency between PF and its VF's. The runtime power management support may help with this where power management dependencies are supported through device links. But till we have such support in place, we can disallow the PF to go into low power state, if PF has VF enabled. There can be a case, where user first enables the VF's and then disables the VF's. If there is no user of PF, then the PF can put into D3hot state again. But with this patch, the PF will still be in D0 state after disabling VF's since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. But the subsequent patches related to runtime PM will handle this case since runtime PM maintains its own usage count. Also, vfio_pci_core_sriov_configure() can be called at any time (with and without vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18	vfio/pci: Invalidate mmaps and block the access in D3hot power state	Abhishek Sahu
	According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error will be returned for MMIO access. The IO access generally does not make the system unresponsive so the IO access can still happen in D3hot state. The default value should be returned in this case without bringing down the complete system. Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. This protection is mainly needed when user changes the PCI power state by writing into PCI_PM_CTRL register. vfio_lock_and_set_power_state() wrapper function will take the required locks and then it will invoke the vfio_pci_set_power_state(). Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18	random: credit architectural init the exact amount	Jason A. Donenfeld
	RDRAND and RDSEED can fail sometimes, which is fine. We currently initialize the RNG with 512 bits of RDRAND/RDSEED. We only need 256 bits of those to succeed in order to initialize the RNG. Instead of the current "all or nothing" approach, actually credit these contributions the amount that is actually contributed. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: handle latent entropy and command line from random_init()	Jason A. Donenfeld
	Currently, start_kernel() adds latent entropy and the command line to the entropy bool after the RNG has been initialized, deferring when it's actually used by things like stack canaries until the next time the pool is seeded. This surely is not intended. Rather than splitting up which entropy gets added where and when between start_kernel() and random_init(), just do everything in random_init(), which should eliminate these kinds of bugs in the future. While we're at it, rename the awkwardly titled "rand_initialize()" to the more standard "random_init()" nomenclature. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: use proper jiffies comparison macro	Jason A. Donenfeld
	This expands to exactly the same code that it replaces, but makes things consistent by using the same macro for jiffy comparisons throughout. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: remove ratelimiting for in-kernel unseeded randomness	Jason A. Donenfeld
	The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the kernel warns about all unseeded randomness or just the first instance. There's some complicated rate limiting and comparison to the previous caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled, developers still don't see all the messages or even an accurate count of how many were missed. This is the result of basically parallel mechanisms aimed at accomplishing more or less the same thing, added at different points in random.c history, which sort of compete with the first-instance-only limiting we have now. It turns out, however, that nobody cares about the first unseeded randomness instance of in-kernel users. The same first user has been there for ages now, and nobody is doing anything about it. It isn't even clear that anybody _can_ do anything about it. Most places that can do something about it have switched over to using get_random_bytes_wait() or wait_for_random_bytes(), which is the right thing to do, but there is still much code that needs randomness sometimes during init, and as a geeneral rule, if you're not using one of the _wait functions or the readiness notifier callback, you're bound to be doing it wrong just based on that fact alone. So warning about this same first user that can't easily change is simply not an effective mechanism for anything at all. Users can't do anything about it, as the Kconfig text points out -- the problem isn't in userspace code -- and kernel developers don't or more often can't react to it. Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM is set, so that developers can debug things need be, or if it isn't set, don't show a warning at all. At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting random.ratelimit_disable=1 on by default, since if you care about one you probably care about the other too. And we can clean up usage around the related urandom_warning ratelimiter as well (whose behavior isn't changing), so that it properly counts missed messages after the 10 message threshold is reached. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: move initialization out of reseeding hot path	Jason A. Donenfeld
	Initialization happens once -- by way of credit_init_bits() -- and then it never happens again. Therefore, it doesn't need to be in crng_reseed(), which is a hot path that is called multiple times. It also doesn't make sense to have there, as initialization activity is better associated with initialization routines. After the prior commit, crng_reseed() now won't be called by multiple concurrent callers, which means that we can safely move the "finialize_init" logic into crng_init_bits() unconditionally. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: avoid initializing twice in credit race	Jason A. Donenfeld
	Since all changes of crng_init now go through credit_init_bits(), we can fix a long standing race in which two concurrent callers of credit_init_bits() have the new bit count >= some threshold, but are doing so with crng_init as a lower threshold, checked outside of a lock, resulting in crng_reseed() or similar being called twice. In order to fix this, we can use the original cmpxchg value of the bit count, and only change crng_init when the bit count transitions from below a threshold to meeting the threshold. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: use symbolic constants for crng_init states	Jason A. Donenfeld
	crng_init represents a state machine, with three states, and various rules for transitions. For the longest time, we've been managing these with "0", "1", and "2", and expecting people to figure it out. To make the code more obvious, replace these with proper enum values representing the transition, and then redocument what each of these states mean. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	siphash: use one source of truth for siphash permutations	Jason A. Donenfeld
	The SipHash family of permutations is currently used in three places: - siphash.c itself, used in the ordinary way it was intended. - random32.c, in a construction from an anonymous contributor. - random.c, as part of its fast_mix function. Each one of these places reinvents the wheel with the same C code, same rotation constants, and same symmetry-breaking constants. This commit tidies things up a bit by placing macros for the permutations and constants into siphash.h, where each of the three .c users can access them. It also leaves a note dissuading more users of them from emerging. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: help compiler out with fast_mix() by using simpler arguments	Jason A. Donenfeld
	Now that fast_mix() has more than one caller, gcc no longer inlines it. That's fine. But it also doesn't handle the compound literal argument we pass it very efficiently, nor does it handle the loop as well as it could. So just expand the code to spell out this function so that it generates the same code as it did before. Performance-wise, this now behaves as it did before the last commit. The difference in actual code size on x86 is 45 bytes, which is less than a cache line. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	random: do not use input pool from hard IRQs	Jason A. Donenfeld
	Years ago, a separate fast pool was added for interrupts, so that the cost associated with taking the input pool spinlocks and mixing into it would be avoided in places where latency is critical. However, one oversight was that add_input_randomness() and add_disk_randomness() still sometimes are called directly from the interrupt handler, rather than being deferred to a thread. This means that some unlucky interrupts will be caught doing a blake2s_compress() call and potentially spinning on input_pool.lock, which can also be taken by unprivileged users by writing into /dev/urandom. In order to fix this, add_timer_randomness() now checks whether it is being called from a hard IRQ and if so, just mixes into the per-cpu IRQ fast pool using fast_mix(), which is much faster and can be done lock-free. A nice consequence of this, as well, is that it means hard IRQ context FPU support is likely no longer useful. The entropy estimation algorithm used by add_timer_randomness() is also somewhat different than the one used for add_interrupt_randomness(). The former looks at deltas of deltas of deltas, while the latter just waits for 64 interrupts for one bit or for one second since the last bit. In order to bridge these, and since add_interrupt_randomness() runs after an add_timer_randomness() that's called from hard IRQ, we add to the fast pool credit the related amount, and then subtract one to account for add_interrupt_randomness()'s contribution. A downside of this, however, is that the num argument is potentially attacker controlled, which puts a bit more pressure on the fast_mix() sponge to do more than it's really intended to do. As a mitigating factor, the first 96 bits of input aren't attacker controlled (a cycle counter followed by zeros), which means it's essentially two rounds of siphash rather than one, which is somewhat better. It's also not that much different from add_interrupt_randomness()'s use of the irq stack instruction pointer register. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Filipe Manana <fdmanana@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18	net: smc911x: Fix min() use in debug code	Geert Uytterhoeven
	If ENABLE_SMC_DEBUG_PKTS=1: drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_hardware_send_pkt’: include/linux/minmax.h:20:28: error: comparison of distinct pointer types lacks a cast [-Werror] 20 \| (!!(sizeof((typeof(x) )1 == (typeof(y) )1))) \| ^~ drivers/net/ethernet/smsc/smc911x.c:483:17: note: in expansion of macro ‘min’ 483 \| PRINT_PKT(buf, min(len, 64)); Fix this by making the constant unsigned, to match the type of "len". While at it, replace the other missed ternary operator by min(), too. Convert the dummy PRINT_PKT() from a macro to a static inline function, to catch mistakes like this without having to enable debug options manually. Fixes: 5ff0348b7f755aac ("net: smc911x: replace ternary operator with min()") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	net: ethernet: sunplus: add missing of_node_put() in spl2sw_mdio_init()	Yang Yingliang
	of_get_child_by_name() returns device node pointer with refcount incremented. The refcount should be decremented before returning from spl2sw_mdio_init(). Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	eth: sun: cassini: remove dead code	Martin Liška
	Fixes the following GCC warning: drivers/net/ethernet/sun/cassini.c:1316:29: error: comparison between two arrays [-Werror=array-compare] drivers/net/ethernet/sun/cassini.c:3783:34: error: comparison between two arrays [-Werror=array-compare] Note that 2 arrays should be compared by comparing of their addresses: note: use ‘&cas_prog_workaroundtab[0] == &cas_prog_null[0]’ to compare the addresses Signed-off-by: Martin Liska <mliska@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	net: ftgmac100: Disable hardware checksum on AST2600	Joel Stanley
	The AST2600 when using the i210 NIC over NC-SI has been observed to produce incorrect checksum results with specific MTU values. This was first observed when sending data across a long distance set of networks. On a local network, the following test was performed using a 1MB file of random data. On the receiver run this script: #!/bin/bash while [ 1 ]; do # Zero the stats nstat -r > /dev/null nc -l 9899 > test-file # Check for checksum errors TcpInCsumErrors=$(nstat \| grep TcpInCsumErrors) if [ -z "$TcpInCsumErrors" ]; then echo No TcpInCsumErrors else echo TcpInCsumErrors = $TcpInCsumErrors fi done On an AST2600 system: # nc <IP of receiver host> 9899 < test-file The test was repeated with various MTU values: # ip link set mtu 1410 dev eth0 The observed results: 1500 - good 1434 - bad 1400 - good 1410 - bad 1420 - good The test was repeated after disabling tx checksumming: # ethtool -K eth0 tx-checksumming off And all MTU values tested resulted in transfers without error. An issue with the driver cannot be ruled out, however there has been no bug discovered so far. David has done the work to take the original bug report of slow data transfer between long distance connections and triaged it down to this test case. The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers to the patch that introduced AST2600 support. Reported-by: David Wilder <wilder@us.ibm.com> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	igb: skip phy status check where unavailable	Kevin Mitchell
	igb_read_phy_reg() will silently return, leaving phy_data untouched, if hw->ops.read_reg isn't set. Depending on the uninitialized value of phy_data, this led to the phy status check either succeeding immediately or looping continuously for 2 seconds before emitting a noisy err-level timeout. This message went out to the console even though there was no actual problem. Instead, first check if there is read_reg function pointer. If not, proceed without trying to check the phy status register. Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition") Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	net: stmmac: remove unused get_addr() callback	Vincent Whitchurch
	The last caller of the stmmac_desc_ops::get_addr() callback was removed a while ago, so remove the unused callback. Note that the callback also only gets half the descriptor address on systems with 64-bit descriptor addresses, so that should be fixed if it needs to be resurrected later. Fixes: ec222003bd948de8f3 ("net: stmmac: Prepare to add Split Header support") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	nfc: pn533: Fix buggy cleanup order	Lin Ma
	When removing the pn533 device (i2c or USB), there is a logic error. The original code first cancels the worker (flush_delayed_work) and then destroys the workqueue (destroy_workqueue), leaving the timer the last one to be deleted (del_timer). This result in a possible race condition in a multi-core preempt-able kernel. That is, if the cleanup (pn53x_common_clean) is concurrently run with the timer handler (pn533_listen_mode_timer), the timer can queue the poll_work to the already destroyed workqueue, causing use-after-free. This patch reorder the cleanup: it uses the del_timer_sync to make sure the handler is finished before the routine will destroy the workqueue. Note that the timer cannot be activated by the worker again. static void pn533_wq_poll(struct work_struct *work) ... rc = pn533_send_poll_frame(dev); if (rc) return; if (cur_mod->len == 0 && dev->poll_mod_count > 1) mod_timer(&dev->listen_timer, ...); That is, the mod_timer can be called only when pn533_send_poll_frame() returns no error, which is impossible because the device is detaching and the lower driver should return ENODEV code. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	rtc: gamecube: Add missing iounmap in gamecube_rtc_read_offset_from_sram	Yuan Can
	The hw_srnprot needs to be unmapped when gamecube_rtc_read_offset_from_sram returns. Fixs: 86559400b3ef9d (rtc: gamecube: Add a RTC driver for the GameCube, Wii and Wii U) Signed-off-by: Yuan Can <yuancan@huawei.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220511071354.46202-1-yuancan@huawei.com
2022-05-18	pinctrl: intel: Drop unused irqchip member in struct intel_pinctrl	Andy Shevchenko
	There is no users of irqchip member in struct intel_pinctrl. Drop it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-18	pinctrl: intel: make irq_chip immutable	Andy Shevchenko
	Since recently, the kernel is nagging about mutable irq_chips: "not an immutable chip, please consider fixing it!" Drop the unneeded copy, flag it as IRQCHIP_IMMUTABLE, add the new helper functions and call the appropriate gpiolib functions. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-18	Merge tag 'nvme-5.19-2022-05-18' of git://git.infradead.org/nvme into ↵	Jens Axboe
	for-5.19/drivers Pull NVMe updates from Christoph: "nvme updates for Linux 5.19 - tighten the PCI presence check (Stefan Roese): - fix a potential NULL pointer dereference in an error path (Kyle Miller Smith) - fix interpretation of the DMRSL field (Tom Yan) - relax the data transfer alignment (Keith Busch) - verbose error logging improvements (Max Gurtovoy, Chaitanya Kulkarni) - misc cleanups (Chaitanya Kulkarni, me)" * tag 'nvme-5.19-2022-05-18' of git://git.infradead.org/nvme: nvme: split the enum used for various register constants nvme-fabrics: add a request timeout helper nvme-pci: harden drive presence detect in nvme_dev_disable() nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags nvme: mark internal passthru request RQF_QUIET nvme: remove unneeded include from constants file nvme: add missing status values to verbose logging nvme: set dma alignment to dword nvme: fix interpretation of DMRSL
2022-05-18	clocksource/drivers/timer-gxp: Add HPE GXP Timer	Nick Hawkins
	Add support for the HPE GXP SOC timer. The GXP supports several different kinds of timers but for the purpose of this driver there is only support for the General Timer. The timer has a 1us resolution and is 32 bits. The timer also creates a child watchdog device as the register region is the same. Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-18	watchdog: hpe-wdt: Introduce HPE GXP Watchdog	Nick Hawkins
	Add support for the HPE GXP Watchdog. The GXP asic contains a full complement of timers one of which is the watchdog timer. The watchdog timer is 16 bit and has 10ms resolution. The watchdog is created as a child device of timer since the same register range is used. Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Guenter Roeck <linux@roeck-us.net>
2022-05-18	Merge branch '100GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-05-17 This series contains updates to ice driver only. Arkadiusz prevents writing of timestamps when rings are being configured to resolve null pointer dereference. Paul changes a delayed call to baseline statistics to occur immediately which was causing misreporting of statistics due to the delay. Michal fixes incorrect restoration of interrupt moderation settings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-18	USB: serial: pl2303: fix type detection for odd device	Johan Hovold
	At least one pl2303 device has a bcdUSB of 1.0.1 which most likely was was intended as 1.1. Allow bcdDevice 1.0.1 but interpret it as 1.1. Fixes: 1e9faef4d26d ("USB: serial: pl2303: fix HX type detection") Cc: stable@vger.kernel.org # 5.13 Link: https://lore.kernel.org/linux-usb/CAJixRzqf4a9-ZKZDgWxicc_BpfdZVE9qqGmkiO7xEstOXUbGvQ@mail.gmail.com Reported-by: Gary van der Merwe <gary.vandermerwe@fnb.co.za> Link: https://lore.kernel.org/r/20220517161736.13313-1-johan@kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>