linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-08-26	drm/amdgpu: Clear RAS interrupt status on aldebaran	John Clements
	resolve register address issue for detecting/clearing RAS interrupt Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26	drm/amdgpu: Add support for RAS XGMI err query	John Clements
	Update XGMI RAS to support error query on aldebaran Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26	drm/amdkfd: Account for SH/SE count when setting up cu masks.	Sean Keely
	On systems with multiple SH per SE compute_static_thread_mgmt_se# is split into independent masks, one for each SH, in the upper and lower 16 bits. We need to detect this and apply cu masking to each SH. The cu mask bits are assigned first to each SE, then to alternate SHs, then finally to higher CU id. This ensures that the maximum number of SPIs are engaged as early as possible while balancing CU assignment to each SH. v2: Use max SH/SE rather than max SH in cu_per_sh. v3: Fix comment blocks, ensure se_mask is initially zero filled, and correctly assign se.sh.cu positions to unset bits in cu_mask. Signed-off-by: Sean Keely <Sean.Keely@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26	Merge branches 'v5.15/vfio/spdx-license-cleanups', ↵	Alex Williamson
	'v5.15/vfio/dma-valid-waited-v3', 'v5.15/vfio/vfio-pci-core-v5' and 'v5.15/vfio/vfio-ap' into v5.15/vfio/next
2021-08-26	sched: Fix get_push_task() vs migrate_disable()	Sebastian Andrzej Siewior
	push_rt_task() attempts to move the currently running task away if the next runnable task has migration disabled and therefore is pinned on the current CPU. The current task is retrieved via get_push_task() which only checks for nr_cpus_allowed == 1, but does not check whether the task has migration disabled and therefore cannot be moved either. The consequence is a pointless invocation of the migration thread which correctly observes that the task cannot be moved. Return NULL if the task has migration disabled and cannot be moved to another CPU. Fixes: a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
2021-08-26	media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor()	Andy Shevchenko
	The commit 71f642833284 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()") moved adev assignment outside of error path and hence made acpi_dev_put(sensor->adev) a no-op. We still need to drop reference count on error path, and to achieve that, replace sensor->adev by locally assigned adev. Fixes: 71f642833284 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()") Depends-on: fc68f42aa737 ("ACPI: fix NULL pointer dereference") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-26	ASoC: soc-pcm: test refcount before triggering	Pierre-Louis Bossart
	On start/pause_release/resume, when more than one FE is connected to the same BE, it's possible that the trigger is sent more than once. This is not desirable, we only want to trigger a BE once, which is straightforward to implement with a refcount. For stop/pause/suspend, the problem is more complicated: the check implemented in snd_soc_dpcm_can_be_free_stop() may fail due to a conceptual deadlock when we trigger the BE before the FE. In this case, the FE states have not yet changed, so there are corner cases where the TRIGGER_STOP is never sent - the dual case of start where multiple triggers might be sent. This patch suggests an unconditional trigger in all cases, without checking the FE states, using a refcount protected by a spinlock. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Message-Id: <20210817164054.250028-3-pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-26	ASoC: soc-pcm: protect BE dailink state changes in trigger	Pierre-Louis Bossart
	When more than one FE is connected to a BE, e.g. in a mixing use case, the BE can be triggered multiple times when the FE are opened/started concurrently. This race condition is problematic in the case of SoundWire BE dailinks, and this is not desirable in a general case. The code carefully checks when the BE can be stopped or hw_free'ed, but the trigger code does not use any mutual exclusion. Fix by using the same spinlock already used to check FE states, and set the state before the trigger. In case of errors, the initial state will be restored. This patch does not change how the triggers are handled, it only makes sure the states are handled in critical sections. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Message-Id: <20210817164054.250028-2-pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-26	ASoC: wcd9335: Disable irq on slave ports in the remove function	Christophe JAILLET
	The probe calls 'wcd9335_setup_irqs()' to enable interrupts on all slave ports. This must be undone in the remove function. Add a 'wcd9335_teardown_irqs()' function that undoes 'wcd9335_setup_irqs()' function, and call it from the remove function. Fixes: 20aedafdf492 ("ASoC: wcd9335: add support to wcd9335 codec") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <8f761244d79bd4c098af8a482be9121d3a486d1b.1629091028.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-26	ASoC: wcd9335: Fix a memory leak in the error handling path of the probe ↵	Christophe JAILLET
	function If 'wcd9335_setup_irqs()' fails, me must release the memory allocated in 'wcd_clsh_ctrl_alloc()', as already done in the remove function. Add an error handling path and the missing 'wcd_clsh_ctrl_free()' call. Fixes: 20aedafdf492 ("ASoC: wcd9335: add support to wcd9335 codec") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <6dc12372f09fabb70bf05941dbe6a1382dc93e43.1629091028.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-26	ASoC: wcd9335: Fix a double irq free in the remove function	Christophe JAILLET
	There is no point in calling 'free_irq()' explicitly for 'WCD9335_IRQ_SLIMBUS' in the remove function. The irqs are requested in 'wcd9335_setup_irqs()' using a resource managed function (i.e. 'devm_request_threaded_irq()'). 'wcd9335_setup_irqs()' requests all what is defined in the 'wcd9335_irqs' structure. This structure has only one entry for 'WCD9335_IRQ_SLIMBUS'. So 'devm_request...irq()' + explicit 'free_irq()' would lead to a double free. Remove the unneeded 'free_irq()' from the remove function. Fixes: 20aedafdf492 ("ASoC: wcd9335: add support to wcd9335 codec") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <0614d63bc00edd7e81dd367504128f3d84f72efa.1629091028.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-26	vfio/pci: Introduce vfio_pci_core.ko	Max Gurtovoy
	Now that vfio_pci has been split into two source modules, one focusing on the "struct pci_driver" (vfio_pci.c) and a toolbox library of code (vfio_pci_core.c), complete the split and move them into two different kernel modules. As before vfio_pci.ko continues to present the same interface under sysfs and this change will have no functional impact. Splitting into another module and adding exports allows creating new HW specific VFIO PCI drivers that can implement device specific functionality, such as VFIO migration interfaces or specialized device requirements. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-14-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio: Use kconfig if XX/endif blocks instead of repeating 'depends on'	Jason Gunthorpe
	This results in less kconfig wordage and a simpler understanding of the required "depends on" to create the menu structure. The next patch increases the nesting level a lot so this is a nice preparatory simplification. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-13-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio: Use select for eventfd	Jason Gunthorpe
	If VFIO_VIRQFD is required then turn on eventfd automatically. The majority of kconfig users of the EVENTFD use select not depends on. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-12-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	PCI / VFIO: Add 'override_only' support for VFIO PCI sub system	Max Gurtovoy
	Expose an 'override_only' helper macro (i.e. PCI_DRIVER_OVERRIDE_DEVICE_VFIO) for VFIO PCI sub system and add the required code to prefix its matching entries with "vfio_" in modules.alias file. It allows VFIO device drivers to include match entries in the modules.alias file produced by kbuild that are not used for normal driver autoprobing and module autoloading. Drivers using these match entries can be connected to the PCI device manually, by userspace, using the existing driver_override sysfs. For example the resulting modules.alias may have: alias pci:v000015B3d00001021svsdbcsci* mlx5_core alias vfio_pci:v000015B3d00001021svsdbcsci* mlx5_vfio_pci alias vfio_pci:vdsvsdbcsci* vfio_pci In this example mlx5_core and mlx5_vfio_pci match to the same PCI device. The kernel will autoload and autobind to mlx5_core but the kernel and udev mechanisms will ignore mlx5_vfio_pci. When userspace wants to change a device to the VFIO subsystem it can implement a generic algorithm: 1) Identify the sysfs path to the device: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 2) Get the modalias string from the kernel: $ cat /sys/bus/pci/devices/0000:01:00.0/modalias pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 3) Prefix it with vfio_: vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 4) Search modules.alias for the above string and select the entry that has the fewest 's: alias vfio_pci:v000015B3d00001021svsdbcsci mlx5_vfio_pci 5) modprobe the matched module name: $ modprobe mlx5_vfio_pci 6) cat the matched module name to driver_override: echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override 7) unbind device from original module echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind 8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci) echo 0000:01:00.0 > /sys/bus/pci/drivers_probe The algorithm is independent of bus type. In future the other buses with VFIO device drivers, like platform and ACPI, can use this algorithm as well. This patch is the infrastructure to provide the information in the modules.alias to userspace. Convert the only VFIO pci_driver which results in one new line in the modules.alias: alias vfio_pci:vdsvsdbcsci* vfio_pci Later series introduce additional HW specific VFIO PCI drivers, such as mlx5_vfio_pci. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # for pci.h Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	PCI: Add 'override_only' field to struct pci_device_id	Max Gurtovoy
	Add 'override_only' field to struct pci_device_id to be used as part of pci_match_device(). When set, a driver only matches the entry when dev->driver_override is set to that driver. In addition, add a helper macro named 'PCI_DEVICE_DRIVER_OVERRIDE' to enable setting some data on it. Next patch from this series will use the above functionality. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-10-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Move module parameters to vfio_pci.c	Yishai Hadas
	This is a preparation before splitting vfio_pci.ko to 2 modules. As module parameters are a kind of uAPI they need to stay on vfio_pci.ko to avoid a user visible impact. For now continue to keep the implementation of these options in vfio_pci_core.c. Arguably they are vfio_pci functionality, but further splitting of vfio_pci_core.c will be better done in another series Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Move igd initialization to vfio_pci.c	Max Gurtovoy
	igd is related to the vfio_pci pci_driver implementation, move it out of vfio_pci_core.c. This is preparation for splitting vfio_pci.ko into 2 drivers. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Split the pci_driver code out of vfio_pci_core.c	Max Gurtovoy
	Split the vfio_pci driver into two logical parts, the 'struct pci_driver' (vfio_pci.c) which implements "Generic VFIO support for any PCI device" and a library of code (vfio_pci_core.c) that helps implementing a struct vfio_device on top of a PCI device. vfio_pci.ko continues to present the same interface under sysfs and this change should have no functional impact. Following patches will turn vfio_pci and vfio_pci_core into a separate module. This is a preparation for allowing another module to provide the pci_driver and allow that module to customize how VFIO is setup, inject its own operations, and easily extend vendor specific functionality. At this point the vfio_pci_core still contains a lot of vfio_pci functionality mixed into it. Following patches will move more of the large scale items out, but another cleanup series will be needed to get everything. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Include vfio header in vfio_pci_core.h	Max Gurtovoy
	The vfio_device structure is embedded into the vfio_pci_core_device structure, so there is no reason for not including the header file in the vfio_pci_core header as well. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename ops functions to fit core namings	Max Gurtovoy
	This is another preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci_device to vfio_pci_core_device	Max Gurtovoy
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. The new vfio_pci_core_device structure will be the main structure of the core driver and later on vfio_pci_device structure will be the main structure of the generic vfio_pci driver. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h	Max Gurtovoy
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci.c to vfio_pci_core.c	Max Gurtovoy
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	Merge tag 'timers-v5.15' of ↵	Thomas Gleixner
	https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull timer driver updates from Daniel Lezcano: - Prioritize the ARM architected timer on Exynos platform when the architecture is ARM64 (Will Deacon) - Mark the Exynos timer as a per CPU timer (Will Deacon) - DT conversion to yaml for the rockchip platform (Ezequiel Garcia) - Fix IRQ setup if there are two channels on the sh_cmt timer (Phong Hoang) - Use bitfield helper macros in the Ingenic timer (Zhou Yanjie) - Clear any pending interrupt to prevent an abort of the suspend on the Mediatek platform (Fengquan Chen) - Add DT bindings for new Ingenic SoCs (Zhou Yanjie) Link: https://lore.kernel.org/r/c14ad27a-b1c6-6043-0f5e-71dd984bb4ba@linaro.org
2021-08-26	iomap: standardize tracepoint formatting and storage	Darrick J. Wong
	Print all the offset, pos, and length quantities in hexadecimal. While we're at it, update the types of the tracepoint structure fields to match the types of the values being recorded in them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-08-26	md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard	Xiao Ni
	We are seeing the following warning in raid10_handle_discard. [ 695.110751] ============================= [ 695.131439] WARNING: suspicious RCU usage [ 695.151389] 4.18.0-319.el8.x86_64+debug #1 Not tainted [ 695.174413] ----------------------------- [ 695.192603] drivers/md/raid10.c:1776 suspicious rcu_dereference_check() usage! [ 695.225107] other info that might help us debug this: [ 695.260940] rcu_scheduler_active = 2, debug_locks = 1 [ 695.290157] no locks held by mkfs.xfs/10186. In the first loop of function raid10_handle_discard. It already determines which disk need to handle discard request and add the rdev reference count rdev->nr_pending. So the conf->mirrors will not change until all bios come back from underlayer disks. It doesn't need to use rcu_dereference to get rdev. Cc: stable@vger.kernel.org Fixes: d30588b2731f ('md/raid10: improve raid10 discard request') Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-08-26	ALSA: hda: Disable runtime resume at shutdown	Takashi Iwai
	Although we modified the codec shutdown callback to perform runtime-suspend, it's still not fully effective, as this may be resumed again at any time later. For fixing such an unwanted resume, this patch replaces pm_runtime_suspend() with pm_runtime_force_suspend(), and call pm_runtime_disable() afterward. It assures to keep the device suspended. Also for code simplification, we apply the code unconditionally; when it's been already suspended, nothing would happen by calls of snd_pcm_suspend_all() and pm_runtime_force_suspend(), just proceed to pm_runtime_disable(). Fixes: b98444ed597d ("ALSA: hda: Suspend codec at shutdown") Reported-and-tested-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://lore.kernel.org/r/20210826154752.25674-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-08-26	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Jakub Kicinski
	Alexei Starovoitov says: ==================== bpf 2021-08-26 We've added 1 non-merge commit during the last 1 day(s): 1) Fix ringbuf helper function compatibility, from Daniel. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix ringbuf helper function compatibility ==================== Link: https://lore.kernel.org/r/20210826153720.19083-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	signal/seccomp: Refactor seccomp signal and coredump generation	Eric W. Biederman
	Factor out force_sig_seccomp from the seccomp signal generation and place it in kernel/signal.c. The function force_sig_seccomp takes a parameter force_coredump to indicate that the sigaction field should be reset to SIGDFL so that a coredump will be generated when the signal is delivered. force_sig_seccomp is then used to replace both seccomp_send_sigsys and seccomp_init_siginfo. force_sig_info_to_task gains an extra parameter to force using the default signal action. With this change seccomp is no longer a special case and there becomes exactly one place do_coredump is called from. Further it no longer becomes necessary for __seccomp_filter to call do_group_exit. Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-08-26	RDMA/hns: Delete unnecessary blank lines.	Xinhao Liu
	Just delete unnecessary blank lines. Link: https://lore.kernel.org/r/1629985056-57004-8-git-send-email-liangwenpeng@huawei.com Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Encapsulate the qp db as a function	Yixing Liu
	Encapsulate qp db into two functions: user and kernel. Link: https://lore.kernel.org/r/1629985056-57004-7-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Adjust the order in which irq are requested and enabled	Wenpeng Liang
	It should first alloc workqueue and request irq, and finally enable irq. Link: https://lore.kernel.org/r/1629985056-57004-6-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Remove RST2RST error prints for hw v1	Weihang Li
	There is no need to prints error for hw_v1. Link: https://lore.kernel.org/r/1629985056-57004-5-git-send-email-liangwenpeng@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Remove dqpn filling when modify qp from Init to Init	Wenpeng Liang
	According to the IB specification, the destination qpn is allowed to be filled into the qpc only when the qp transitions from Init to RTR, so this code is unused. Link: https://lore.kernel.org/r/1629985056-57004-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Fix QP's resp incomplete assignment	Wenpeng Liang
	The resp passed to the user space represents the enable flag of qp, incomplete assignment will cause some features of the user space to be disabled. Fixes: 90ae0b57e4a5 ("RDMA/hns: Combine enable flags of qp") Fixes: aba457ca890c ("RDMA/hns: Support owner mode doorbell") Link: https://lore.kernel.org/r/1629985056-57004-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	RDMA/hns: Fix query destination qpn	Wenpeng Liang
	The bit width of dqpn is 24 bits, using u8 will cause truncation error. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1629985056-57004-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26	signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die	Eric W. Biederman
	In the fpsp040 code when copyin or copyout fails call force_sigsegv(SIGSEGV) instead of do_exit(SIGSEGV). This solves a couple of problems. Because do_exit embeds the ptrace stop PTRACE_EVENT_EXIT a complete stack frame needs to be present for that to work correctly. There is always the information needed for a ptrace stop where get_signal is called. So exiting with a signal solves the ptrace issue. Further exiting with a signal ensures that all of the threads in a process are killed not just the thread that malfunctioned. Which avoids confusing userspace. To make force_sigsegv(SIGSEGV) work in fpsp040_die modify the code to save all of the registers and jump to ret_from_exception (which ultimately calls get_signal) after fpsp040_die returns. v2: Updated the branches to use gas's pseudo ops that automatically calculate the best branch instruction to use for the purpose. v1: https://lkml.kernel.org/r/87a6m8kgtx.fsf_-_@disp2133 Link: https://lkml.kernel.org/r/87tukghjfs.fsf_-_@disp2133 Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-08-27	powerpc/pseries/iommu: Rename "direct window" to "dma window"	Leonardo Bras
	A previous change introduced the usage of DDW as a bigger indirect DMA mapping when the DDW available size does not map the whole partition. As most of the code that manipulates direct mappings was reused for indirect mappings, it's necessary to rename all names and debug/info messages to reflect that it can be used for both kinds of mapping. This should cause no behavioural change, just adjust naming. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-12-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Make use of DDW for indirect mapping	Leonardo Bras
	So far it's assumed possible to map the guest RAM 1:1 to the bus, which works with a small number of devices. SRIOV changes it as the user can configure hundreds VFs and since phyp preallocates TCEs and does not allow IOMMU pages bigger than 64K, it has to limit the number of TCEs per a PE to limit waste of physical pages. As of today, if the assumed direct mapping is not possible, DDW creation is skipped and the default DMA window "ibm,dma-window" is used instead. By using DDW, indirect mapping can get more TCEs than available for the default DMA window, and also get access to using much larger pagesizes (16MB as implemented in qemu vs 4k from default DMA window), causing a significant increase on the maximum amount of memory that can be IOMMU mapped at the same time. Indirect mapping will only be used if direct mapping is not a possibility. For indirect mapping, it's necessary to re-create the iommu_table with the new DMA window parameters, so iommu_alloc() can use it. Removing the default DMA window for using DDW with indirect mapping is only allowed if there is no current IOMMU memory allocated in the iommu_table. enable_ddw() is aborted otherwise. Even though there won't be both direct and indirect mappings at the same time, we can't reuse the DIRECT64_PROPNAME property name, or else an older kexec()ed kernel can assume direct mapping, and skip iommu_alloc(), causing undesirable behavior. So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info" was created to represent a DDW that does not allow direct mapping. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-11-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Find existing DDW with given property name	Leonardo Bras
	At the moment pseries stores information about created directly mapped DDW window in DIRECT64_PROPNAME. With the objective of implementing indirect DMA mapping with DDW, it's necessary to have another propriety name to make sure kexec'ing into older kernels does not break, as it would if we reuse DIRECT64_PROPNAME. In order to have this, find_existing_ddw_windows() needs to be able to look for different property names. Extract find_existing_ddw_windows() into find_existing_ddw_windows_named() and calls it with current property name. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-10-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Update remove_dma_window() to accept property name	Leonardo Bras
	Update remove_dma_window() so it can be used to remove DDW with a given property name. This enables the creation of new property names for DDW, so we can have different usage for it, like indirect mapping. Also, add return values to it so we can check if the property was found while removing the active DDW. This allows skipping the remaining property names while reducing the impact of multiple property names. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-9-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper	Leonardo Bras
	Add a new helper _iommu_table_setparms(), and use it in iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated code. Also, setting tbl->it_ops was happening outsite iommu_table_setparms(), so move it to the new helper. Since we need the iommu_table_ops to be declared before used, declare iommu_table_lpar_multi_ops and iommu_table_pseries_ops to before their respective iommu_table_setparms(). Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-8-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()	Leonardo Bras
	Code used to create a ddw property that was previously scattered in enable_ddw() is now gathered in ddw_property_create(), which deals with allocation and filling the property, letting it ready for of_property_add(), which now occurs in sequence. This created an opportunity to reorganize the second part of enable_ddw(): Without this patch enable_ddw() does, in order: kzalloc() property & members, create_ddw(), fill ddwprop inside property, ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory, of_add_property(), and list_add(). With this patch enable_ddw() does, in order: create_ddw(), ddw_property_create(), of_add_property(), ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory, and list_add(). This change requires of_remove_property() in case anything fails after of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk in all memory, which looks the most expensive operation, only if everything else succeeds. Also, the error path got remove_ddw() replaced by a new helper __remove_dma_window(), which only removes the new DDW with an rtas-call. For this, a new helper clean_dma_window() was needed to clean anything that could left if walk_system_ram_range() fails. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-7-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Allow DDW windows starting at 0x00	Leonardo Bras
	enable_ddw() currently returns the address of the DMA window, which is considered invalid if has the value 0x00. Also, it only considers valid an address returned from find_existing_ddw if it's not 0x00. Changing this behavior makes sense, given the users of enable_ddw() only need to know if direct mapping is possible. It can also allow a DMA window starting at 0x00 to be used. This will be helpful for using a DDW with indirect mapping, as the window address will be different than 0x00, but it will not map the whole partition. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-6-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Add ddw_list_new_entry() helper	Leonardo Bras
	There are two functions creating direct_window_list entries in a similar way, so create a ddw_list_new_entry() to avoid duplicity and simplify those functions. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-5-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper	Leonardo Bras
	Creates a helper to allow allocating a new iommu_table without the need to reallocate the iommu_group. This will be helpful for replacing the iommu_table for the new DMA window, after we remove the old one with iommu_tce_table_put(). Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-4-leobras.c@gmail.com
2021-08-27	powerpc/kernel/iommu: Add new iommu_table_in_use() helper	Leonardo Bras
	Having a function to check if the iommu table has any allocation helps deciding if a tbl can be reset for using a new DMA window. It should be enough to replace all instances of !bitmap_empty(tbl...). iommu_table_in_use() skips reserved memory, so we don't need to worry about releasing it before testing. This causes iommu_table_release_pages() to become unnecessary, given it is only used to remove reserved memory for testing. Also, only allow storing reserved memory values in tbl if they are valid in the table, so there is no need to check it in the new helper. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-3-leobras.c@gmail.com
2021-08-27	powerpc/pseries/iommu: Replace hard-coded page shift	Leonardo Bras
	Some functions assume IOMMU page size can only be 4K (pageshift == 12). Update them to accept any page size passed, so we can use 64K pages. In the process, some defines like TCE_SHIFT were made obsolete, and then removed. IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show a RPN of 52-bit, and considers a 12-bit pageshift, so there should be no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn. It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and tce_buildmulti_pSeriesLP(). Most places had a tbl struct, so using tbl->it_page_shift was simple. tce_free_pSeriesLP() was a special case, since callers not always have a tbl struct, so adding a tceshift parameter seems the right thing to do. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-2-leobras.c@gmail.com
2021-08-27	powerpc/numa: Update cpu_cpu_map on CPU online/offline	Srikar Dronamraju
	cpu_cpu_map holds all the CPUs in the DIE. However in PowerPC, when onlining/offlining of CPUs, this mask doesn't get updated. This mask is however updated when CPUs are added/removed. So when both operations like online/offline of CPUs and adding/removing of CPUs are done simultaneously, then cpumaps end up broken. WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898 build_sched_domains+0xd48/0x1720 Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28 Workqueue: events cpuset_hotplug_workfn NIP: c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec REGS: c00000005596f220 TRAP: 0700 Not tainted (5.13.0-rc6+) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48828222 XER: 00000009 CFAR: c0000000001ea698 IRQMASK: 0 GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036 GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90 GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8 GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800 GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18 GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400 GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060 GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720 LR [c0000000001caac4] build_sched_domains+0xd44/0x1720 Call Trace: [c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable) [c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0 [c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0 [c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70 [c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10 [c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590 [c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620 [c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0 [c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70 Instruction dump: 485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120 7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000 Fix this by updating cpu_cpu_map aka cpumask_of_node() on every CPU online/offline. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-5-srikar@linux.vnet.ibm.com