linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-08-20	KVM: stats: Add halt_wait_ns stats for all architectures	Jing Zhang
	Add simple stats halt_wait_ns to record the time a VCPU has spent on waiting for all architectures (not just powerpc). Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20	KVM: stats: Support linear and logarithmic histogram statistics	Jing Zhang
	Add new types of KVM stats, linear and logarithmic histogram. Histogram are very useful for observing the value distribution of time or size related stats. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20	KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range	Maxim Levitsky
	This together with previous patch, ensures that kvm_zap_gfn_range doesn't race with page fault running on another vcpu, and will make this page fault code retry instead. This is based on a patch suggested by Sean Christopherson: https://lkml.org/lkml/2021/7/22/1025 Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20	kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE	Marco Elver
	Originally the addr != NULL check was meant to take care of the case where __kfence_pool == NULL (KFENCE is disabled). However, this does not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE. This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or any other faulting access with addr < KFENCE_POOL_SIZE. While the kernel would likely crash, the stack traces and report might be confusing due to double faults upon KFENCE's attempt to unprotect such an address. Fix it by just checking that __kfence_pool != NULL instead. Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> [5.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-20	mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim	Johannes Weiner
	We've noticed occasional OOM killing when memory.low settings are in effect for cgroups. This is unexpected and undesirable as memory.low is supposed to express non-OOMing memory priorities between cgroups. The reason for this is proportional memory.low reclaim. When cgroups are below their memory.low threshold, reclaim passes them over in the first round, and then retries if it couldn't find pages anywhere else. But when cgroups are slightly above their memory.low setting, page scan force is scaled down and diminished in proportion to the overage, to the point where it can cause reclaim to fail as well - only in that case we currently don't retry, and instead trigger OOM. To fix this, hook proportional reclaim into the same retry logic we have in place for when cgroups are skipped entirely. This way if reclaim fails and some cgroups were scanned with diminished pressure, we'll try another full-force cycle before giving up and OOMing. [akpm@linux-foundation.org: coding-style fixes] Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Leon Yang <lnyng@fb.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-20	tracing: Add a probe that attaches to trace events	Tzvetomir Stoyanov (VMware)
	A new dynamic event is introduced: event probe. The event is attached to an existing tracepoint and uses its fields as arguments. The user can specify custom format string of the new event, select what tracepoint arguments will be printed and how to print them. An event probe is created by writing configuration string in 'dynamic_events' ftrace file: e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe -:SNAME/ENAME - Delete an event probe Where: SNAME - System name, if omitted 'eprobes' is used. ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used. SYSTEM - Name of the system, where the tracepoint is defined, mandatory. EVENT - Name of the tracepoint event in SYSTEM, mandatory. FETCHARGS - Arguments: <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print it as given TYPE with given name. Supported types are: (u8/u16/u32/u64/s8/s16/s32/s64), basic type (x8/x16/x32/x64), hexadecimal types "string", "ustring" and bitfield. Example, attach an event probe on openat system call and print name of the file that will be opened: echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events A new dynamic event is created in events/esys/eopen/ directory. It can be deleted with: echo "-:esys/eopen" >> dynamic_events Filters, triggers and histograms can be attached to the new event, it can be matched in synthetic events. There is one limitation - an event probe can not be attached to kprobe, uprobe or another event probe. Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-20	SUNRPC: Move client-side disconnect injection	Chuck Lever
	Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. Convert the existing client-side disconnect injection infrastructure to use the kernel's generic error injection facility. The generic facility has a richer set of injection criteria. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-20	Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', ↵	Joerg Roedel
	'x86/vt-d' and 'core' into next
2021-08-20	iommu/io-pgtable: Abstract iommu_iotlb_gather access	Robin Murphy
	Previously io-pgtable merely passed the iommu_iotlb_gather pointer through to helpers, but now it has grown its own direct dereference. This turns out to break the build for !IOMMU_API configs where the structure only has a dummy definition. It will probably also crash drivers who don't use the gather mechanism and simply pass in NULL. Wrap this dereference in a suitable helper which can both be stubbed out for !IOMMU_API and encapsulate a NULL check otherwise. Fixes: 7a7c5badf858 ("iommu: Indicate queued flushes via gather data") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/83672ee76f6405c82845a55c148fa836f56fbbc1.1629465282.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-20	Merge branch kvm-arm64/generic-entry into kvmarm-master/next	Marc Zyngier
	Switch KVM/arm64 to the generic entry code, courtesy of Oliver Upton * kvm-arm64/generic-entry: KVM: arm64: Use generic KVM xfer to guest work function entry: KVM: Allow use of generic KVM entry w/o full generic support KVM: arm64: Record number of signal exits as a vCPU stat Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20	ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation	Arnd Bergmann
	sys_oabi_semtimedop() is one of the last users of set_fs() on Arm. To remove this one, expose the internal code of the actual implementation that operates on a kernel pointer and call it directly after copying. There should be no measurable impact on the normal execution of this function, and it makes the overly long function a little shorter, which may help readability. While reworking the oabi version, make it behave a little more like the native one, using kvmalloc_array() and restructure the code flow in a similar way. The naming of __do_semtimedop() is not very good, I hope someone can come up with a better name. One regression was spotted by kernel test robot <rong.a.chen@intel.com> and fixed before the first mailing list submission. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20	ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation	Arnd Bergmann
	The epoll_wait() system call wrapper is one of the remaining users of the set_fs() infrasturcture for Arm. Changing it to not require set_fs() is rather complex unfortunately. The approach I'm taking here is to allow architectures to override the code that copies the output to user space, and let the oabi-compat implementation check whether it is getting called from an EABI or OABI system call based on the thread_info->syscall value. The in_oabi_syscall() check here mirrors the in_compat_syscall() and in_x32_syscall() helpers for 32-bit compat implementations on other architectures. Overall, the amount of code goes down, at least with the newly added sys_oabi_epoll_pwait() helper getting removed again. The downside is added complexity in the source code for the native implementation. There should be no difference in runtime performance except for Arm kernels with CONFIG_OABI_COMPAT enabled that now have to go through an external function call to check which of the two variants to use. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20	Merge branch 'sched/core'	Peter Zijlstra

2021-08-20	sched: Introduce dl_task_check_affinity() to check proposed affinity	Will Deacon
	In preparation for restricting the affinity of a task during execve() on arm64, introduce a new dl_task_check_affinity() helper function to give an indication as to whether the restricted mask is admissible for a deadline task. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lore.kernel.org/r/20210730112443.23245-10-will@kernel.org
2021-08-20	sched: Allow task CPU affinity to be restricted on asymmetric systems	Will Deacon
	Asymmetric systems may not offer the same level of userspace ISA support across all CPUs, meaning that some applications cannot be executed by some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do not feature support for 32-bit applications on both clusters. Although userspace can carefully manage the affinity masks for such tasks, one place where it is particularly problematic is execve() because the CPU on which the execve() is occurring may be incompatible with the new application image. In such a situation, it is desirable to restrict the affinity mask of the task and ensure that the new image is entered on a compatible CPU. From userspace's point of view, this looks the same as if the incompatible CPUs have been hotplugged off in the task's affinity mask. Similarly, if a subsequent execve() reverts to a compatible image, then the old affinity is restored if it is still valid. In preparation for restricting the affinity mask for compat tasks on arm64 systems without uniform support for 32-bit applications, introduce {force,relax}_compatible_cpus_allowed_ptr(), which respectively restrict and restore the affinity mask for a task based on the compatible CPUs. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20210730112443.23245-9-will@kernel.org
2021-08-20	sched: Introduce task_struct::user_cpus_ptr to track requested affinity	Will Deacon
	In preparation for saving and restoring the user-requested CPU affinity mask of a task, add a new cpumask_t pointer to 'struct task_struct'. If the pointer is non-NULL, then the mask is copied across fork() and freed on task exit. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-7-will@kernel.org
2021-08-20	cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()	Will Deacon
	select_fallback_rq() only needs to recheck for an allowed CPU if the affinity mask of the task has changed since the last check. Return a 'bool' from cpuset_cpus_allowed_fallback() to indicate whether the affinity mask was updated, and use this to elide the allowed check when the mask has been left alone. No functional change. Suggested-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-5-will@kernel.org
2021-08-20	cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()	Will Deacon
	Asymmetric systems may not offer the same level of userspace ISA support across all CPUs, meaning that some applications cannot be executed by some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do not feature support for 32-bit applications on both clusters. Modify guarantee_online_cpus() to take task_cpu_possible_mask() into account when trying to find a suitable set of online CPUs for a given task. This will avoid passing an invalid mask to set_cpus_allowed_ptr() during ->attach() and will subsequently allow the cpuset hierarchy to be taken into account when forcefully overriding the affinity mask for a task which requires migration to a compatible CPU. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Link: https://lkml.kernel.org/r/20210730112443.23245-4-will@kernel.org
2021-08-20	cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1	Will Deacon
	If the scheduler cannot find an allowed CPU for a task, cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask if cgroup v1 is in use. In preparation for allowing architectures to provide their own fallback mask, just return early if we're either using cgroup v1 or we're using cgroup v2 with a mask that contains invalid CPUs. This will allow select_fallback_rq() to figure out the mask by itself. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://lkml.kernel.org/r/20210730112443.23245-3-will@kernel.org
2021-08-20	sched: Introduce task_cpu_possible_mask() to limit fallback rq selection	Will Deacon
	Asymmetric systems may not offer the same level of userspace ISA support across all CPUs, meaning that some applications cannot be executed by some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do not feature support for 32-bit applications on both clusters. On such a system, we must take care not to migrate a task to an unsupported CPU when forcefully moving tasks in select_fallback_rq() in response to a CPU hot-unplug operation. Introduce a task_cpu_possible_mask() hook which, given a task argument, allows an architecture to return a cpumask of CPUs that are capable of executing that task. The default implementation returns the cpu_possible_mask, since sane machines do not suffer from per-cpu ISA limitations that affect scheduling. The new mask is used when selecting the fallback runqueue as a last resort before forcing a migration to the first active CPU. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20210730112443.23245-2-will@kernel.org
2021-08-19	net/mlx5: E-switch, Introduce rate limiting groups API	Dmytro Linkin
	Extend eswitch API with rate limiting groups: - Define new struct mlx5_esw_rate_group that is used to hold all internal group data. - Implement functions that allow creation, destruction and cleanup of groups. - Assign all vports to internal unlimited zero group by default. This commit lays the groundwork for group rate limiting by implementing devlink_ops->rate_node_{new\|del}() callbacks to support creating and deleting groups through devlink rate node objects. APIs that allows setting rates and adding/removing members are implemented in following patches. Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-20	bpf: Use kvmalloc for map keys in syscalls	Stanislav Fomichev
	Same as previous patch but for the keys. memdup_bpfptr is renamed to kvmemdup_bpfptr (and converted to kvmalloc). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-2-sdf@google.com
2021-08-19	Merge tag 'linux-can-next-for-5.15-20210819' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.15-20210819 The first patch is by me, for the mailmap file and maps the email address of two former ESD employees to a newly created role account. The next 3 patches are by Oleksij Rempel and add support for GPIO based switchable CAN bus termination. The next 3 patches are by Vincent Mailhol. The first one changes the CAN netlink interface to not bail out if the user switched off unsupported features. The next one adds Vincent as the maintainer of the etas_es58x driver and the last one cleans up the documentation of struct es58x_fd_tx_conf_msg. The next patch is by me, for the mcp251xfd driver and marks some instances of struct mcp251xfd_priv as const. Lad Prabhakar contributes 2 patches for the rcar_canfd driver, that add support for RZ/G2L family. The next 5 patches target the m_can/tcan45x5 driver. 2 are by me an fix trivial checkpatch warnings. The remaining 3 patches are by Matt Kline and improve the performance on the SPI based tcan4x5x chip by batching FIFO reads and writes. The last 7 patches are for the c_can driver. Dario Binacchi's patch converts the DT bindings to yaml, 2 patches by me fix a typo and rename a macro to properly represent the usage. The last 4 patches are again by Dario Binacchi and provide a performance improvement for the TX path by operating the TX mailboxes as a true FIFO. * tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits) can: c_can: cache frames to operate as a true FIFO can: c_can: support tx ring algorithm can: c_can: exit c_can_do_tx() early if no frames have been sent can: c_can: remove struct c_can_priv::priv field can: c_can: rename IF_RX -> IF_NAPI can: c_can: c_can_do_tx(): fix typo in comment dt-bindings: net: can: c_can: convert to json-schema can: m_can: Batch FIFO writes during CAN transmit can: m_can: Batch FIFO reads during CAN receive can: m_can: Disable IRQs on FIFO bus errors can: m_can: fix block comment style can: tcan4x5x: cdev_to_priv(): remove stray empty line can: rcar_canfd: Add support for RZ/G2L family dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver can: netlink: allow user to turn off unsupported features can: dev: provide optional GPIO based termination support dt-bindings: can: fsl,flexcan: enable termination-* bindings ... ==================== Link: https://lore.kernel.org/r/20210819133913.657715-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19	PCI: endpoint: Add virtual function number in pci_epc ops	Kishon Vijay Abraham I
	Add virtual function number in pci_epc ops. EPC controller driver can perform virtual function specific initialization based on the virtual function number. Link: https://lore.kernel.org/r/20210819123343.1951-5-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2021-08-19	PCI: endpoint: Add support to add virtual function in endpoint core	Kishon Vijay Abraham I
	Add support to add virtual function in endpoint core. The virtual function can only be associated with a physical function instead of a endpoint controller. Provide APIs to associate a virtual function with a physical function here. [weiyongjun1@huawei.com: PCI: endpoint: Fix missing unlock on error in pci_epf_add_vepf() - Reported-by: Hulk Robot <hulkci@huawei.com>] Link: https://lore.kernel.org/r/20210819123343.1951-3-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2021-08-19	can: dev: provide optional GPIO based termination support	Oleksij Rempel
	For CAN buses to work, a termination resistor has to be present at both ends of the bus. This resistor is usually 120 Ohms, other values may be required for special bus topologies. This patch adds support for a generic GPIO based CAN termination. The resistor value has to be specified via device tree, and it can only be attached to or detached from the bus. By default the termination is not active. Link: https://lore.kernel.org/r/20210818071232.20585-4-o.rempel@pengutronix.de Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-19	net: mii: make mii_ethtool_gset() return void	Pavel Skripkin
	mii_ethtool_gset() does not return any errors. Since there are no users of this function that rely on its return value, it can be made void. Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19	entry: KVM: Allow use of generic KVM entry w/o full generic support	Oliver Upton
	Some architectures (e.g. arm64) have yet to adopt the generic entry infrastructure. Despite that, it would be nice to use some common plumbing for guest entry/exit handling. For example, KVM/arm64 currently does not handle TIF_NOTIFY_PENDING correctly. Allow use of only the generic KVM entry code by tightening up the include list. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210802192809.1851010-3-oupton@google.com
2021-08-19	iommu/vt-d: Allow devices to have more than 32 outstanding PRs	Lu Baolu
	The minimum per-IOMMU PRQ queue size is one 4K page, this is more entries than the hardcoded limit of 32 in the current VT-d code. Some devices can support up to 512 outstanding PRQs but underutilized by this limit of 32. Although, 32 gives some rough fairness when multiple devices share the same IOMMU PRQ queue, but far from optimal for customized use case. This extends the per-IOMMU PRQ queue size to four 4K pages and let the devices have as many outstanding page requests as they can. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19	iommu/vt-d: Update the virtual command related registers	Lu Baolu
	The VT-d spec Revision 3.3 updated the virtual command registers, virtual command opcode B register, virtual command response register and virtual command capability register (Section 10.4.43, 10.4.44, 10.4.45, 10.4.46). This updates the virtual command interface implementation in the Intel IOMMU driver accordingly. Fixes: 24f27d32ab6b7 ("iommu/vt-d: Enlightened PASID allocation") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Sanjay Kumar <sanjay.k.kumar@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210713042649.3547403-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-19	dma-mapping: make the global coherent pool conditional	Christoph Hellwig
	Only build the code to support the global coherent pool if support for it is enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Dillon Min <dillon.minfei@gmail.com>
2021-08-19	isystem: ship and use stdarg.h	Alexey Dobriyan
	Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>. stdarg.h is the only userspace header commonly used in the kernel. GPL 2 version of <stdarg.h> can be extracted from http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-19	isystem: trim/fixup stdarg.h and other headers	Alexey Dobriyan
	Delete/fixup few includes in anticipation of global -isystem compile option removal. Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition of uintptr_t error (one definition comes from <stddef.h>, another from <linux/types.h>). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-18	PCI: Change the type of probe argument in reset functions	Amey Narkhede
	Change the type of probe argument in functions which implement reset methods from int to bool to make the context and intent clear. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-10-ameynarkhede03@gmail.com Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-18	PCI: Add support for ACPI _RST reset method	Shanker Donthineni
	_RST is a standard ACPI method that performs a function level reset of a device (ACPI v6.3, sec 7.3.25). Add pci_dev_acpi_reset() to probe for _RST method and execute if present. The default priority of this reset is set to below device-specific and above hardware resets. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-9-ameynarkhede03@gmail.com Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sinan Kaya <okaya@kernel.org> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-18	tracing: Have dynamic events have a ref counter	Steven Rostedt (VMware)
	As dynamic events are not created by modules, if something is attached to one, calling "try_module_get()" on its "mod" field, is not going to keep the dynamic event from going away. Since dynamic events do not need the "mod" pointer of the event structure, make a union out of it in order to save memory (there's one structure for each of the thousand+ events in the kernel), and have any event with the DYNAMIC flag set to use a ref counter instead. Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/ Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-18	tracing: Add DYNAMIC flag for dynamic events	Steven Rostedt (VMware)
	To differentiate between static and dynamic events, add a new flag DYNAMIC to the event flags that all dynamic events have set. This will allow to differentiate when attaching to a dynamic event from a static event. Static events have a mod pointer that references the module they were created in (or NULL for core kernel). This can be incremented when the event has something attached to it. But there exists no such mechanism for dynamic events. This is dangerous as the dynamic events may now disappear without the "attachment" knowing that it no longer exists. To enforce the dynamic flag, change dyn_event_add() to pass the event that is being created such that it can set the DYNAMIC flag of the event. This helps make sure that no location that creates a dynamic event misses setting this flag. Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/ Link: https://lkml.kernel.org/r/20210817035026.936958254@goodmis.org Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-18	coresight: syscfg: Add API to activate and enable configurations	Mike Leach
	Configurations are first activated, then when any coresight device is enabled, the active configurations are checked and any matching one is enabled. This patch provides the activation / enable API. Link: https://lore.kernel.org/r/20210723165444.1048-6-mike.leach@linaro.org Signed-off-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20210818194022.379573-6-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18	coresight: syscfg: Add registration and feature loading for cs devices	Mike Leach
	API for individual devices to register with the syscfg management system is added. Devices register with matching information, and any features or configurations that match will be loaded into the device. The feature and configuration loading is extended so that on load these are loaded into any currently registered devices. This allows configuration loading after devices have been registered. Link: https://lore.kernel.org/r/20210723165444.1048-3-mike.leach@linaro.org Signed-off-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20210818194022.379573-3-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18	USB: EHCI: Add alias for Broadcom INSNREG	Kees Cook
	Refactor struct ehci_regs to avoid accessing beyond the end of port_status. This change results in no difference in the final object code. Avoids several warnings when building with -Warray-bounds: drivers/usb/host/ehci-brcm.c: In function 'ehci_brcm_reset': drivers/usb/host/ehci-brcm.c:113:32: warning: array subscript 16 is above array bounds of 'u32[15]' {aka 'unsigned int[15]'} [-Warray-bounds] 113 \| ehci_writel(ehci, 0x00800040, &ehci->regs->port_status[0x10]); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/usb/host/ehci.h:274, from drivers/usb/host/ehci-brcm.c:15: ./include/linux/usb/ehci_def.h:132:7: note: while referencing 'port_status' 132 \| u32 port_status[HCS_N_PORTS_MAX]; \| ^~~~~~~~~~~ Note that the documentation around this proprietary register was confusing. If "USB_EHCI_INSNREG00" is at port_status[0x0f], its offset would be 0x80 (not 0x90). The comments have been adjusted to fix this apparent typo. Fixes: 9df231511bd6 ("usb: ehci: Add new EHCI driver for Broadcom STB SoC's") Cc: Al Cooper <alcooperx@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-usb@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210818173018.2259231-3-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18	USB: EHCI: Add register array bounds to HCS ports	Kees Cook
	The original EHCI register struct used a trailing 0-element array for addressing the N_PORTS-many available registers. However, after commit a46af4ebf9ff ("USB: EHCI: define extension registers like normal ones") the 0-element array started to overlap the USBMODE extension register. To avoid future compile-time warnings about accessing indexes within a 0-element array, rearrange the struct to actually describe the expected layout (max 15 registers) with a union. All offsets remain the same, and bounds checking becomes possible on accesses to port_status and hostpc. There are no binary differences, and struct offsets continue to match. "pahole --hex -C ehci_regs" before: struct ehci_regs { u32 command; /* 0 0x4 / u32 status; / 0x4 0x4 / u32 intr_enable; / 0x8 0x4 / u32 frame_index; / 0xc 0x4 / u32 segment; / 0x10 0x4 / u32 frame_list; / 0x14 0x4 / u32 async_next; / 0x18 0x4 / u32 reserved1[2]; / 0x1c 0x8 / u32 txfill_tuning; / 0x24 0x4 / u32 reserved2[6]; / 0x28 0x18 / / --- cacheline 1 boundary (64 bytes) --- / u32 configured_flag; / 0x40 0x4 / u32 port_status[0]; / 0x44 0 / u32 reserved3[9]; / 0x44 0x24 / u32 usbmode; / 0x68 0x4 / u32 reserved4[6]; / 0x6c 0x18 / / --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- / u32 hostpc[0]; / 0x84 0 / u32 reserved5[17]; / 0x84 0x44 / / --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- / u32 usbmode_ex; / 0xc8 0x4 / / size: 204, cachelines: 4, members: 18 / / last cacheline: 12 bytes / }; after: struct ehci_regs { u32 command; / 0 0x4 / u32 status; / 0x4 0x4 / u32 intr_enable; / 0x8 0x4 / u32 frame_index; / 0xc 0x4 / u32 segment; / 0x10 0x4 / u32 frame_list; / 0x14 0x4 / u32 async_next; / 0x18 0x4 / u32 reserved1[2]; / 0x1c 0x8 / u32 txfill_tuning; / 0x24 0x4 / u32 reserved2[6]; / 0x28 0x18 / / --- cacheline 1 boundary (64 bytes) --- / u32 configured_flag; / 0x40 0x4 / union { u32 port_status[15]; / 0x44 0x3c / struct { u32 reserved3[9]; / 0x44 0x24 / u32 usbmode; / 0x68 0x4 / }; / 0x44 0x28 / }; / 0x44 0x3c / / --- cacheline 2 boundary (128 bytes) --- / u32 reserved4; / 0x80 0x4 / u32 hostpc[15]; / 0x84 0x3c / / --- cacheline 3 boundary (192 bytes) --- / u32 reserved5[2]; / 0xc0 0x8 / u32 usbmode_ex; / 0xc8 0x4 / / size: 204, cachelines: 4, members: 16 / / last cacheline: 12 bytes / }; With this fixed, adding -Wzero-length-bounds to the build no longer produces several warnings like this: In file included from drivers/usb/host/ehci-hcd.c:306: drivers/usb/host/ehci-hub.c: In function 'ehci_port_handed_over': drivers/usb/host/ehci-hub.c:1194:8: warning: array subscript '<unknown>' is outside the bounds of an interior zero-length array 'u32[0]' {aka 'unsigned int[]'} [-Wzero-length-bounds] 1194 \| reg = &ehci->regs->port_status[portnum - 1]; \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/usb/host/ehci.h:274, from drivers/usb/host/ehci-hcd.c:97: ./include/linux/usb/ehci_def.h:130:7: note: while referencing 'port_status' 130 \| u32 port_status[0]; / up to N_PORTS */ \| ^~~~~~~~~~~ Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Cooper <alcooperx@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: linux-usb@vger.kernel.org Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210818173018.2259231-2-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18	ovl: enable RCU'd ->get_acl()	Miklos Szeredi
	Overlayfs does not cache ACL's (to avoid double caching). Instead it just calls the underlying filesystem's i_op->get_acl(), which will return the cached value, if possible. In rcu path walk, however, get_cached_acl_rcu() is employed to get the value from the cache, which will fail on overlayfs resulting in dropping out of rcu walk mode. This can result in a big performance hit in certain situations. Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which indicates pass-through) Reported-by: garyhuang <zjh.20052005@163.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18	vfs: add rcu argument to ->get_acl() callback	Miklos Szeredi
	Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18	pipe: avoid unnecessary EPOLLET wakeups under normal loads	Linus Torvalds
	I had forgotten just how sensitive hackbench is to extra pipe wakeups, and commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") ended up causing a quite noticeable regression on larger machines. Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's not clear that this matters in real life all that much, but as Mel points out, it's used often enough when comparing kernels and so the performance regression shows up like a sore thumb. It's easy enough to fix at least for the common cases where pipes are used purely for data transfer, and you never have any exciting poll usage at all. So set a special 'poll_usage' flag when there is polling activity, and make the ugly "EPOLLET has crazy legacy expectations" semantics explicit to only that case. I would love to limit it to just the broken EPOLLET case, but the pipe code can't see the difference between epoll and regular select/poll, so any non-read/write waiting will trigger the extra wakeup behavior. That is sufficient for at least the hackbench case. Apart from making the odd extra wakeup cases more explicitly about EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe write, not at the first write chunk. That is actually much saner semantics (as much as you can call any of the legacy edge-triggered expectations for EPOLLET "sane") since it means that you know the wakeup will happen once the write is done, rather than possibly in the middle of one. [ For stable people: I'm putting a "Fixes" tag on this, but I leave it up to you to decide whether you actually want to backport it or not. It likely has no impact outside of synthetic benchmarks - Linus ] Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Sandeep Patil <sspatil@android.com> Tested-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-18	dma-mapping: add a dma_init_global_coherent helper	Christoph Hellwig
	Add a new helper to initialize the global coherent pool. This both cleans up the existing initialization which indirects through the reserved_mem_ops that are normally only used for struct device, and also allows using the global pool for non-devicetree architectures. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Dillon Min <dillon.minfei@gmail.com>
2021-08-18	driver core: platform: Remove platform_device_add_properties()	Heikki Krogerus
	There are no more users for it. The last place where it's called is in platform_device_register_full(). Replacing that call with device_create_managed_software_node() and removing the function. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20210817102449.39994-3-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18	Merge tag 'qcom-drivers-for-5.15' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers Qualcomm driver updates for v5.15 This fixes the "shared memory state machine" (SMSM) interrupt logic to avoid missing transitions happening while the interrupts are masked. SM6115 support is added to smd-rpm and rpmpd. The Qualcomm SCM firmware driver is once again made possible to compile and load as a kernel module. An out-of-bounds error related to the cooling devices of the AOSS driver is corrected. The binding is converted to YAML and a generic compatible is introduced to reduce the driver churn. The GENI wrapper gains a helper function used in I2C and SPI for switching the serial engine hardware to use the wrapper's DMA-engine. Lastly it contains a number of cleanups and smaller fixes for rpmhpd, socinfo, CPR, mdt_loader and the GENI DT binding. * tag 'qcom-drivers-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: smsm: Fix missed interrupts if state changes while masked soc: qcom: smsm: Implement support for get_irqchip_state soc: qcom: mdt_loader: be more informative on errors dt-bindings: qcom: geni-se: document iommus soc: qcom: smd-rpm: Add SM6115 compatible soc: qcom: geni: Add support for gpi dma soc: qcom: geni: move GENI_IF_DISABLE_RO to common header PM: AVS: qcom-cpr: Use nvmem_cell_read_variable_le_u32() drivers: soc: qcom: rpmpd: Add SM6115 RPM Power Domains dt-bindings: power: rpmpd: Add SM6115 to rpmpd binding dt-bindings: soc: qcom: smd-rpm: Add SM6115 compatible soc: qcom: aoss: Fix the out of bound usage of cooling_devs firmware: qcom_scm: Allow qcom_scm driver to be loadable as a permenent module soc: qcom: socinfo: Don't print anything if nothing found soc: qcom: rpmhpd: Use corner in power_off soc: qcom: aoss: Add generic compatible dt-bindings: soc: qcom: aoss: Convert to YAML dt-bindings: soc: qcom: aoss: Add SC8180X and generic compatible firmware: qcom_scm: remove a duplicative condition firmware: qcom_scm: Mark string array const Link: https://lore.kernel.org/r/20210816214840.581244-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-18	block: fix default IO priority handling	Damien Le Moal
	The default IO priority is the best effort (BE) class with the normal priority level IOPRIO_NORM (4). However, get_task_ioprio() returns IOPRIO_CLASS_NONE/IOPRIO_NORM as the default priority and get_current_ioprio() returns IOPRIO_CLASS_NONE/0. Let's be consistent with the defined default and have both of these functions return the default priority IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM) when the user did not define another default IO priority for the task. In include/uapi/linux/ioprio.h, introduce the IOPRIO_BE_NORM macro as an alias to IOPRIO_NORM to clarify that this default level applies to the BE priotity class. In include/linux/ioprio.h, define the macro IOPRIO_DEFAULT as IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM) and use this new macro when setting a priority to the default. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-7-damien.lemoal@wdc.com [axboe: drop unnecessary lightnvm change] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18	block: change ioprio_valid() to an inline function	Damien Le Moal
	Change the ioprio_valid() macro in include/usapi/linux/ioprio.h to an inline function declared on the kernel side in include/linux/ioprio.h. Also improve checks on the class value by checking the upper bound value. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-4-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>