linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-01-14	Merge tag 'regulator-eq' of ↵	Mark Brown
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into asoc-5.6 regulator: add regulator_equal()
2020-01-14	regulator: core: Add regulator_is_equal() helper	Marek Vasut
	Add regulator_is_equal() helper to compare whether two regulators are the same. This is useful for checking whether two separate regulators in a driver are actually the same supply. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: Igor Opaniuk <igor.opaniuk@toradex.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com> Cc: Mark Brown <broonie@kernel.org> Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Link: https://lore.kernel.org/r/20191220164450.1395038-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14	misc: alcor_pci: Add AU6625 to list of supported PCI_IDs	Rhys Perry
	I have added the AU6625 PCI_ID to the list of supported IDs: alcor_pci.c // Added au6625s ID to the array of supported devices alcor_pci.h // Added entry to define the PCI ID Made it fit in with the already submitted code: alcor_pci.c // Added config entry to that matches the one for au6601 >From general usage there seems to be no problems. Signed-off-by: Rhys Perry <rhysperry111@gmail.com> Link: https://lore.kernel.org/r/20191229171824.10308-1-rhysperry111@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14	misc: pvpanic: add crash loaded event	zhenwei pi
	Some users prefer kdump tools to generate guest kernel dumpfile, at the same time, need a out-of-band kernel panic event. Currently if booting guest kernel with 'crash_kexec_post_notifiers', QEMU will receive PVPANIC_PANICKED event and stop VM. If booting guest kernel without 'crash_kexec_post_notifiers', guest will not call notifier chain. Add PVPANIC_CRASH_LOADED bit for pvpanic event, it means that guest kernel actually hit a kernel panic, but the guest kernel wants to handle by itself. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Link: https://lore.kernel.org/r/20200102023513.318836-3-pizhenwei@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14	misc: pvpanic: move bit definition to uapi header file	zhenwei pi
	Some processes outside of the kernel(Ex, QEMU) should know what the value really is for, so move the bit definition to a uapi file. Suggested-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Link: https://lore.kernel.org/r/20200102023513.318836-2-pizhenwei@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14	fs/proc: Introduce /proc/pid/timens_offsets	Andrei Vagin
	API to set time namespace offsets for children processes, i.e.: echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com
2020-01-14	x86/vdso: Zap vvar pages when switching to a time namespace	Dmitry Safonov
	The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com
2020-01-14	time: Allocate per-timens vvar page	Dmitry Safonov
	VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications. The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period. Suggested-by: Andy Lutomirski <luto@kernel.org> Based-on-work-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com
2020-01-14	x86/vdso: Provide vdso_data offset on vvar_page	Dmitry Safonov
	VDSO support for time namespaces needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. To prepare the time namespace page the kernel needs to know the vdso_data offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14	lib/vdso: Prepare for time namespace support	Thomas Gleixner
	To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. If VDSO time namespace support is disabled the whole magic is compiled out. Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%. For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS. [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
2020-01-14	hrtimers: Prepare hrtimer_nanosleep() for time namespaces	Andrei Vagin
	clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com
2020-01-14	time: Add do_timens_ktime_to_host() helper	Andrei Vagin
	The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX]. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com
2020-01-14	time: Add timens_offsets to be used for tasks in time namespace	Andrei Vagin
	Introduce offsets for time namespace. They will contain an adjustment needed to convert clocks to/from host's. A new namespace is created with the same offsets as the time namespace of the current process. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-5-dima@arista.com
2020-01-14	ns: Introduce Time Namespace	Andrei Vagin
	Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14	lib/vdso: Add unlikely() hint into vdso_read_begin()	Andrei Vagin
	Place the branch with no concurrent write before the contended case. Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (more clock_gettime() cycles - the better): \| before \| after ----------------------------------- \| 150252214 \| 153242367 \| 150301112 \| 153324800 \| 150392773 \| 153125401 \| 150373957 \| 153399355 \| 150303157 \| 153489417 \| 150365237 \| 153494270 ----------------------------------- avg \| 150331408 \| 153345935 diff % \| 2 \| 0 ----------------------------------- stdev % \| 0.3 \| 0.1 Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20191112012724.250792-2-dima@arista.com
2020-01-14	soundwire: bus: fix device number leak on errors	Pierre-Louis Bossart
	If the programming of the dev_number fails due to an IO error, a new device_number will be assigned, resulting in a leak. Make sure we only assign a device_number once per Slave device. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200113225637.17313-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14	phy: Add DisplayPort configuration options	Yuti Amonkar
	Allow DisplayPort PHYs to be configured through the generic functions through a custom structure added to the generic union. The configuration structure is used for reconfiguration of DisplayPort PHYs during link training operation. The parameters added here are the ones defined in the DisplayPort spec v1.4 which include link rate, number of lanes, voltage swing and pre-emphasis. Add the DisplayPort phy mode to the generic phy_mode enum. Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-13	net: stmmac: Initial support for TBS	Jose Abreu
	Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now. Changes from v2: - Use bitfield for TBS status / support (Jakub) - Remove unneeded cache alignment (Jakub) - Fix checkpatch issues Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13	mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE	Yang Shi
	Commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") introduced a new khugepaged scan result: SCAN_PAGE_HAS_PRIVATE, but the corresponding description for trace events were not added. Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13	mm, debug_pagealloc: don't rely on static keys too early	Vlastimil Babka
	Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13	mm: memcg/slab: fix percpu slab vmstats flushing	Roman Gushchin
	Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up by the cgroup tree. The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value is crossing some predefined threshold, it spills over to atomic values on the local and each ascendant levels. It means that without flushing some numbers cached in percpu variables will be dropped on floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable. The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may resume in the dying state for a long time. After kmem_cache reparenting which is performed during the offlining slab counters of the dying cgroup don't have any chances to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level. The problem is that percpu counters are not zeroed after the first flushing. So every cached percpu value is summed twice. It creates a small error (up to 32 pages per cpu, but usually less) which accumulates on parent cgroup level. After creating and destroying of thousands of child cgroups, slab counter on parent level can be way off the real value. For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work on each cpu: reading and zeroing it during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath. With this change, slab counters on parent level will become eventually consistent. Once all dying children are gone, values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup. It's not perfect, as slab are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. It makes slab counters different from others, and it might want us to implement flushing in a correct form again. But it's also a question of performance: scheduling a work on each cpu isn't free, and it's an open question if the benefit of having more accurate counters is worth it. We might also consider flushing all counters on offlining, not only slab counters. So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely the number of created and destroyed cgroups). And think about the accuracy of counters separately. Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13	ptr_ring: add include of linux/mm.h	Jesper Dangaard Brouer
	Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") started to use kvmalloc_array and kvfree, which are defined in mm.h, the previous functions kcalloc and kfree, which are defined in slab.h. Add the missing include of linux/mm.h. This went unnoticed as other include files happened to include mm.h. Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13	iio: st_sensors: Drop redundant parameter from st_sensors_of_name_probe()	Andy Shevchenko
	Since we have access to the struct device_driver and thus to the ID table, there is no need to supply special parameters to st_sensors_of_name_probe(). Besides that we have a common API to get driver match data, there is no need to do matching separately for OF and ACPI. Taking into consideration above, simplify the ST sensors code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-01-13	arch: wire up pidfd_getfd syscall	Sargun Dhillon
	This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13	vfs, fdtable: Add fget_task helper	Sargun Dhillon
	This introduces a function which can be used to fetch a file, given an arbitrary task. As long as the user holds a reference (refcnt) to the task_struct it is safe to call, and will either return NULL on failure, or a pointer to the file, with a refcnt. This patch is based on Oleg Nesterov's (cf. [1]) patch from September 2018. [1]: Link: https://lore.kernel.org/r/20180915160423.GA31461@redhat.com Signed-off-by: Sargun Dhillon <sargun@sargun.me> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-2-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13	RDMA/core: Make ib_uverbs_async_event_file into a uobject	Jason Gunthorpe
	This makes async events aligned with completion events as both are full uobjects of FD type and use the same uobject lifecycle. A bunch of duplicate code is consolidated and the general flow between the two FDs is now very similar. Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Remove the ufile arg from rdma_alloc_begin_uobject	Jason Gunthorpe
	Now that all callers provide a non-NULL attrs the ufile is redundant. Adjust things so that the context handling is done inside alloc_uobj, and the ib_uverbs_get_ucontext_file() is avoided if we already have the context. Link: https://lore.kernel.org/r/1578504126-9400-13-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Do not erase the type of ib_wq.uobject	Jason Gunthorpe
	This is a struct ib_uwq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-10-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Do not erase the type of ib_srq.uobject	Jason Gunthorpe
	This is a struct ib_usrq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-9-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Do not erase the type of ib_qp.uobject	Jason Gunthorpe
	This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Do not erase the type of ib_cq.uobject	Jason Gunthorpe
	This is a struct ib_ucq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-7-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Do not allow alloc_commit to fail	Jason Gunthorpe
	This is a left over from an earlier version that creates a lot of complexity for error unwind, particularly for FD uobjects. The only reason this was done is so that anon_inode_get_file() could be called with the final fops and a fully setup uobject. Both need to be setup since unwinding anon_inode_get_file() via fput will call the driver's release(). Now that the driver does not provide release, we no longer need to worry about this complicated sequence, simply create the struct file at the start and allow the core code's release function to deal with the abort case. This allows all the confusing error paths around commit to be removed. Link: https://lore.kernel.org/r/1578504126-9400-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/core: Simplify destruction of FD uobjects	Jason Gunthorpe
	FD uobjects have a weird split between the struct file and uobject world. Simplify this to make them pure uobjects and use a generic release method for all struct file operations. This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always called before erasing the linked list contents to make the concurrancy simpler to understand. For this to work the uobject destruction must fence anything that it is cleaning up - the design must not rely on struct file lifetime. Only deliver_event() relies on the struct file to when adding new events to the queue, add a is_destroyed check under lock to block it. Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/mlx5: Use RCU and direct refcounts to keep memory alive	Jason Gunthorpe
	dispatch_event_fd() runs from a notifier with minimal locking, and relies on RCU and a file refcount to keep the uobject and eventfd alive. As the next patch wants to remove the file_operations release function from the drivers, re-organize things so that the devx_event_notifier() path uses the existing RCU to manage the lifetime of the uobject and eventfd. Move the refcount puts to a call_rcu so that the objects are guaranteed to exist and remove the indirect file refcount. Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	RDMA/uverbs: Remove needs_kfree_rcu from uverbs_obj_type_class	Jason Gunthorpe
	After device disassociation the uapi_objects are destroyed and freed, however it is still possible that core code can be holding a kref on the uobject. When it finally goes to uverbs_uobject_free() via the kref_put() it can trigger a use-after-free on the uapi_object. Since needs_kfree_rcu is a micro optimization that only benefits file uobjects, just get rid of it. There is no harm in using kfree_rcu even if it isn't required, and the number of involved objects is small. Link: https://lore.kernel.org/r/20200113143306.GA28717@ziepe.ca Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13	md/raid6: fix algorithm choice under larger PAGE_SIZE	Zhengyuan Liu
	There are several algorithms available for raid6 to generate xor and syndrome parity, including basic int1, int2 ... int32 and SIMD optimized implementation like sse and neon. To test and choose the best algorithms at the initial stage, we need provide enough disk data to feed the algorithms. However, the disk number we provided depends on page size and gfmul table, seeing bellow: const int disks = (65536/PAGE_SIZE) + 2; So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk, as a result the chosed algorithm is not reliable. For example, on my arm64 machine with 64K page enabled, it will choose intx32 as the best one, although the NEON implementation is better. This patch tries to fix the problem by defining a constant raid6 disk number to supporting arbitrary page size. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13	raid6/test: fix a compilation warning	Zhengyuan Liu
	The compilation warning is redefination showed as following: In file included from tables.c:2: ../../../include/linux/export.h:180: warning: "EXPORT_SYMBOL" redefined #define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, "") In file included from tables.c:1: ../../../include/linux/raid/pq.h:61: note: this is the location of the previous definition #define EXPORT_SYMBOL(sym) Fixes: 69a94abb82ee ("export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13	raid6/test: fix a compilation error	Zhengyuan Liu
	The compilation error is redeclaration showed as following: In file included from ../../../include/linux/limits.h:6, from /usr/include/x86_64-linux-gnu/bits/local_lim.h:38, from /usr/include/x86_64-linux-gnu/bits/posix1_lim.h:161, from /usr/include/limits.h:183, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:194, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h:7, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:34, from ../../../include/linux/raid/pq.h:30, from algos.c:14: ../../../include/linux/types.h:114:15: error: conflicting types for ‘int64_t’ typedef s64 int64_t; ^~~~~~~ In file included from /usr/include/stdint.h:34, from /usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h:9, from /usr/include/inttypes.h:27, from ../../../include/linux/raid/pq.h:29, from algos.c:14: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous \ declaration of ‘int64_t’ was here typedef __int64_t int64_t; Fixes: 54d50897d544 ("linux/kernel.h: split _MAX and _MIN macros into <linux/limits.h>") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13	platform/mellanox: mlxreg-hotplug: Add support for new capability register	Vadim Pasternak
	Add support for capability register, which is used for detection of the actual number of interrupt capable components within the particular group, supported by the specific system. Such components could be for example the number of power units and interrupts related to these units. The motivation is to avoid adding a new code in the future in order to distinct between the systems type supported different number of the components like power supplies, FANs, ASICs, line cards. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-13	tracing: kprobes: Output kprobe event to printk buffer	Masami Hiramatsu
	Since kprobe-events use event_trigger_unlock_commit_regs() directly, that events doesn't show up in printk buffer if "tp_printk" is set. Use trace_event_buffer_commit() in kprobe events so that it can invoke output_printk() as same as other trace events. Link: http://lkml.kernel.org/r/157867233085.17873.5210928676787339604.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> [ Adjusted data var declaration placement in __kretprobe_trace_func() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13	bootconfig: Add Extra Boot Config support	Masami Hiramatsu
	Extra Boot Config (XBC) allows admin to pass a tree-structured boot configuration file when boot up the kernel. This extends the kernel command line in an efficient way. Boot config will contain some key-value commands, e.g. key.word = value1 another.key.word = value2 It can fold same keys with braces, also you can write array data. For example, key { word1 { setting1 = data setting2 } word2.array = "val1", "val2" } User can access these key-value pair and tree structure via SKC APIs. Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13	tracing: Make struct ring_buffer less ambiguous	Steven Rostedt (VMware)
	As there's two struct ring_buffers in the kernel, it causes some confusion. The other one being the perf ring buffer. It was agreed upon that as neither of the ring buffers are generic enough to be used globally, they should be renamed as: perf's ring_buffer -> perf_buffer ftrace's ring_buffer -> trace_buffer This implements the changes to the ring buffer that ftrace uses. Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13	tracing: Rename trace_buffer to array_buffer	Steven Rostedt (VMware)
	As we are working to remove the generic "ring_buffer" name that is used by both tracing and perf, the ring_buffer name for tracing will be renamed to trace_buffer, and perf's ring buffer will be renamed to perf_buffer. As there already exists a trace_buffer that is used by the trace_arrays, it needs to be first renamed to array_buffer. Link: https://lore.kernel.org/r/20191213153553.GE20583@krava Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13	perf: Make struct ring_buffer less ambiguous	Steven Rostedt (VMware)
	eBPF requires needing to know the size of the perf ring buffer structure. But it unfortunately has the same name as the generic ring buffer used by tracing and oprofile. To make it less ambiguous, rename the perf ring buffer structure to "perf_buffer". As other parts of the ring buffer code has "perf_" as the prefix, it only makes sense to give the ring buffer the "perf_" prefix as well. Link: https://lore.kernel.org/r/20191213153553.GE20583@krava Acked-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13	ASoC: Intel: common: soc-acpi: declare new tables for SoundWire	Pierre-Louis Bossart
	We cannot really lump SoundWire-based configurations into the same tables since the mechanisms to identify boards is based on link configurations and _ADR instead of _HID for I2S, so define new tables Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110222530.30303-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-13	ASoC: soc-acpi: add _ADR-based link descriptors	Pierre-Louis Bossart
	For SoundWire support, we added a 'link_mask' to describe the PCB hardware layout. This helped form a signature that can be used as a first-order way of detecting the hardware and selecting the machine driver. The concept of link_mask is however not enough. Some BIOS enable all links, even when there are no devices physically connected. We can also see variations with multiple devices attached on one link, or different types of devices connected on the same link. To accurately represent the hardware, we need to build static tables where each link exposes a list of expected devices represented by the 64-bit _ADR field (which uniquely identifies each device). The new 'links' field is optional when the link_mask is sufficient to represent a platform in a unique way. The existing mechanism to support I2C devices is left as is, it'd be too invasive to change the existing support for _HID and the notion of link is not relevant either. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200110222530.30303-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-13	ALSA: hda: Manage concurrent reg access more properly	Takashi Iwai
	In the commit 8e85def5723e ("ALSA: hda: enable regmap internal locking"), we re-enabled the regmap lock due to the reported regression that showed the possible concurrent accesses. It was a temporary workaround, and there are still a few opened races even after the revert. In this patch, we cover those still opened windows with a proper mutex lock and disable the regmap internal lock again. First off, the patch introduces a new snd_hdac_device.regmap_lock mutex that is applied for each snd_hdac_regmap_() call, including read, write and update helpers. The mutex is applied carefully so that it won't block the self-power-up procedure in the helper function. Also, this assures the protection for the accesses without regmap, too. The snd_hdac_regmap_update_raw() is refactored to use the standard regmap_update_bits_check() function instead of the open-code. The non-regmap case is still open-coded but it's an easy part. The all read and write operations are in the single mutex protection, so it's now race-free. In addition, a couple of new helper functions are added: snd_hdac_regmap_update_raw_once() and snd_hdac_regmap_sync(). Both are called from HD-audio legacy driver. The former is to initialize the given verb bits but only once when it's not initialized yet. Due to this condition, the function invokes regcache_cache_only(), and it's now performed inside the regmap_lock (formerly it was racy) too. The latter function is for simply invoking regcache_sync() inside the regmap_lock, which is called from the codec resume call path. Along with that, the HD-audio codec driver code is slightly modified / simplified to adapt those new functions. And finally, snd_hdac_regmap_read_raw(), _write_raw(), etc are rewritten with the helper macro. It's just for simplification because the code logic is identical among all those functions. Tested-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20200109090104.26073-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-01-13	PM-runtime: add tracepoints for usage_count changes	Michał Mirosław
	Add tracepoints to remaining places where device's power.usage_count is changed. This helps debugging where and why autosuspend is prevented. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-13	Merge 5.5-rc6 into tty-next	Greg Kroah-Hartman
	We need the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13	Merge 5.5-rc6 into usb-next	Greg Kroah-Hartman
	We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>