summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2020-01-14tracing: xen: Ordered comparison of function pointersChangbin Du
Just as commit 0566e40ce7 ("tracing: initcall: Ordered comparison of function pointers"), this patch fixes another remaining one in xen.h found by clang-9. In file included from arch/x86/xen/trace.c:21: In file included from ./include/trace/events/xen.h:475: In file included from ./include/trace/define_trace.h:102: In file included from ./include/trace/trace_events.h:473: ./include/trace/events/xen.h:69:7: warning: ordered comparison of function \ pointers ('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 'xen_mc_callback_fn_t') [-Wordered-compare-function-pointers] __field(xen_mc_callback_fn_t, fn) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/trace/trace_events.h:421:29: note: expanded from macro '__field' ^ ./include/trace/trace_events.h:407:6: note: expanded from macro '__field_ext' is_signed_type(type), filter_type); \ ^ ./include/linux/trace_events.h:554:44: note: expanded from macro 'is_signed_type' ^ Fixes: c796f213a6934 ("xen/trace: add multicall tracing") Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14net: skbuff: disambiguate argument and member for skb_list_walk_safe helperJason A. Donenfeld
This worked before, because we made all callers name their next pointer "next". But in trying to be more "drop-in" ready, the silliness here is revealed. This commit fixes the problem by making the macro argument and the member use different names. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: macsec: PN wrap callbackAntoine Tenart
Allow to call macsec_pn_wrapped from hardware drivers to notify when a PN rolls over. Some drivers might used an interrupt to implement this. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: macsec: add nla support for changing the offloading selectionAntoine Tenart
MACsec offloading to underlying hardware devices is disabled by default (the software implementation is used). This patch adds support for changing this setting through the MACsec netlink interface. Many checks are done when enabling offloading on a given MACsec interface as there are limitations (it must be supported by the hardware, only a single interface can be offloaded on a given physical device at a time, rules can't be moved for now). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: phy: add MACsec ops in phy_deviceAntoine Tenart
This patch adds a reference to MACsec ops in the phy_device, to allow PHYs to support offloading MACsec operations. The phydev lock will be held while calling those helpers. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: macsec: introduce MACsec opsAntoine Tenart
This patch introduces MACsec ops for drivers to support offloading MACsec operations. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: macsec: introduce the macsec_context structureAntoine Tenart
This patch introduces the macsec_context structure. It will be used in the kernel to exchange information between the common MACsec implementation (macsec.c) and the MACsec hardware offloading implementations. This structure contains pointers to MACsec specific structures which contain the actual MACsec configuration, and to the underlying device (phydev for now). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: macsec: move some definitions in a dedicated headerAntoine Tenart
This patch moves some structure, type and identifier definitions into a MACsec specific header. This patch does not modify how the MACsec code is running and only move things around. This is a preparation for the future MACsec hardware offloading support, which will re-use those definitions outside macsec.c. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: phy: Added IRQ print to phylink_bringup_phy()Florian Fainelli
The information about the PHY attached to the PHYLINK instance is useful but is missing the IRQ prints that phy_attached_info() adds. phy_attached_info() is a bit long and it would not be possible to use phylink_info() anyway. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14Merge tag 'asm-generic-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull asm-generic fixes from Arnd Bergmann: "Here are two bugfixes from Mike Rapoport, both fixing compile-time errors for the nds32 architecture that were recently introduced" * tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: nds32: fix build failure caused by page table folding updates asm-generic/nds32: don't redefine cacheflush primitives
2020-01-14Merge branch 'dhowells' (patches from DavidH)Linus Torvalds
Merge misc fixes from David Howells. Two afs fixes and a key refcounting fix. * dhowells: afs: Fix afs_lookup() to not clobber the version on a new dentry afs: Fix use-after-loss-of-ref keys: Fix request_key() cache
2020-01-14afs: Fix use-after-loss-of-refDavid Howells
afs_lookup() has a tracepoint to indicate the outcome of d_splice_alias(), passing it the inode to retrieve the fid from. However, the function gave up its ref on that inode when it called d_splice_alias(), which may have failed and dropped the inode. Fix this by caching the fid. Fixes: 80548b03991f ("afs: Add more tracepoints") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14Merge tag 'regulator-eq' of ↵Mark Brown
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into asoc-5.6 regulator: add regulator_equal()
2020-01-14regulator: core: Add regulator_is_equal() helperMarek Vasut
Add regulator_is_equal() helper to compare whether two regulators are the same. This is useful for checking whether two separate regulators in a driver are actually the same supply. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: Igor Opaniuk <igor.opaniuk@toradex.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com> Cc: Mark Brown <broonie@kernel.org> Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Link: https://lore.kernel.org/r/20191220164450.1395038-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14misc: alcor_pci: Add AU6625 to list of supported PCI_IDsRhys Perry
I have added the AU6625 PCI_ID to the list of supported IDs: alcor_pci.c // Added au6625s ID to the array of supported devices alcor_pci.h // Added entry to define the PCI ID Made it fit in with the already submitted code: alcor_pci.c // Added config entry to that matches the one for au6601 >From general usage there seems to be no problems. Signed-off-by: Rhys Perry <rhysperry111@gmail.com> Link: https://lore.kernel.org/r/20191229171824.10308-1-rhysperry111@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14misc: pvpanic: add crash loaded eventzhenwei pi
Some users prefer kdump tools to generate guest kernel dumpfile, at the same time, need a out-of-band kernel panic event. Currently if booting guest kernel with 'crash_kexec_post_notifiers', QEMU will receive PVPANIC_PANICKED event and stop VM. If booting guest kernel without 'crash_kexec_post_notifiers', guest will not call notifier chain. Add PVPANIC_CRASH_LOADED bit for pvpanic event, it means that guest kernel actually hit a kernel panic, but the guest kernel wants to handle by itself. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Link: https://lore.kernel.org/r/20200102023513.318836-3-pizhenwei@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14misc: pvpanic: move bit definition to uapi header filezhenwei pi
Some processes outside of the kernel(Ex, QEMU) should know what the value really is for, so move the bit definition to a uapi file. Suggested-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Link: https://lore.kernel.org/r/20200102023513.318836-2-pizhenwei@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14fs/proc: Introduce /proc/pid/timens_offsetsAndrei Vagin
API to set time namespace offsets for children processes, i.e.: echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com
2020-01-14x86/vdso: Zap vvar pages when switching to a time namespaceDmitry Safonov
The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com
2020-01-14time: Allocate per-timens vvar pageDmitry Safonov
VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications. The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period. Suggested-by: Andy Lutomirski <luto@kernel.org> Based-on-work-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com
2020-01-14x86/vdso: Provide vdso_data offset on vvar_pageDmitry Safonov
VDSO support for time namespaces needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. To prepare the time namespace page the kernel needs to know the vdso_data offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14lib/vdso: Prepare for time namespace supportThomas Gleixner
To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. If VDSO time namespace support is disabled the whole magic is compiled out. Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%. For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS. [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
2020-01-14hrtimers: Prepare hrtimer_nanosleep() for time namespacesAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com
2020-01-14time: Add do_timens_ktime_to_host() helperAndrei Vagin
The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX]. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com
2020-01-14time: Add timens_offsets to be used for tasks in time namespaceAndrei Vagin
Introduce offsets for time namespace. They will contain an adjustment needed to convert clocks to/from host's. A new namespace is created with the same offsets as the time namespace of the current process. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-5-dima@arista.com
2020-01-14ns: Introduce Time NamespaceAndrei Vagin
Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14lib/vdso: Add unlikely() hint into vdso_read_begin()Andrei Vagin
Place the branch with no concurrent write before the contended case. Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (more clock_gettime() cycles - the better): | before | after ----------------------------------- | 150252214 | 153242367 | 150301112 | 153324800 | 150392773 | 153125401 | 150373957 | 153399355 | 150303157 | 153489417 | 150365237 | 153494270 ----------------------------------- avg | 150331408 | 153345935 diff % | 2 | 0 ----------------------------------- stdev % | 0.3 | 0.1 Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20191112012724.250792-2-dima@arista.com
2020-01-14soundwire: bus: fix device number leak on errorsPierre-Louis Bossart
If the programming of the dev_number fails due to an IO error, a new device_number will be assigned, resulting in a leak. Make sure we only assign a device_number once per Slave device. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200113225637.17313-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14phy: Add DisplayPort configuration optionsYuti Amonkar
Allow DisplayPort PHYs to be configured through the generic functions through a custom structure added to the generic union. The configuration structure is used for reconfiguration of DisplayPort PHYs during link training operation. The parameters added here are the ones defined in the DisplayPort spec v1.4 which include link rate, number of lanes, voltage swing and pre-emphasis. Add the DisplayPort phy mode to the generic phy_mode enum. Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-13net: stmmac: Initial support for TBSJose Abreu
Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now. Changes from v2: - Use bitfield for TBS status / support (Jakub) - Remove unneeded cache alignment (Jakub) - Fix checkpatch issues Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATEYang Shi
Commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") introduced a new khugepaged scan result: SCAN_PAGE_HAS_PRIVATE, but the corresponding description for trace events were not added. Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm, debug_pagealloc: don't rely on static keys too earlyVlastimil Babka
Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm: memcg/slab: fix percpu slab vmstats flushingRoman Gushchin
Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up by the cgroup tree. The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value is crossing some predefined threshold, it spills over to atomic values on the local and each ascendant levels. It means that without flushing some numbers cached in percpu variables will be dropped on floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable. The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may resume in the dying state for a long time. After kmem_cache reparenting which is performed during the offlining slab counters of the dying cgroup don't have any chances to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level. The problem is that percpu counters are not zeroed after the first flushing. So every cached percpu value is summed twice. It creates a small error (up to 32 pages per cpu, but usually less) which accumulates on parent cgroup level. After creating and destroying of thousands of child cgroups, slab counter on parent level can be way off the real value. For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work on each cpu: reading and zeroing it during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath. With this change, slab counters on parent level will become eventually consistent. Once all dying children are gone, values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup. It's not perfect, as slab are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. It makes slab counters different from others, and it might want us to implement flushing in a correct form again. But it's also a question of performance: scheduling a work on each cpu isn't free, and it's an open question if the benefit of having more accurate counters is worth it. We might also consider flushing all counters on offlining, not only slab counters. So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely the number of created and destroyed cgroups). And think about the accuracy of counters separately. Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13ptr_ring: add include of linux/mm.hJesper Dangaard Brouer
Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") started to use kvmalloc_array and kvfree, which are defined in mm.h, the previous functions kcalloc and kfree, which are defined in slab.h. Add the missing include of linux/mm.h. This went unnoticed as other include files happened to include mm.h. Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13iio: st_sensors: Drop redundant parameter from st_sensors_of_name_probe()Andy Shevchenko
Since we have access to the struct device_driver and thus to the ID table, there is no need to supply special parameters to st_sensors_of_name_probe(). Besides that we have a common API to get driver match data, there is no need to do matching separately for OF and ACPI. Taking into consideration above, simplify the ST sensors code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-01-13md/raid6: fix algorithm choice under larger PAGE_SIZEZhengyuan Liu
There are several algorithms available for raid6 to generate xor and syndrome parity, including basic int1, int2 ... int32 and SIMD optimized implementation like sse and neon. To test and choose the best algorithms at the initial stage, we need provide enough disk data to feed the algorithms. However, the disk number we provided depends on page size and gfmul table, seeing bellow: const int disks = (65536/PAGE_SIZE) + 2; So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk, as a result the chosed algorithm is not reliable. For example, on my arm64 machine with 64K page enabled, it will choose intx32 as the best one, although the NEON implementation is better. This patch tries to fix the problem by defining a constant raid6 disk number to supporting arbitrary page size. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13raid6/test: fix a compilation warningZhengyuan Liu
The compilation warning is redefination showed as following: In file included from tables.c:2: ../../../include/linux/export.h:180: warning: "EXPORT_SYMBOL" redefined #define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, "") In file included from tables.c:1: ../../../include/linux/raid/pq.h:61: note: this is the location of the previous definition #define EXPORT_SYMBOL(sym) Fixes: 69a94abb82ee ("export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13raid6/test: fix a compilation errorZhengyuan Liu
The compilation error is redeclaration showed as following: In file included from ../../../include/linux/limits.h:6, from /usr/include/x86_64-linux-gnu/bits/local_lim.h:38, from /usr/include/x86_64-linux-gnu/bits/posix1_lim.h:161, from /usr/include/limits.h:183, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:194, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h:7, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:34, from ../../../include/linux/raid/pq.h:30, from algos.c:14: ../../../include/linux/types.h:114:15: error: conflicting types for ‘int64_t’ typedef s64 int64_t; ^~~~~~~ In file included from /usr/include/stdint.h:34, from /usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h:9, from /usr/include/inttypes.h:27, from ../../../include/linux/raid/pq.h:29, from algos.c:14: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous \ declaration of ‘int64_t’ was here typedef __int64_t int64_t; Fixes: 54d50897d544 ("linux/kernel.h: split *_MAX and *_MIN macros into <linux/limits.h>") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2020-01-13platform/mellanox: mlxreg-hotplug: Add support for new capability registerVadim Pasternak
Add support for capability register, which is used for detection of the actual number of interrupt capable components within the particular group, supported by the specific system. Such components could be for example the number of power units and interrupts related to these units. The motivation is to avoid adding a new code in the future in order to distinct between the systems type supported different number of the components like power supplies, FANs, ASICs, line cards. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-01-13ASoC: Intel: common: soc-acpi: declare new tables for SoundWirePierre-Louis Bossart
We cannot really lump SoundWire-based configurations into the same tables since the mechanisms to identify boards is based on link configurations and _ADR instead of _HID for I2S, so define new tables Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200110222530.30303-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-13ASoC: soc-acpi: add _ADR-based link descriptorsPierre-Louis Bossart
For SoundWire support, we added a 'link_mask' to describe the PCB hardware layout. This helped form a signature that can be used as a first-order way of detecting the hardware and selecting the machine driver. The concept of link_mask is however not enough. Some BIOS enable all links, even when there are no devices physically connected. We can also see variations with multiple devices attached on one link, or different types of devices connected on the same link. To accurately represent the hardware, we need to build static tables where each link exposes a list of expected devices represented by the 64-bit _ADR field (which uniquely identifies each device). The new 'links' field is optional when the link_mask is sufficient to represent a platform in a unique way. The existing mechanism to support I2C devices is left as is, it'd be too invasive to change the existing support for _HID and the notion of link is not relevant either. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200110222530.30303-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-13ALSA: hda: Manage concurrent reg access more properlyTakashi Iwai
In the commit 8e85def5723e ("ALSA: hda: enable regmap internal locking"), we re-enabled the regmap lock due to the reported regression that showed the possible concurrent accesses. It was a temporary workaround, and there are still a few opened races even after the revert. In this patch, we cover those still opened windows with a proper mutex lock and disable the regmap internal lock again. First off, the patch introduces a new snd_hdac_device.regmap_lock mutex that is applied for each snd_hdac_regmap_*() call, including read, write and update helpers. The mutex is applied carefully so that it won't block the self-power-up procedure in the helper function. Also, this assures the protection for the accesses without regmap, too. The snd_hdac_regmap_update_raw() is refactored to use the standard regmap_update_bits_check() function instead of the open-code. The non-regmap case is still open-coded but it's an easy part. The all read and write operations are in the single mutex protection, so it's now race-free. In addition, a couple of new helper functions are added: snd_hdac_regmap_update_raw_once() and snd_hdac_regmap_sync(). Both are called from HD-audio legacy driver. The former is to initialize the given verb bits but only once when it's not initialized yet. Due to this condition, the function invokes regcache_cache_only(), and it's now performed inside the regmap_lock (formerly it was racy) too. The latter function is for simply invoking regcache_sync() inside the regmap_lock, which is called from the codec resume call path. Along with that, the HD-audio codec driver code is slightly modified / simplified to adapt those new functions. And finally, snd_hdac_regmap_read_raw(), *_write_raw(), etc are rewritten with the helper macro. It's just for simplification because the code logic is identical among all those functions. Tested-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20200109090104.26073-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-01-13PM-runtime: add tracepoints for usage_count changesMichał Mirosław
Add tracepoints to remaining places where device's power.usage_count is changed. This helps debugging where and why autosuspend is prevented. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-13Merge 5.5-rc6 into tty-nextGreg Kroah-Hartman
We need the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13Merge 5.5-rc6 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13ACPICA: Update version to 20200110Bob Moore
ACPICA commit f78d50aacc2a1c6dfa59052a696a54cec16e6aab Version 20200110. Link: https://github.com/acpica/acpica/commit/f78d50aa Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-13ACPICA: All acpica: Update copyrights to 2020 Including tool signons.Bob Moore
ACPICA commit 8b9c69d0984067051ffbe8526f871448ead6a26b Link: https://github.com/acpica/acpica/commit/8b9c69d0 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-13Merge 5.5-rc6 into staging-nextGreg Kroah-Hartman
We need the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13extcon: Remove unneeded extern keyword from extcon.hChanwoo Choi
'extern' keyword is unneeded in extcon.h because public header file of extcon defines the function prototype. Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-01-12Merge tag 'riscv/for-v5.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Two fixes for RISC-V: - Clear FP registers during boot when FP support is present, rather than when they aren't present - Move the header files associated with the SiFive L2 cache controller to drivers/soc (where the code was recently moved)" * tag 'riscv/for-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fixup obvious bug for fp-regs reset riscv: move sifive_l2_cache.h to include/soc