summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-01-15PCI/switchtec: Rename generation-specific constantsLogan Gunthorpe
Gen4 hardware will have different values for the SWITCHTEC_X_RUNNING and SWITCHTEC_IOCTL_NUM_PARTITIONS, so rename them with GEN3 in their name. No functional changes intended. Link: https://lore.kernel.org/r/20200115035648.2578-2-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15PCI/switchtec: Add support for Intercomm Notify and Upstream Error ContainmentLogan Gunthorpe
Add support for the Inter Fabric Manager Communication (Intercomm) Notify event in PAX variants of Switchtec hardware and the Upstream Error Containment port in the MR1 release of Gen3 firmware. Link: https://lore.kernel.org/r/20200106190337.2428-4-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15PCI/ATS: Add PASID stubsJean-Philippe Brucker
The SMMUv3 driver, which may be built without CONFIG_PCI, will soon gain PASID support. Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove unused PRI and PASID stubs") to re-introduce the PASID stubs, and avoid adding more #ifdefs to the SMMU driver. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15iommu/arm-smmu-v3: Parse PASID devicetree property of platform devicesJean-Philippe Brucker
For platform devices that support SubstreamID (SSID), firmware provides the number of supported SSID bits. Restrict it to what the SMMU supports and cache it into master->ssid_bits, which will also be used for PCI PASID. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15NFS: Add mount option 'softreval'Trond Myklebust
Add a mount option 'softreval' that allows attribute revalidation 'getattr' calls to time out, and causes them to fall back to using the cached attributes. The use case for this option is for ensuring that we can still (slowly) traverse paths and use cached information even when the server is down. Once the server comes back up again, the getattr calls start succeeding, and the caches will revalidate as usual. The 'softreval' mount option is automatically enabled if you have specified 'softerr'. It can be turned off using the options 'nosoftreval', or 'hard'. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15SUNRPC: Remove broken gss_mech_list_pseudoflavors()Trond Myklebust
Remove gss_mech_list_pseudoflavors() and its callers. This is part of an unused API, and could leak an RCU reference if it were ever called. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15sunrpc: convert to time64_t for expiryArnd Bergmann
Using signed 32-bit types for UTC time leads to the y2038 overflow, which is what happens in the sunrpc code at the moment. This changes the sunrpc code over to use time64_t where possible. The one exception is the gss_import_v{1,2}_context() function for kerberos5, which uses 32-bit timestamps in the protocol. Here, we can at least treat the numbers as 'unsigned', which extends the range from 2038 to 2106. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Additional refactoring for fs_context conversionScott Mayhew
Split out from commit "NFS: Add fs_context support." This patch adds additional refactoring for the conversion of NFS to use fs_context, namely: (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context. nfs_clone_mount has had several fields removed, and nfs_mount_info has been removed altogether. (*) Various functions now take an fs_context as an argument instead of nfs_mount_info, nfs_fs_context, etc. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15NFS: Add fs_context support.David Howells
Add filesystem context support to NFS, parsing the options in advance and attaching the information to struct nfs_fs_context. The highlights are: (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context. This structure represents NFS's superblock config. (*) Make use of the VFS's parsing support to split comma-separated lists (*) Pin the NFS protocol module in the nfs_fs_context. (*) Attach supplementary error information to fs_context. This has the downside that these strings must be static and can't be formatted. (*) Remove the auxiliary file_system_type structs since the information necessary can be conveyed in the nfs_fs_context struct instead. (*) Root mounts are made by duplicating the config for the requested mount so as to have the same parameters. Submounts pick up their parameters from the parent superblock. [AV -- retrans is u32, not string] [SM -- Renamed cfg to ctx in a few functions in an earlier patch] [SM -- Moved fs_context mount option parsing to an earlier patch] [SM -- Moved fs_context error logging to a later patch] [SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()] [SM -- Added is_remount_fc() helper] [SM -- Deferred some refactoring to a later patch] [SM -- Fixed referral mounts, which were broken in the original patch] [SM -- Fixed leak of nfs_fattr when fs_context is freed] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15nfs: don't pass nfs_subversion to ->create_server()Al Viro
pick it from mount_info Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15nfs: don't bother passing nfs_subversion to ->try_mount() and ↵Al Viro
nfs_fs_mount_common() Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15Merge branch 'topic/equal' of ↵Mark Brown
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into asoc-5.6
2020-01-15regulator fix for "regulator: core: Add regulator_is_equal() helper"Stephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20200115120258.0e535fcb@canb.auug.org.au Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-15serial_core: Remove unused member in uart_portDmitry Safonov
It should remove the align-padding before @name. [yes, there's a "hole" in the structure now, but that's fine, no one cares. If they do care, the whole thing should be restructured using pahole to find a better ordering. Removing this field is good as some drivers have been known to abuse it for other things when they shouldn't have been doing that. -- gregkh] Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20200114171912.261787-4-dima@arista.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-15gpiolib: Add support for the irqdomain which doesn't use irq_fwspec as argKevin Hao
Some gpio's parent irqdomain may not use the struct irq_fwspec as argument, such as msi irqdomain. So rename the callback populate_parent_fwspec() to populate_parent_alloc_arg() and make it allocate and populate the specific struct which is needed by the parent irqdomain. Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://lore.kernel.org/r/20200114082821.14015-3-haokexin@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-01-15usb: phy-generic: Delete unused platform dataLinus Walleij
The last user of the phy generic platform data was deleted in commit 1e041b6f313aaa966612a7e415cfc09c90d6b829 ("usb: dwc3: exynos: Remove dead code"). So get rid of the platform data, which rids us of another consumer of the legacy GPIO API at the same time. Make sure we only inlcude <linux/gpio/consumer.h> which is all we use. Alter the usb_phy_gen_create_phy() function prototype to not pass any platform data as this is just hardcoded to NULL at all locations calling it in the kernel. Move the devm_gpiod_get* calls out of the if (of_node) parenthesis, as these calls are generic and do not depend on device tree, they are used by any hardware description. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Felipe Balbi <balbi@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-15Merge tag 'drm-intel-next-2020-01-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Final drm/i915 features for v5.6: - DP MST fixes (José) - Fix intel_bw_state memory leak (Pankaj Bharadiya) - Switch context id allocation to xarray (Tvrtko) - ICL/EHL/TGL workarounds (Matt Roper, Tvrtko) - Debugfs for LMEM details (Lukasz Fiedorowicz) - Prefer platform acronyms over codenames in symbols (Lucas) - Tiled and port sync mode fixes for fbdev and DP (Manasi) - DSI panel and backlight enable GPIO fixes (Hans de Goede) - Relax audio min CDCLK requirements on non-GLK (Kai Vehmanen) - Plane alignment and dimension check fixes (Imre) - Fix state checks for PSR (José) - Remove ICL+ clock gating programming (José) - Static checker fixes around bool usage (Ma Feng) - Bring back tests for self-contained headers in i915 (Masahiro Yamada) - Fix DP MST disable sequence (Ville) - Start converting i915 to the new drm device based logging macros (Wambui Karuga) - Add DSI VBT I2C sequence execution (Vivek Kasireddy) - Start using function pointers and ops structs in uc code (Michal) - Fix PMU names to not use colons or dashes (Tvrtko) - TGL media decompression support (DK, Imre) - Split i915_gem_gtt.[ch] to more manageable chunks (Matthew Auld) - Create dumb buffers in LMEM where available (Ram) - Extend mmap support for LMEM (Abdiel) - Selftest updates (Chris) - Hack bump up CDCLK on TGL to avoid underruns (Stan) - Use intel_encoder and intel_connector more instead of drm counterparts (Ville) - Build error fixes (Zhang Xiaoxu) - Fixes related to GPU and engine initialization/resume (Chris) - Support for prefaulting discontiguous objects (Abdiel) - Support discontiguous LMEM object maps (Chris) - Various GEM and GT improvements and fixes (Chris) - Merge pinctrl dependencies branch for the DSI GPIO updates (Jani) - Backmerge drm-next for new logging macros (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87sgkil0v9.fsf@intel.com
2020-01-15Merge tag 'mediatek-drm-next-5.6' of ↵Dave Airlie
https://github.com/ckhu-mediatek/linux.git-tags into drm-next Mediatek DRM Next for Linux 5.6 This fix non-smooth cursor problem, add cmdq support, add ctm property support and some refinement. Signed-off-by: Dave Airlie <airlied@redhat.com> From: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.freedesktop.org/patch/msgid/1578972526.14594.8.camel@mtksdaap41
2020-01-15reimplement path_mountpoint() with less magicAl Viro
... and get rid of a bunch of bugs in it. Background: the reason for path_mountpoint() is that umount() really doesn't want attempts to revalidate the root of what it's trying to umount. The thing we want to avoid actually happen from complete_walk(); solution was to do something parallel to normal path_lookupat() and it both went overboard and got the boilerplate subtly (and not so subtly) wrong. A better solution is to do pretty much what the normal path_lookupat() does, but instead of complete_walk() do unlazy_walk(). All it takes to avoid that ->d_weak_revalidate() call... mountpoint_last() goes away, along with everything it got wrong, and so does the magic around LOOKUP_NO_REVAL. Another source of bugs is that when we traverse mounts at the final location (and we need to do that - umount . expects to get whatever's overmounting ., if any, out of the lookup) we really ought to take care of ->d_manage() - as it is, manual umount of autofs automount in progress can lead to unpleasant surprises for the daemon. Easily solved by using handle_lookup_down() instead of follow_mount(). Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-15Merge tag 'drm/tegra/for-5.6-rc1' of ↵Dave Airlie
git://anongit.freedesktop.org/tegra/linux into drm-next drm/tegra: Changes for v5.6-rc1 This contains a small set of mostly fixes and some minor improvements. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thierry Reding <thierry.reding@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200111004835.2412858-1-thierry.reding@gmail.com
2020-01-14fs-verity: implement readahead of Merkle tree pagesEric Biggers
When fs-verity verifies data pages, currently it reads each Merkle tree page synchronously using read_mapping_page(). Therefore, when the Merkle tree pages aren't already cached, fs-verity causes an extra 4 KiB I/O request for every 512 KiB of data (assuming that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in more I/O requests and performance loss than is strictly necessary. Therefore, implement readahead of the Merkle tree pages. For simplicity, we take advantage of the fact that the kernel already does readahead of the file's *data*, just like it does for any other file. Due to this, we don't really need a separate readahead state (struct file_ra_state) just for the Merkle tree, but rather we just need to piggy-back on the existing data readahead requests. We also only really need to bother with the first level of the Merkle tree, since the usual fan-out factor is 128, so normally over 99% of Merkle tree I/O requests are for the first level. Therefore, make fsverity_verify_bio() enable readahead of the first Merkle tree level, for up to 1/4 the number of pages in the bio, when it sees that the REQ_RAHEAD flag is set on the bio. The readahead size is then passed down to ->read_merkle_tree_page() for the filesystem to (optionally) implement if it sees that the requested page is uncached. While we're at it, also make build_merkle_tree_level() set the Merkle tree readahead size, since it's easy to do there. However, for now don't set the readahead size in fsverity_verify_page(), since currently it's only used to verify holes on ext4 and f2fs, and it would need parameters added to know how much to read ahead. This patch significantly improves fs-verity sequential read performance. Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches: On an ARM64 phone (using sha256-ce): Before: 217 MB/s After: 263 MB/s (compare to sha256sum of non-verity file: 357 MB/s) In an x86_64 VM (using sha256-avx2): Before: 173 MB/s After: 215 MB/s (compare to sha256sum of non-verity file: 223 MB/s) Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-14net: skbuff: disambiguate argument and member for skb_list_walk_safe helperJason A. Donenfeld
This worked before, because we made all callers name their next pointer "next". But in trying to be more "drop-in" ready, the silliness here is revealed. This commit fixes the problem by making the macro argument and the member use different names. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: phy: add MACsec ops in phy_deviceAntoine Tenart
This patch adds a reference to MACsec ops in the phy_device, to allow PHYs to support offloading MACsec operations. The phydev lock will be held while calling those helpers. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: macsec: introduce the macsec_context structureAntoine Tenart
This patch introduces the macsec_context structure. It will be used in the kernel to exchange information between the common MACsec implementation (macsec.c) and the MACsec hardware offloading implementations. This structure contains pointers to MACsec specific structures which contain the actual MACsec configuration, and to the underlying device (phydev for now). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: phy: Added IRQ print to phylink_bringup_phy()Florian Fainelli
The information about the PHY attached to the PHYLINK instance is useful but is missing the IRQ prints that phy_attached_info() adds. phy_attached_info() is a bit long and it would not be possible to use phylink_info() anyway. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14mtd: spi-nor: remove unused enum spi_nor_opsMichael Walle
The ops aren't used in any SPI NOR controller. Therefore, remove them altogether. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
2020-01-14mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifierJason Gunthorpe
The 'interval_sub' is placed on the 'notifier_subscriptions' interval tree. This eliminates the poor name 'mni' for this variable. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-14mm/mmu_notifiers: Use 'subscription' as the variable name for mmu_notifierJason Gunthorpe
The 'subscription' is placed on the 'notifier_subscriptions' list. This eliminates the poor name 'mn' for this variable. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-14mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptionsJason Gunthorpe
The name mmu_notifier_mm implies that the thing is a mm_struct pointer, and is difficult to abbreviate. The struct is actually holding the interval tree and hlist containing the notifiers subscribed to a mm. Use 'subscriptions' as the variable name for this struct instead of the really terrible and misleading 'mmn_mm'. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-14Merge tag 'regulator-eq' of ↵Mark Brown
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into asoc-5.6 regulator: add regulator_equal()
2020-01-14regulator: core: Add regulator_is_equal() helperMarek Vasut
Add regulator_is_equal() helper to compare whether two regulators are the same. This is useful for checking whether two separate regulators in a driver are actually the same supply. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: Igor Opaniuk <igor.opaniuk@toradex.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com> Cc: Mark Brown <broonie@kernel.org> Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Link: https://lore.kernel.org/r/20191220164450.1395038-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14misc: alcor_pci: Add AU6625 to list of supported PCI_IDsRhys Perry
I have added the AU6625 PCI_ID to the list of supported IDs: alcor_pci.c // Added au6625s ID to the array of supported devices alcor_pci.h // Added entry to define the PCI ID Made it fit in with the already submitted code: alcor_pci.c // Added config entry to that matches the one for au6601 >From general usage there seems to be no problems. Signed-off-by: Rhys Perry <rhysperry111@gmail.com> Link: https://lore.kernel.org/r/20191229171824.10308-1-rhysperry111@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14fs/proc: Introduce /proc/pid/timens_offsetsAndrei Vagin
API to set time namespace offsets for children processes, i.e.: echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com
2020-01-14x86/vdso: Zap vvar pages when switching to a time namespaceDmitry Safonov
The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com
2020-01-14time: Allocate per-timens vvar pageDmitry Safonov
VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications. The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period. Suggested-by: Andy Lutomirski <luto@kernel.org> Based-on-work-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com
2020-01-14x86/vdso: Provide vdso_data offset on vvar_pageDmitry Safonov
VDSO support for time namespaces needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. To prepare the time namespace page the kernel needs to know the vdso_data offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14lib/vdso: Prepare for time namespace supportThomas Gleixner
To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. If VDSO time namespace support is disabled the whole magic is compiled out. Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%. For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS. [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
2020-01-14hrtimers: Prepare hrtimer_nanosleep() for time namespacesAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com
2020-01-14time: Add do_timens_ktime_to_host() helperAndrei Vagin
The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX]. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com
2020-01-14time: Add timens_offsets to be used for tasks in time namespaceAndrei Vagin
Introduce offsets for time namespace. They will contain an adjustment needed to convert clocks to/from host's. A new namespace is created with the same offsets as the time namespace of the current process. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-5-dima@arista.com
2020-01-14ns: Introduce Time NamespaceAndrei Vagin
Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14soundwire: bus: fix device number leak on errorsPierre-Louis Bossart
If the programming of the dev_number fails due to an IO error, a new device_number will be assigned, resulting in a leak. Make sure we only assign a device_number once per Slave device. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200113225637.17313-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-01-14phy: Add DisplayPort configuration optionsYuti Amonkar
Allow DisplayPort PHYs to be configured through the generic functions through a custom structure added to the generic union. The configuration structure is used for reconfiguration of DisplayPort PHYs during link training operation. The parameters added here are the ones defined in the DisplayPort spec v1.4 which include link rate, number of lanes, voltage swing and pre-emphasis. Add the DisplayPort phy mode to the generic phy_mode enum. Signed-off-by: Yuti Amonkar <yamonkar@cadence.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2020-01-13net: stmmac: Initial support for TBSJose Abreu
Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now. Changes from v2: - Use bitfield for TBS status / support (Jakub) - Remove unneeded cache alignment (Jakub) - Fix checkpatch issues Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13mm, debug_pagealloc: don't rely on static keys too earlyVlastimil Babka
Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm: memcg/slab: fix percpu slab vmstats flushingRoman Gushchin
Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up by the cgroup tree. The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value is crossing some predefined threshold, it spills over to atomic values on the local and each ascendant levels. It means that without flushing some numbers cached in percpu variables will be dropped on floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable. The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may resume in the dying state for a long time. After kmem_cache reparenting which is performed during the offlining slab counters of the dying cgroup don't have any chances to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level. The problem is that percpu counters are not zeroed after the first flushing. So every cached percpu value is summed twice. It creates a small error (up to 32 pages per cpu, but usually less) which accumulates on parent cgroup level. After creating and destroying of thousands of child cgroups, slab counter on parent level can be way off the real value. For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work on each cpu: reading and zeroing it during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath. With this change, slab counters on parent level will become eventually consistent. Once all dying children are gone, values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup. It's not perfect, as slab are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. It makes slab counters different from others, and it might want us to implement flushing in a correct form again. But it's also a question of performance: scheduling a work on each cpu isn't free, and it's an open question if the benefit of having more accurate counters is worth it. We might also consider flushing all counters on offlining, not only slab counters. So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely the number of created and destroyed cgroups). And think about the accuracy of counters separately. Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13ptr_ring: add include of linux/mm.hJesper Dangaard Brouer
Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") started to use kvmalloc_array and kvfree, which are defined in mm.h, the previous functions kcalloc and kfree, which are defined in slab.h. Add the missing include of linux/mm.h. This went unnoticed as other include files happened to include mm.h. Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-13iio: st_sensors: Drop redundant parameter from st_sensors_of_name_probe()Andy Shevchenko
Since we have access to the struct device_driver and thus to the ID table, there is no need to supply special parameters to st_sensors_of_name_probe(). Besides that we have a common API to get driver match data, there is no need to do matching separately for OF and ACPI. Taking into consideration above, simplify the ST sensors code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-01-13arch: wire up pidfd_getfd syscallSargun Dhillon
This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13vfs, fdtable: Add fget_task helperSargun Dhillon
This introduces a function which can be used to fetch a file, given an arbitrary task. As long as the user holds a reference (refcnt) to the task_struct it is safe to call, and will either return NULL on failure, or a pointer to the file, with a refcnt. This patch is based on Oleg Nesterov's (cf. [1]) patch from September 2018. [1]: Link: https://lore.kernel.org/r/20180915160423.GA31461@redhat.com Signed-off-by: Sargun Dhillon <sargun@sargun.me> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-2-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>