summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-10-11software node: Make argument to to_software_node constSakari Ailus
to_software_node() does not need to modify the fwnode_handle it operates on; therefore make it const. This allows passing a const fwnode_handle to to_software_node(). Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-10-11Merge tag 'drm-misc-next-2019-10-09-2' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.5: UAPI Changes: -Colorspace: Expose different prop values for DP vs. HDMI (Gwan-gyeong Mun) -fourcc: Add DRM_FORMAT_MOD_ARM_16X16_BLOCK_U_INTERLEAVED (Raymond) -not_actually: s/ENOTSUPP/EOPNOTSUPP/ in drm_edid and drm_mipi_dbi. This should not reach userspace, but adding here to specifically call that out (Daniel) -i810: Prevent underflow in dispatch ioctls (Dan) -komeda: Add ACLK sysfs attribute (Mihail) -v3d: Allow userspace to clean up after render jobs (Iago) Cross-subsystem Changes: -MAINTAINERS: -Add Alyssa & Steven as panfrost reviewers (Rob) -Add Jernej as DE2 reviewer (Maxime) -Add Chen-Yu as Allwinner maintainer (Maxime) -staging: Make some stack arrays static const (Colin) Core Changes: -ttm: Allow drivers to specify their vma manager (to use gem mgr) (Gerd) -docs: Various fixes in connector/encoder/bridge docs (Daniel, Lyude, Laurent) -connector: Allow more than 3 possible encoders for a connector (José) -dp_cec: Allow a connector to be associated with a cec device (Dariusz) -various: Fix some compile/sparse warnings (Ville) -mm: Ensure mm node removals are properly serialised (Chris) -panel: Specify the type of panel for drm_panels for later use (Laurent) -panel: Use drm_panel_init to init device and funcs (Laurent) -mst: Refactors and cleanups in anticipation of suspend/resume support (Lyude) -vram: -Add lazy unmapping for gem bo's (Thomas) -Unify and rationalize vram mm and gem vram (Thomas) -Expose vmap and vunmap for gem vram objects (Thomas) -Allow objects to be pinned at the top of vram to avoid fragmentation (Thomas) Driver Changes: -various: Include drm_bridge.h instead of relying on drm_crtc.h (Boris) -ast/mgag200: Refactor show_cursor(), move cursor to top of video mem (Thomas) -komeda: -Add error event printing (behind CONFIG) and reg dump support (Lowry) -Add suspend/resume support (Lowry) -Workaround D71 shadow registers not flushing on disable (Lowry) -meson: Add suspend/resume support (Neil) -omap: Miscellaneous refactors and improvements (Tomi/Jyri) -panfrost/shmem: Silence lockdep by using mutex_trylock (Rob) -panfrost: Miscellaneous small fixes (Rob/Steven) -sti: Fix warnings (Benjamin/Linus) -sun4i: -Add vcc-dsi regulator to sun6i_mipi_dsi (Jagan) -A few patches to figure out the DRQ/start delay calc on dsi (Jagan/Icenowy) -virtio: -Add module param to switch resource reuse workaround on/off (Gerd) -Avoid calling vmexit while holding spinlock (Gerd) -Use gem shmem helpers instead of ttm (Gerd) -Accommodate command buffer allocations too big for cma (David) Cc: Rob Herring <robh@kernel.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Lyude Paul <lyude@redhat.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Dariusz Marcinkiewicz <darekm@google.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Raymond Smith <raymond.smith@arm.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Colin Ian King <colin.king@canonical.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mihail Atanassov <Mihail.Atanassov@arm.com> Cc: Lowry Li <Lowry.Li@arm.com> Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Jyri Sarha <jsarha@ti.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Benjamin Gaignard <benjamin.gaignard@st.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Jagan Teki <jagan@amarulasolutions.com> Cc: Icenowy Zheng <icenowy@aosc.io> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: David Riley <davidriley@chromium.org> Signed-off-by: Dave Airlie <airlied@redhat.com> # gpg: Signature made Thu 10 Oct 2019 01:00:47 AM AEST # gpg: using RSA key 732C002572DCAF79 # gpg: Can't check signature: public key not found # Conflicts: # drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c # drivers/gpu/drm/i915/i915_drv.c # drivers/gpu/drm/i915/i915_gem.c # drivers/gpu/drm/i915/i915_gem_gtt.c # drivers/gpu/drm/i915/i915_vma.c From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20191009150825.GA227673@art_vandelay
2019-10-10seccomp: simplify secure_computing()Christian Brauner
Afaict, the struct seccomp_data argument to secure_computing() is unused by all current callers. So let's remove it. The argument was added in [1]. It was added because having the arch supply the syscall arguments used to be faster than having it done by secure_computing() (cf. Andy's comment in [2]). This is not true anymore though. /* References */ [1]: 2f275de5d1ed ("seccomp: Add a seccomp_data parameter secure_computing()") [2]: https://lore.kernel.org/r/CALCETrU_fs_At-hTpr231kpaAd0z7xJN4ku-DvzhRU6cvcJA_w@mail.gmail.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-parisc@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: x86@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20190924064420.6353-1-christian.brauner@ubuntu.com Signed-off-by: Kees Cook <keescook@chromium.org>
2019-10-10SUNRPC: fix race to sk_err after xs_error_reportBenjamin Coddington
Since commit 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context") there has been a race to the value of the sk_err if both XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set. In that case, we may end up losing the sk_err value that existed when xs_error_report was called. Fix this by reverting to the previous behavior: instead of using SO_ERROR to retrieve the value at a later time (which might also return sk_err_soft), copy the sk_err value onto struct sock_xprt, and use that value to wake pending tasks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-10crypto: inside-secure - Remove #ifdef checksArnd Bergmann
When both PCI and OF are disabled, no drivers are registered, and we get some unused-function warnings: drivers/crypto/inside-secure/safexcel.c:1221:13: error: unused function 'safexcel_unregister_algorithms' [-Werror,-Wunused-function] static void safexcel_unregister_algorithms(struct safexcel_crypto_priv *priv) drivers/crypto/inside-secure/safexcel.c:1307:12: error: unused function 'safexcel_probe_generic' [-Werror,-Wunused-function] static int safexcel_probe_generic(void *pdev, drivers/crypto/inside-secure/safexcel.c:1531:13: error: unused function 'safexcel_hw_reset_rings' [-Werror,-Wunused-function] static void safexcel_hw_reset_rings(struct safexcel_crypto_priv *priv) It's better to make the compiler see what is going on and remove such ifdef checks completely. In case of PCI, this is trivial since pci_register_driver() is defined to an empty function that makes the compiler subsequently drop all unused code silently. The global pcireg_rc/ofreg_rc variables are not actually needed here since the driver registration does not fail in ways that would make it helpful. For CONFIG_OF, an IS_ENABLED() check is still required, since platform drivers can exist both with and without it. A little change to linux/pci.h is needed to ensure that pcim_enable_device() is visible to the driver. Moving the declaration outside of ifdef would be sufficient here, but for consistency with the rest of the file, adding an inline helper is probably best. Fixes: 212ef6f29e5b ("crypto: inside-secure - Fix unused variable warning when CONFIG_PCI=n") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci.h Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-10-10base: soc: Handle custom soc information sysfs entriesMurali Nalajala
Soc framework exposed sysfs entries are not sufficient for some of the h/w platforms. Currently there is no interface where soc drivers can expose further information about their SoCs via soc framework. This change address this limitation where clients can pass their custom entries as attribute group and soc framework would expose them as sysfs properties. Signed-off-by: Murali Nalajala <mnalajal@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/1570480662-25252-1-git-send-email-mnalajal@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-09DIM: fix dim.h kernel-doc and headersRandy Dunlap
Lots of fixes to kernel-doc in structs, enums, and functions. Also add header files that are being used but not yet #included. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Yamin Friedman <yaminf@mellanox.com> Cc: Tal Gilboa <talgi@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-09Merge tag 'spi-ptp-api' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull in a dependency for Vladimir's work on more precise packet time stamping. Mark Brown says: ==================== spi: Add a PTP API For detailed timestamping of operations. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-09Input: pixcir_i2c_ts - move definitions into a single fileFabio Estevam
All the defined symbols from linux/platform_data/pixcir_i2c_ts.h are only used by the pixcir_i2c_ts driver, so move all the definitions locally and get rid of the pixcir_i2c_ts.h file. Signed-off-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Roger Quadros <rogerq@ti.com> Tested-by: Michal Vokáč <michal.vokac@ysoft.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-10-09NFS: handle source server rebootOlga Kornievskaia
When the source server reboots after a server-to-server copy was issued, we need to retry the copy from COPY_NOTIFY. We need to detect that the source server rebooted and there is a copy waiting on a destination server and wake it up. Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09NFS: add ca_source_server<> to COPYOlga Kornievskaia
Support only one source server address: the same address that the client and source server use. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09NFS: add COPY_NOTIFY operationOlga Kornievskaia
Try using the delegation stateid, then the open stateid. Only NL4_NETATTR, No support for NL4_NAME and NL4_URL. Allow only one source server address to be returned for now. To distinguish between same server copy offload ("intra") and a copy between different server ("inter"), do a check of server owner identity and also make sure server is capable of doing a copy offload. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09NFS NFSD: defining nl4_servers structure needed by bothOlga Kornievskaia
These structures are needed by COPY_NOTIFY on the client and needed by the nfsd as well Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09soc: ti: omap-prm: add support for denying idle for reset clockdomainTero Kristo
TI SoCs hardware reset signals require the parent clockdomain to be in force wakeup mode while de-asserting the reset, otherwise it may never complete. To support this, add pdata hooks to control the clockdomain directly. Signed-off-by: Tero Kristo <t-kristo@ti.com> Reviewed-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2019-10-09locking/lockdep: Remove unused @nested argument from lock_release()Qian Cai
Since the following commit: b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release") @nested is no longer used in lock_release(), so remove it from all lock_release() calls and friends. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: alexander.levin@microsoft.com Cc: daniel@iogearbox.net Cc: davem@davemloft.net Cc: dri-devel@lists.freedesktop.org Cc: duyuyang@gmail.com Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: intel-gfx@lists.freedesktop.org Cc: jack@suse.com Cc: jlbec@evilplan.or Cc: joonas.lahtinen@linux.intel.com Cc: joseph.qi@linux.alibaba.com Cc: jslaby@suse.com Cc: juri.lelli@redhat.com Cc: maarten.lankhorst@linux.intel.com Cc: mark@fasheh.com Cc: mhocko@kernel.org Cc: mripard@kernel.org Cc: ocfs2-devel@oss.oracle.com Cc: rodrigo.vivi@intel.com Cc: sean@poorly.run Cc: st@kernel.org Cc: tj@kernel.org Cc: tytso@mit.edu Cc: vdavydov.dev@gmail.com Cc: vincent.guittot@linaro.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-09sched/cputime: Spare a seqcount lock/unlock cycle on context switchFrederic Weisbecker
On context switch we are locking the vtime seqcount of the scheduling-out task twice: * On vtime_task_switch_common(), when we flush the pending vtime through vtime_account_system() * On arch_vtime_task_switch() to reset the vtime state. This is pointless as these actions can be performed without the need to unlock/lock in the middle. The reason these steps are separated is to consolidate a very small amount of common code between CONFIG_VIRT_CPU_ACCOUNTING_GEN and CONFIG_VIRT_CPU_ACCOUNTING_NATIVE. Performance in this fast path is definitely a priority over artificial code factorization so split the task switch code between GEN and NATIVE and mutualize the parts than can run under a single seqcount locked block. As a side effect, vtime_account_idle() becomes included in the seqcount protection. This happens to be a welcome preparation in order to properly support kcpustat under vtime in the future and fetch CPUTIME_IDLE without race. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191003161745.28464-3-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-09sched/cputime: Rename vtime_account_system() to vtime_account_kernel()Frederic Weisbecker
vtime_account_system() decides if we need to account the time to the system (__vtime_account_system()) or to the guest (vtime_account_guest()). So this function is a misnomer as we are on a higher level than "system". All we know when we call that function is that we are accounting kernel cputime. Whether it belongs to guest or system time is a lower level detail. Rename this function to vtime_account_kernel(). This will clarify things and avoid too many underscored vtime_account_system() versions. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191003161745.28464-2-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-08Revert "tun: call dev_get_valid_name() before register_netdevice()"Eric Dumazet
This reverts commit 0ad646c81b2182f7fa67ec0c8c825e0ee165696d. As noticed by Jakub, this is no longer needed after commit 11fc7d5a0a2d ("tun: fix memory leak in error path") This no longer exports dev_get_valid_name() for the exclusive use of tun driver. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-08soc: qcom: Fix llcc-qcom definitions to includeYueHaibing
commit 99356b03b431 ("soc: qcom: Make llcc-qcom a generic driver") move these out of llcc-qcom.h, make the building fails: drivers/edac/qcom_edac.c:86:40: error: array type has incomplete element type struct llcc_edac_reg_data static const struct llcc_edac_reg_data edac_reg_data[] = { ^~~~~~~~~~~~~ drivers/edac/qcom_edac.c:87:3: error: array index in non-array initializer [LLCC_DRAM_CE] = { ^~~~~~~~~~~~ drivers/edac/qcom_edac.c:87:3: note: (near initialization for edac_reg_data) drivers/edac/qcom_edac.c:88:3: error: field name not in record or union initializer .name = "DRAM Single-bit", ... drivers/edac/qcom_edac.c:169:51: warning: struct llcc_drv_data declared inside parameter list will not be visible outside of this definition or declaration qcom_llcc_clear_error_status(int err_type, struct llcc_drv_data *drv) ^~~~~~~~~~~~~ This patch move the needed definitions back to include. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 99356b03b431 ("soc: qcom: Make llcc-qcom a generic driver") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-10-08Merge tag 'led-fixes-for-5.4-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: - fix a leftover from earlier stage of development in the documentation of recently added led_compose_name() and fix old mistake in the documentation of led_set_brightness_sync() parameter name. - MAINTAINERS: add pointer to Pavel Machek's linux-leds.git tree. Pavel is going to take over LED tree maintainership from myself. * tag 'led-fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: Add my linux-leds branch to MAINTAINERS leds: core: Fix leds.h structure documentation
2019-10-08leds: core: Fix leds.h structure documentationDan Murphy
Update the leds.h structure documentation to define the correct arguments. Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2019-10-08of/address: Fix of_pci_range_parser_one translation of DMA addressesRob Herring
of_pci_range_parser_one() has a bug when parsing dma-ranges. When it translates the parent address (aka cpu address in the code), 'ranges' is always being used. This happens to work because most users are just 1:1 translation. Cc: Robin Murphy <robin.murphy@arm.com> Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Rob Herring <robh@kernel.org>
2019-10-08of: Make of_dma_get_range() privateRob Herring
of_dma_get_range() is only used within the DT core code, so remove the export and move the header declaration to the private header. Cc: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rob Herring <robh@kernel.org>
2019-10-08of: Remove unused of_find_matching_node_by_address()Rob Herring
of_find_matching_node_by_address() is unused, so remove it. Cc: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Rob Herring <robh@kernel.org>
2019-10-08spi: Add a PTP system timestamp to the transfer structureVladimir Oltean
SPI is one of the interfaces used to access devices which have a POSIX clock driver (real time clocks, 1588 timers etc). The fact that the SPI bus is slow is not what the main problem is, but rather the fact that drivers don't take a constant amount of time in transferring data over SPI. When there is a high delay in the readout of time, there will be uncertainty in the value that has been read out of the peripheral. When that delay is constant, the uncertainty can at least be approximated with a certain accuracy which is fine more often than not. Timing jitter occurs all over in the kernel code, and is mainly caused by having to let go of the CPU for various reasons such as preemption, servicing interrupts, going to sleep, etc. Another major reason is CPU dynamic frequency scaling. It turns out that the problem of retrieving time from a SPI peripheral with high accuracy can be solved by the use of "PTP system timestamping" - a mechanism to correlate the time when the device has snapshotted its internal time counter with the Linux system time at that same moment. This is sufficient for having a precise time measurement - it is not necessary for the whole SPI transfer to be transmitted "as fast as possible", or "as low-jitter as possible". The system has to be low-jitter for a very short amount of time to be effective. This patch introduces a PTP system timestamping mechanism in struct spi_transfer. This is to be used by SPI device drivers when they need to know the exact time at which the underlying device's time was snapshotted. More often than not, SPI peripherals have a very exact timing for when their SPI-to-interconnect bridge issues a transaction for snapshotting and reading the time register, and that will be dependent on when the SPI-to-interconnect bridge figures out that this is what it should do, aka as soon as it sees byte N of the SPI transfer. Since spi_device drivers are the ones who'd know best how the peripheral behaves in this regard, expose a mechanism in spi_transfer which allows them to specify which word (or word range) from the transfer should be timestamped. Add a default implementation of the PTP system timestamping in the SPI core. This is not going to be satisfactory performance-wise, but should at least increase the likelihood that SPI device drivers will use PTP system timestamping in the future. There are 3 entry points from the core towards the SPI controller drivers: - transfer_one: The driver is passed individual spi_transfers to execute. This is the easiest to timestamp. - transfer_one_message: The core passes the driver an entire spi_message (a potential batch of spi_transfers). The core puts the same pre and post timestamp to all transfers within a message. This is not ideal, but nothing better can be done by default anyway, since the core has no insight into how the driver batches the transfers. - transfer: Like transfer_one_message, but for unqueued drivers (i.e. the driver implements its own queue scheduling). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20190905010114.26718-3-olteanv@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-08lib/string: Make memzero_explicit() inline instead of externalArvind Sankar
With the use of the barrier implied by barrier_data(), there is no need for memzero_explicit() to be extern. Making it inline saves the overhead of a function call, and allows the code to be reused in arch/*/purgatory without having to duplicate the implementation. Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephan Mueller <smueller@chronox.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-crypto@vger.kernel.org Cc: linux-s390@vger.kernel.org Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit") Link: https://lkml.kernel.org/r/20191007220000.GA408752@rani.riverdale.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-08net/mlx5: Expose optimal performance scatter entries capabilityYamin Friedman
Expose maximum scatter entries per RDMA READ for optimal performance. Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-10-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "The usual shower of hotfixes. Chris's memcg patches aren't actually fixes - they're mature but a few niggling review issues were late to arrive. The ocfs2 fixes are quite old - those took some time to get reviewer attention. Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg, mm/slab-generic" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) mm, sl[ou]b: improve memory accounting mm, memcg: make scan aggression always exclude protection mm, memcg: make memory.emin the baseline for utilisation determination mm, memcg: proportional memory.{low,min} reclaim mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() mm/page_alloc.c: fix a crash in free_pages_prepare() mm/z3fold.c: claim page in the beginning of free kernel/sysctl.c: do not override max_threads provided by userspace memcg: only record foreign writebacks with dirty pages when memcg is not disabled mm: fix -Wmissing-prototypes warnings writeback: fix use-after-free in finish_writeback_work() mm/memremap: drop unused SECTION_SIZE and SECTION_MASK panic: ensure preemption is disabled during panic() fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() ocfs2: clear zero in unaligned direct IO
2019-10-07mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)Vlastimil Babka
In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes. That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workaround such as own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2]. The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocate a larger size and only give back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size). Ideally we should provide to mm users what they need without difficult workarounds or own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What this means for the three available allocators? * SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes. * SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario. * SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting a x86_64 kernel+userspace with virtme, around 16MB memory was consumed by slab pages both before and after the patch, with difference in the noise. [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/ [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/ [3] https://lwn.net/Articles/787740/ [akpm@linux-foundation.org: documentation fixlet, per Matthew] Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: David Sterba <dsterba@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: "Darrick J . Wong" <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07mm, memcg: make scan aggression always exclude protectionChris Down
This patch is an incremental improvement on the existing memory.{low,min} relative reclaim work to base its scan pressure calculations on how much protection is available compared to the current usage, rather than how much the current usage is over some protection threshold. This change doesn't change the experience for the user in the normal case too much. One benefit is that it replaces the (somewhat arbitrary) 100% cutoff with an indefinite slope, which makes it easier to ballpark a memory.low value. As well as this, the old methodology doesn't quite apply generically to machines with varying amounts of physical memory. Let's say we have a top level cgroup, workload.slice, and another top level cgroup, system-management.slice. We want to roughly give 12G to system-management.slice, so on a 32GB machine we set memory.low to 20GB in workload.slice, and on a 64GB machine we set memory.low to 52GB. However, because these are relative amounts to the total machine size, while the amount of memory we want to generally be willing to yield to system.slice is absolute (12G), we end up putting more pressure on system.slice just because we have a larger machine and a larger workload to fill it, which seems fairly unintuitive. With this new behaviour, we don't end up with this unintended side effect. Previously the way that memory.low protection works is that if you are 50% over a certain baseline, you get 50% of your normal scan pressure. This is certainly better than the previous cliff-edge behaviour, but it can be improved even further by always considering memory under the currently enforced protection threshold to be out of bounds. This means that we can set relatively low memory.low thresholds for variable or bursty workloads while still getting a reasonable level of protection, whereas with the previous version we may still trivially hit the 100% clamp. The previous 100% clamp is also somewhat arbitrary, whereas this one is more concretely based on the currently enforced protection threshold, which is likely easier to reason about. There is also a subtle issue with the way that proportional reclaim worked previously -- it promotes having no memory.low, since it makes pressure higher during low reclaim. This happens because we base our scan pressure modulation on how far memory.current is between memory.min and memory.low, but if memory.low is unset, we only use the overage method. In most cromulent configurations, this then means that we end up with *more* pressure than with no memory.low at all when we're in low reclaim, which is not really very usable or expected. With this patch, memory.low and memory.min affect reclaim pressure in a more understandable and composable way. For example, from a user standpoint, "protected" memory now remains untouchable from a reclaim aggression standpoint, and users can also have more confidence that bursty workloads will still receive some amount of guaranteed protection. Link: http://lkml.kernel.org/r/20190322160307.GA3316@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07mm, memcg: make memory.emin the baseline for utilisation determinationChris Down
Roman points out that when when we do the low reclaim pass, we scale the reclaim pressure relative to position between 0 and the maximum protection threshold. However, if the maximum protection is based on memory.elow, and memory.emin is above zero, this means we still may get binary behaviour on second-pass low reclaim. This is because we scale starting at 0, not starting at memory.emin, and since we don't scan at all below emin, we end up with cliff behaviour. This should be a fairly uncommon case since usually we don't go into the second pass, but it makes sense to scale our low reclaim pressure starting at emin. You can test this by catting two large sparse files, one in a cgroup with emin set to some moderate size compared to physical RAM, and another cgroup without any emin. In both cgroups, set an elow larger than 50% of physical RAM. The one with emin will have less page scanning, as reclaim pressure is lower. Rebase on top of and apply the same idea as what was applied to handle cgroup_memory=disable properly for the original proportional patch http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name ("mm, memcg: Handle cgroup_disable=memory when getting memcg protection"). Link: http://lkml.kernel.org/r/20190201051810.GA18895@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Suggested-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07mm, memcg: proportional memory.{low,min} reclaimChris Down
cgroup v2 introduces two memory protection thresholds: memory.low (best-effort) and memory.min (hard protection). While they generally do what they say on the tin, there is a limitation in their implementation that makes them difficult to use effectively: that cliff behaviour often manifests when they become eligible for reclaim. This patch implements more intuitive and usable behaviour, where we gradually mount more reclaim pressure as cgroups further and further exceed their protection thresholds. This cliff edge behaviour happens because we only choose whether or not to reclaim based on whether the memcg is within its protection limits (see the use of mem_cgroup_protected in shrink_node), but we don't vary our reclaim behaviour based on this information. Imagine the following timeline, with the numbers the lruvec size in this zone: 1. memory.low=1000000, memory.current=999999. 0 pages may be scanned. 2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned. 3. memory.low=1000000, memory.current=1000001. 1000001* pages may be scanned. (?!) * Of course, we won't usually scan all available pages in the zone even without this patch because of scan control priority, over-reclaim protection, etc. However, as shown by the tests at the end, these techniques don't sufficiently throttle such an extreme change in input, so cliff-like behaviour isn't really averted by their existence alone. Here's an example of how this plays out in practice. At Facebook, we are trying to protect various workloads from "system" software, like configuration management tools, metric collectors, etc (see this[0] case study). In order to find a suitable memory.low value, we start by determining the expected memory range within which the workload will be comfortable operating. This isn't an exact science -- memory usage deemed "comfortable" will vary over time due to user behaviour, differences in composition of work, etc, etc. As such we need to ballpark memory.low, but doing this is currently problematic: 1. If we end up setting it too low for the workload, it won't have *any* effect (see discussion above). The group will receive the full weight of reclaim and won't have any priority while competing with the less important system software, as if we had no memory.low configured at all. 2. Because of this behaviour, we end up erring on the side of setting it too high, such that the comfort range is reliably covered. However, protected memory is completely unavailable to the rest of the system, so we might cause undue memory and IO pressure there when we *know* we have some elasticity in the workload. 3. Even if we get the value totally right, smack in the middle of the comfort zone, we get extreme jumps between no pressure and full pressure that cause unpredictable pressure spikes in the workload due to the current binary reclaim behaviour. With this patch, we can set it to our ballpark estimation without too much worry. Any undesirable behaviour, such as too much or too little reclaim pressure on the workload or system will be proportional to how far our estimation is off. This means we can set memory.low much more conservatively and thus waste less resources *without* the risk of the workload falling off a cliff if we overshoot. As a more abstract technical description, this unintuitive behaviour results in having to give high-priority workloads a large protection buffer on top of their expected usage to function reliably, as otherwise we have abrupt periods of dramatically increased memory pressure which hamper performance. Having to set these thresholds so high wastes resources and generally works against the principle of work conservation. In addition, having proportional memory reclaim behaviour has other benefits. Most notably, before this patch it's basically mandatory to set memory.low to a higher than desirable value because otherwise as soon as you exceed memory.low, all protection is lost, and all pages are eligible to scan again. By contrast, having a gradual ramp in reclaim pressure means that you now still get some protection when thresholds are exceeded, which means that one can now be more comfortable setting memory.low to lower values without worrying that all protection will be lost. This is important because workingset size is really hard to know exactly, especially with variable workloads, so at least getting *some* protection if your workingset size grows larger than you expect increases user confidence in setting memory.low without a huge buffer on top being needed. Thanks a lot to Johannes Weiner and Tejun Heo for their advice and assistance in thinking about how to make this work better. In testing these changes, I intended to verify that: 1. Changes in page scanning become gradual and proportional instead of binary. To test this, I experimented stepping further and further down memory.low protection on a workload that floats around 19G workingset when under memory.low protection, watching page scan rates for the workload cgroup: +------------+-----------------+--------------------+--------------+ | memory.low | test (pgscan/s) | control (pgscan/s) | % of control | +------------+-----------------+--------------------+--------------+ | 21G | 0 | 0 | N/A | | 17G | 867 | 3799 | 23% | | 12G | 1203 | 3543 | 34% | | 8G | 2534 | 3979 | 64% | | 4G | 3980 | 4147 | 96% | | 0 | 3799 | 3980 | 95% | +------------+-----------------+--------------------+--------------+ As you can see, the test kernel (with a kernel containing this patch) ramps up page scanning significantly more gradually than the control kernel (without this patch). 2. More gradual ramp up in reclaim aggression doesn't result in premature OOMs. To test this, I wrote a script that slowly increments the number of pages held by stress(1)'s --vm-keep mode until a production system entered severe overall memory contention. This script runs in a highly protected slice taking up the majority of available system memory. Watching vmstat revealed that page scanning continued essentially nominally between test and control, without causing forward reclaim progress to become arrested. [0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project [akpm@linux-foundation.org: reflow block comments to fit in 80 cols] [chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection] Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07memcg: only record foreign writebacks with dirty pages when memcg is not ↵Baoquan He
disabled In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory' for saving memory. Now kdump kernel will always panic when dump vmcore to local disk: BUG: kernel NULL pointer dereference, address: 0000000000000ab8 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018 RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140 Call Trace: __set_page_dirty+0x52/0xc0 iomap_set_page_dirty+0x50/0x90 iomap_write_end+0x6e/0x270 iomap_write_actor+0xce/0x170 iomap_apply+0xba/0x11e iomap_file_buffered_write+0x62/0x90 xfs_file_buffered_aio_write+0xca/0x320 [xfs] new_sync_write+0x12d/0x1d0 vfs_write+0xa5/0x1a0 ksys_write+0x59/0xd0 do_syscall_64+0x59/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 And this will corrupt the 1st kernel too with 'cgroup_disable=memory'. Via the trace and with debugging, it is pointing to commit 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing") which introduced this regression. Disabling memcg causes the null pointer dereference at uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath(). Fix it by returning directly if memcg is disabled, but not trying to record the foreign writebacks with dirty pages. Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing") Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07netfilter: ipset: move ip_set_get_ip_port() to ip_set_bitmap_port.c.Jeremy Sowden
ip_set_get_ip_port() is only used in ip_set_bitmap_port.c. Move it there and make it static. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-07netfilter: ipset: move function to ip_set_bitmap_ip.c.Jeremy Sowden
One inline function in ip_set_bitmap.h is only called in ip_set_bitmap_ip.c: move it and remove inline function specifier. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-07netfilter: ipset: make ip_set_put_flags extern.Jeremy Sowden
ip_set_put_flags is rather large for a static inline function in a header-file. Move it to ip_set_core.c and export it. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-07netfilter: ipset: move functions to ip_set_core.c.Jeremy Sowden
Several inline functions in ip_set.h are only called in ip_set_core.c: move them and remove inline function specifier. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-07netfilter: ipset: move ip_set_comment functions from ip_set.h to ip_set_core.c.Jeremy Sowden
Most of the functions are only called from within ip_set_core.c. The exception is ip_set_init_comment. However, this is too complex to be a good candidate for a static inline function. Move it to ip_set_core.c, change its linkage to extern and export it, leaving a declaration in ip_set.h. ip_set_comment_free is only used as an extension destructor, so change its prototype to match and drop cast. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-07netfilter: ipset: add a coding-style fix to ip_set_ext_destroy.Jeremy Sowden
Use a local variable to hold comment in order to align the arguments of ip_set_comment_free properly. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-07iio: core: Add optional symbolic label to device attributesPhil Reid
If a label is defined in the device tree for this device add that to the device specific attributes. This is useful for userspace to be able to identify an individual device when multiple identical chips are present in the system. Tested-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2019-10-07device_cgroup: Export devcgroup_check_permissionHarish Kasiviswanathan
For AMD compute (amdkfd) driver. All AMD compute devices are exported via single device node /dev/kfd. As a result devices cannot be controlled individually using device cgroup. AMD compute devices will rely on its graphics counterpart that exposes /dev/dri/renderN node for each device. For each task (based on its cgroup), KFD driver will check if /dev/dri/renderN node is accessible before exposing it. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-07uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to itLinus Torvalds
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") I made filldir() use unsafe_put_user(), which improves code generation on x86 enormously. But because we didn't have a "unsafe_copy_to_user()", the dirent name copy was also done by hand with unsafe_put_user() in a loop, and it turns out that a lot of other architectures didn't like that, because unlike x86, they have various alignment issues. Most non-x86 architectures trap and fix it up, and some (like xtensa) will just fail unaligned put_user() accesses unconditionally. Which makes that "copy using put_user() in a loop" not work for them at all. I could make that code do explicit alignment etc, but the architectures that don't like unaligned accesses also don't really use the fancy "user_access_begin/end()" model, so they might just use the regular old __copy_to_user() interface. So this commit takes that looping implementation, turns it into the x86 version of "unsafe_copy_to_user()", and makes other architectures implement the unsafe copy version as __copy_to_user() (the same way they do for the other unsafe_xyz() accessor functions). Note that it only does this for the copying _to_ user space, and we still don't have a unsafe version of copy_from_user(). That's partly because we have no current users of it, but also partly because the copy_from_user() case is slightly different and cannot efficiently be implemented in terms of a unsafe_get_user() loop (because gcc can't do asm goto with outputs). It would be trivial to do this using "rep movsb", which would work really nicely on newer x86 cores, but really badly on some older ones. Al Viro is looking at cleaning up all our user copy routines to make this all a non-issue, but for now we have this simple-but-stupid version for x86 that works fine for the dirent name copy case because those names are short strings and we simply don't need anything fancier. Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-and-tested-by: Tony Luck <tony.luck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07spi: Add a PTP system timestamp to the transfer structureVladimir Oltean
SPI is one of the interfaces used to access devices which have a POSIX clock driver (real time clocks, 1588 timers etc). The fact that the SPI bus is slow is not what the main problem is, but rather the fact that drivers don't take a constant amount of time in transferring data over SPI. When there is a high delay in the readout of time, there will be uncertainty in the value that has been read out of the peripheral. When that delay is constant, the uncertainty can at least be approximated with a certain accuracy which is fine more often than not. Timing jitter occurs all over in the kernel code, and is mainly caused by having to let go of the CPU for various reasons such as preemption, servicing interrupts, going to sleep, etc. Another major reason is CPU dynamic frequency scaling. It turns out that the problem of retrieving time from a SPI peripheral with high accuracy can be solved by the use of "PTP system timestamping" - a mechanism to correlate the time when the device has snapshotted its internal time counter with the Linux system time at that same moment. This is sufficient for having a precise time measurement - it is not necessary for the whole SPI transfer to be transmitted "as fast as possible", or "as low-jitter as possible". The system has to be low-jitter for a very short amount of time to be effective. This patch introduces a PTP system timestamping mechanism in struct spi_transfer. This is to be used by SPI device drivers when they need to know the exact time at which the underlying device's time was snapshotted. More often than not, SPI peripherals have a very exact timing for when their SPI-to-interconnect bridge issues a transaction for snapshotting and reading the time register, and that will be dependent on when the SPI-to-interconnect bridge figures out that this is what it should do, aka as soon as it sees byte N of the SPI transfer. Since spi_device drivers are the ones who'd know best how the peripheral behaves in this regard, expose a mechanism in spi_transfer which allows them to specify which word (or word range) from the transfer should be timestamped. Add a default implementation of the PTP system timestamping in the SPI core. This is not going to be satisfactory performance-wise, but should at least increase the likelihood that SPI device drivers will use PTP system timestamping in the future. There are 3 entry points from the core towards the SPI controller drivers: - transfer_one: The driver is passed individual spi_transfers to execute. This is the easiest to timestamp. - transfer_one_message: The core passes the driver an entire spi_message (a potential batch of spi_transfers). The core puts the same pre and post timestamp to all transfers within a message. This is not ideal, but nothing better can be done by default anyway, since the core has no insight into how the driver batches the transfers. - transfer: Like transfer_one_message, but for unqueued drivers (i.e. the driver implements its own queue scheduling). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20190905010114.26718-3-olteanv@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-07nvmem: core: add nvmem_device_findThomas Bogendoerfer
nvmem_device_find provides a way to search for nvmem devices with the help of a match function simlair to bus_find_device. Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: David S. Miller <davem@davemloft.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: linux-serial@vger.kernel.org
2019-10-07module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflictMasahiro Yamada
The module namespace produces __strtab_ns_<sym> symbols to store namespace strings, but it does not guarantee the name uniqueness. This is a potential problem because we have exported symbols starting with "ns_". For example, kernel/capability.c exports the following symbols: EXPORT_SYMBOL(ns_capable); EXPORT_SYMBOL(capable); Assume a situation where those are converted as follows: EXPORT_SYMBOL_NS(ns_capable, some_namespace); EXPORT_SYMBOL_NS(capable, some_namespace); The former expands to "__kstrtab_ns_capable" and "__kstrtab_ns_ns_capable", and the latter to "__kstrtab_capable" and "__kstrtab_ns_capable". Then, we have the duplicated "__kstrtab_ns_capable". To ensure the uniqueness, rename "__kstrtab_ns_*" to "__kstrtabns_*". Reviewed-by: Matthias Maennich <maennich@google.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-10-07module: swap the order of symbol.namespaceMasahiro Yamada
Currently, EXPORT_SYMBOL_NS(_GPL) constructs the kernel symbol as follows: __ksymtab_SYMBOL.NAMESPACE The sym_extract_namespace() in modpost allocates memory for the part SYMBOL.NAMESPACE when '.' is contained. One problem is that the pointer returned by strdup() is lost because the symbol name will be copied to malloc'ed memory by alloc_symbol(). No one will keep track of the pointer of strdup'ed memory. sym->namespace still points to the NAMESPACE part. So, you can free it with complicated code like this: free(sym->namespace - strlen(sym->name) - 1); It complicates memory free. To fix it elegantly, I swapped the order of the symbol and the namespace as follows: __ksymtab_NAMESPACE.SYMBOL then, simplified sym_extract_namespace() so that it allocates memory only for the NAMESPACE part. I prefer this order because it is intuitive and also matches to major languages. For example, NAMESPACE::NAME in C++, MODULE.NAME in Python. Reviewed-by: Matthias Maennich <maennich@google.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-10-07firmware: qcom: scm: add support to restore secure config to qcm_scm-32Rob Clark
Add support to restore the secure configuration for qcm_scm-32.c. This is needed by the On Chip MEMory (OCMEM) that is present on some Snapdragon devices. Signed-off-by: Rob Clark <robdclark@gmail.com> [masneyb@onstation.org: ported to latest kernel; set ctx_bank_num to spare parameter.] Signed-off-by: Brian Masney <masneyb@onstation.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Tested-by: Gabriel Francisco <frc.gabrielgmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-07firmware: qcom: scm: add OCMEM lock/unlock interfaceRob Clark
Add support for the OCMEM lock/unlock interface that is needed by the On Chip MEMory (OCMEM) that is present on some Snapdragon devices. Signed-off-by: Rob Clark <robdclark@gmail.com> [masneyb@onstation.org: ported to latest kernel; minor reformatting.] Signed-off-by: Brian Masney <masneyb@onstation.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Tested-by: Gabriel Francisco <frc.gabrielgmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-07blk-mq: Inline status checkersPavel Begunkov
blk_mq_request_completed() and blk_mq_request_started() are short, inline it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-07block: Document all members of blk_mq_tag_set and bkl_mq_queue_mapBart Van Assche
The meaning of several member variables of these two data structures is nontrivial. Hence document all member variables using the kernel-doc syntax. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>