summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-01-09Drivers: hv: vmbus: Check for ring when getting debug infoDexuan Cui
fc96df16a1ce is good and can already fix the "return stack garbage" issue, but let's also improve hv_ringbuffer_get_debuginfo(), which would silently return stack garbage, if people forget to check channel->state or ring_info->ring_buffer, when using the function in the future. Having an error check in the function would eliminate the potential risk. Add a Fixes tag to indicate the patch depdendency. Fixes: fc96df16a1ce ("Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels") Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-09spi: core: avoid waking pump thread from spi_sync instead run teardown delayedMartin Sperl
When spi_sync is running alone with no other spi devices connected to the bus the worker thread is woken during spi_finalize_current_message to run the teardown code every time. This is totally unnecessary in the case that there is no message queued. On a multi-core system this results in one wakeup of the thread for each spi_message processed via spi_sync where in most cases the teardown does not happen as the hw is already in use. This patch now delays the teardown by 1 second by using a separate kthread_delayed_work for the teardown. This avoids waking the kthread too often. For spi_sync transfers in a tight loop (say 40k messages/s) this avoids the penalty of waking the worker thread 40k times/s. On a rasperry pi 3 with 4 cores the results in 32% of a single core only to find out that there is nothing in the queue and it can go back to sleep. With this patch applied the spi-worker is woken exactly once: after the load finishes and the spi bus is idle for 1 second. I believe I have also seen situations where during a spi_sync loop the worker thread (triggered by the last message finished) is slightly faster and _wins_ the race to process the message, so we are actually running the kthread and letting it do some work... This is also no longer observed with this patch applied as. Tested with a new CAN controller driver for the mcp2517fd which uses spi_sync for interrupt handling and spi_async for scheduling of can frames for transmission (in a different thread) Some statistics when receiving 100000 CAN frames with the mcp25xxfd driver on a Raspberry pi 3: without the patch: ------------------ root@raspcm3:~# for x in $(pgrep spi0) $(pgrep irq/94-mcp25xxf) ; do awk '{printf "%-20s %6i\n", $2,$15}' /proc/$x/stat; done (spi0) 5 (irq/94-mcp25xxf) 0 root@raspcm3:~# vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 0 821960 13592 50848 0 0 80 2 1986 105 1 2 97 0 0 0 0 0 821968 13592 50876 0 0 0 0 8046 30 0 0 100 0 0 0 0 0 821936 13592 50876 0 0 0 0 8032 24 0 0 100 0 0 0 0 0 821936 13592 50876 0 0 0 0 8035 30 0 0 100 0 0 0 0 0 821936 13592 50876 0 0 0 0 8033 22 0 0 100 0 0 2 0 0 821936 13592 50876 0 0 0 0 11598 7129 0 3 97 0 0 1 0 0 821872 13592 50876 0 0 0 0 37741 59003 0 31 69 0 0 2 0 0 821840 13592 50876 0 0 0 0 37762 59078 0 29 71 0 0 2 0 0 821776 13592 50876 0 0 0 0 37593 58792 0 28 72 0 0 1 0 0 821744 13592 50876 0 0 0 0 37642 58881 0 30 70 0 0 2 0 0 821680 13592 50876 0 0 0 0 37490 58602 0 27 73 0 0 1 0 0 821648 13592 50876 0 0 0 0 37412 58418 0 29 71 0 0 1 0 0 821584 13592 50876 0 0 0 0 37337 58288 0 27 73 0 0 1 0 0 821552 13592 50876 0 0 0 0 37584 58774 0 27 73 0 0 0 0 0 821520 13592 50876 0 0 0 0 18363 20566 0 9 91 0 0 0 0 0 821520 13592 50876 0 0 0 0 8037 32 0 0 100 0 0 0 0 0 821520 13592 50876 0 0 0 0 8031 23 0 0 100 0 0 0 0 0 821520 13592 50876 0 0 0 0 8034 26 0 0 100 0 0 0 0 0 821520 13592 50876 0 0 0 0 8033 24 0 0 100 0 0 ^C root@raspcm3:~# for x in $(pgrep spi0) $(pgrep irq/94-mcp25xxf) ; do awk '{printf "%-20s %6i\n", $2,$15}' /proc/$x/stat; done (spi0) 228 (irq/94-mcp25xxf) 794 root@raspcm3:~# cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 17: 34 0 0 0 ARMCTRL-level 1 Edge 3f00b880.mailbox 27: 1 0 0 0 ARMCTRL-level 35 Edge timer 33: 1416870 0 0 0 ARMCTRL-level 41 Edge 3f980000.usb, dwc2_hsotg:usb1 34: 1 0 0 0 ARMCTRL-level 42 Edge vc4 35: 0 0 0 0 ARMCTRL-level 43 Edge 3f004000.txp 40: 1753 0 0 0 ARMCTRL-level 48 Edge DMA IRQ 42: 11 0 0 0 ARMCTRL-level 50 Edge DMA IRQ 44: 11 0 0 0 ARMCTRL-level 52 Edge DMA IRQ 45: 0 0 0 0 ARMCTRL-level 53 Edge DMA IRQ 66: 0 0 0 0 ARMCTRL-level 74 Edge vc4 crtc 69: 0 0 0 0 ARMCTRL-level 77 Edge vc4 crtc 70: 0 0 0 0 ARMCTRL-level 78 Edge vc4 crtc 77: 20 0 0 0 ARMCTRL-level 85 Edge 3f205000.i2c, 3f804000.i2c, 3f805000.i2c 78: 6346 0 0 0 ARMCTRL-level 86 Edge 3f204000.spi 80: 205 0 0 0 ARMCTRL-level 88 Edge mmc0 81: 493 0 0 0 ARMCTRL-level 89 Edge uart-pl011 89: 0 0 0 0 bcm2836-timer 0 Edge arch_timer 90: 4291 3821 2180 1649 bcm2836-timer 1 Edge arch_timer 94: 14289 0 0 0 pinctrl-bcm2835 16 Level mcp25xxfd IPI0: 0 0 0 0 CPU wakeup interrupts IPI1: 0 0 0 0 Timer broadcast interrupts IPI2: 3645 242371 7919 1328 Rescheduling interrupts IPI3: 112 543 273 194 Function call interrupts IPI4: 0 0 0 0 CPU stop interrupts IPI5: 1 0 0 0 IRQ work interrupts IPI6: 0 0 0 0 completion interrupts Err: 0 top shows 93% for the mcp25xxfd interrupt handler, 31% for spi0. with the patch: --------------- root@raspcm3:~# for x in $(pgrep spi0) $(pgrep irq/94-mcp25xxf) ; do awk '{printf "%-20s %6i\n", $2,$15}' /proc/$x/stat; done (spi0) 0 (irq/94-mcp25xxf) 0 root@raspcm3:~# vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- 0 0 0 804768 13584 62628 0 0 0 0 8038 24 0 0 100 0 0 0 0 0 804768 13584 62628 0 0 0 0 8042 25 0 0 100 0 0 1 0 0 804704 13584 62628 0 0 0 0 9603 2967 0 20 80 0 0 1 0 0 804672 13584 62628 0 0 0 0 9828 3380 0 24 76 0 0 1 0 0 804608 13584 62628 0 0 0 0 9823 3375 0 23 77 0 0 1 0 0 804608 13584 62628 0 0 0 12 9829 3394 0 23 77 0 0 1 0 0 804544 13584 62628 0 0 0 0 9816 3362 0 22 78 0 0 1 0 0 804512 13584 62628 0 0 0 0 9817 3367 0 23 77 0 0 1 0 0 804448 13584 62628 0 0 0 0 9822 3370 0 22 78 0 0 1 0 0 804416 13584 62628 0 0 0 0 9815 3367 0 23 77 0 0 0 0 0 804352 13584 62628 0 0 0 84 9222 2250 0 14 86 0 0 0 0 0 804352 13592 62620 0 0 0 24 8131 209 0 0 93 7 0 0 0 0 804320 13592 62628 0 0 0 0 8041 27 0 0 100 0 0 0 0 0 804352 13592 62628 0 0 0 0 8040 26 0 0 100 0 0 root@raspcm3:~# for x in $(pgrep spi0) $(pgrep irq/94-mcp25xxf) ; do awk '{printf "%-20s %6i\n", $2,$15}' /proc/$x/stat; done (spi0) 0 (irq/94-mcp25xxf) 767 root@raspcm3:~# cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 17: 29 0 0 0 ARMCTRL-level 1 Edge 3f00b880.mailbox 27: 1 0 0 0 ARMCTRL-level 35 Edge timer 33: 1024412 0 0 0 ARMCTRL-level 41 Edge 3f980000.usb, dwc2_hsotg:usb1 34: 1 0 0 0 ARMCTRL-level 42 Edge vc4 35: 0 0 0 0 ARMCTRL-level 43 Edge 3f004000.txp 40: 1773 0 0 0 ARMCTRL-level 48 Edge DMA IRQ 42: 11 0 0 0 ARMCTRL-level 50 Edge DMA IRQ 44: 11 0 0 0 ARMCTRL-level 52 Edge DMA IRQ 45: 0 0 0 0 ARMCTRL-level 53 Edge DMA IRQ 66: 0 0 0 0 ARMCTRL-level 74 Edge vc4 crtc 69: 0 0 0 0 ARMCTRL-level 77 Edge vc4 crtc 70: 0 0 0 0 ARMCTRL-level 78 Edge vc4 crtc 77: 20 0 0 0 ARMCTRL-level 85 Edge 3f205000.i2c, 3f804000.i2c, 3f805000.i2c 78: 6417 0 0 0 ARMCTRL-level 86 Edge 3f204000.spi 80: 237 0 0 0 ARMCTRL-level 88 Edge mmc0 81: 489 0 0 0 ARMCTRL-level 89 Edge uart-pl011 89: 0 0 0 0 bcm2836-timer 0 Edge arch_timer 90: 4048 3704 2383 1892 bcm2836-timer 1 Edge arch_timer 94: 14287 0 0 0 pinctrl-bcm2835 16 Level mcp25xxfd IPI0: 0 0 0 0 CPU wakeup interrupts IPI1: 0 0 0 0 Timer broadcast interrupts IPI2: 2361 2948 7890 1616 Rescheduling interrupts IPI3: 65 617 301 166 Function call interrupts IPI4: 0 0 0 0 CPU stop interrupts IPI5: 1 0 0 0 IRQ work interrupts IPI6: 0 0 0 0 completion interrupts Err: 0 top shows 91% for the mcp25xxfd interrupt handler, 0% for spi0 So we see that spi0 is no longer getting scheduled wasting CPU cycles There are a lot less context switches and corresponding Rescheduling interrupts All of these show that this improves efficiency of the system and reduces CPU utilization. Signed-off-by: Martin Sperl <kernel@martin.sperl.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-01-09regulator: provide rdev_get_regmap()Bartosz Golaszewski
Provide a helper allowing to access regulator's regmap. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-01-09soc: bcm: bcm2835-pm: Add support for power domains under a new binding.Eric Anholt
This provides a free software alternative to raspberrypi-power.c's firmware calls to manage power domains. It also exposes a reset line, where previously the vc4 driver had to try to force power off the domain in order to trigger a reset. Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
2019-01-09bcm2835-pm: Move bcm2835-watchdog's DT probe to an MFD.Eric Anholt
The PM block that the wdt driver was binding to actually has multiple features we want to expose (power domains, reset, watchdog). Move the DT attachment to a MFD driver and make WDT probe against MFD. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
2019-01-09spi: Optionally use GPIO descriptors for CS GPIOsLinus Walleij
This augments the SPI core to optionally use GPIO descriptors for chip select on a per-master-driver opt-in basis. Drivers using this will rely on the SPI core to look up GPIO descriptors associated with the device, such as when using device tree or board files with GPIO descriptor tables. When getting descriptors from the device tree, this will in turn activate the code in gpiolib that was added in commit 6953c57ab172 ("gpio: of: Handle SPI chipselect legacy bindings") which means that these descriptors are aware of the active low semantics that is the default for SPI CS GPIO lines and we can assume that all of these are "active high" and thus assign SPI_CS_HIGH to all CS lines on the DT path. The previously used gpio_set_value() would call down into gpiod_set_raw_value() and ignore the polarity inversion semantics. It seems like many drivers go to great lengths to set up the CS GPIO line as non-asserted, respecting SPI_CS_HIGH. We pull this out of the SPI drivers and into the core, and by simply requesting the line as GPIOD_OUT_LOW when retrieveing it from the device and relying on the gpiolib to handle any inversion semantics. This way a lot of code can be simplified and removed in each converted driver. The end goal after dealing with each driver in turn, is to delete the non-descriptor path (of_spi_register_master() for example) and let the core deal with only descriptors. The different SPI drivers have complex interactions with the core so we cannot simply change them all over, we need to use a stepwise, bisectable approach so that each driver can be converted and fixed in isolation. This patch has the intended side effect of adding support for ACPI GPIOs as it starts relying on gpiod_get_*() to get the GPIO handle associated with the device. Cc: Linuxarm <linuxarm@huawei.com> Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Fangjian (Turing) <f.fangjian@huawei.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-01-09include/linux/compiler*.h: fix OPTIMIZER_HIDE_VARMichael S. Tsirkin
Since commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") clang no longer reuses the OPTIMIZER_HIDE_VAR macro from compiler-gcc - instead it gets the version in include/linux/compiler.h. Unfortunately that version doesn't actually prevent compiler from optimizing out the variable. Fix up by moving the macro out from compiler-gcc.h to compiler.h. Compilers without incline asm support will keep working since it's protected by an ifdef. Also fix up comments to match reality since we are no longer overriding any macros. Build-tested with gcc and clang. Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") Cc: Eli Friedman <efriedma@codeaurora.org> Cc: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-01-09x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINEWANG Chao
Commit 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the remaining pieces. [ bp: Massage commit message. ] Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") Signed-off-by: WANG Chao <chao.wang@ucloud.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: linux-kbuild@vger.kernel.org Cc: srinivas.eeda@oracle.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
2019-01-09ACPI / PMIC: Add support for executing PMIC MIPI sequence elementsHans de Goede
DSI LCD panels describe an initialization sequence in the Video BIOS Tables using so called MIPI sequences. One possible element in these sequences is a PMIC specific element of 15 bytes. Although this is not really an ACPI opregion, the ACPI opregion code is the closest thing we have. We need to have support for these PMIC specific MIPI sequence elements somwhere. Since we already instantiate a special platform device for Intel PMICs for the ACPI PMIC OpRegion handler to bind to, with PMIC specific implementations of the OpRegion, the handling of MIPI sequence PMIC elements fits very well in the ACPI PMIC OpRegion code. This commit adds a new intel_soc_pmic_exec_mipi_pmic_seq_element() function, which is to be backed by a PMIC specific exec_mipi_pmic_seq_element callback. This function will be called by the i915 code to execture MIPI sequence PMIC elements. Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190107111556.4510-2-hdegoede@redhat.com
2019-01-09x86/cache: Rename config option to CONFIG_X86_RESCTRLBorislav Petkov
CONFIG_RESCTRL is too generic. The final goal is to have a generic option called like this which is selected by the arch-specific ones CONFIG_X86_RESCTRL and CONFIG_ARM64_RESCTRL. The generic one will cover the resctrl filesystem and other generic and shared bits of functionality. Signed-off-by: Borislav Petkov <bp@suse.de> Suggested-by: Ingo Molnar <mingo@kernel.org> Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Babu Moger <babu.moger@amd.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: James Morse <james.morse@arm.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20190108171401.GC12235@zn.tnic
2019-01-08libnvdimm/dimm: Fix security capability detection for non-Intel NVDIMMsDan Williams
Kees reports a crash with the following signature... RIP: 0010:nvdimm_visible+0x79/0x80 [..] Call Trace: internal_create_group+0xf4/0x380 sysfs_create_groups+0x46/0xb0 device_add+0x331/0x680 nd_async_device_register+0x15/0x60 async_run_entry_fn+0x38/0x100 ...when starting a QEMU environment with "label-less" DIMM. Without labels QEMU does not publish any DSM methods. Without defined methods the NVDIMM_FAMILY type is not established and the nfit driver will skip registering security operations. In that case the security state should be initialized to a negative value in __nvdimm_create() and nvdimm_visible() should skip interrogating the specific ops. However, since 'enum nvdimm_security_state' was only defined to contain positive values the "if (nvdimm->sec.state < 0)" check always fails. Define a negative error state to allow negative state values to be handled as expected. Fixes: f2989396553a ("acpi/nfit, libnvdimm: Introduce nvdimm_security_ops") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reported-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-08mm, page_alloc: do not wake kswapd with zone lock heldMel Gorman
syzbot reported the following regression in the latest merge window and it was confirmed by Qian Cai that a similar bug was visible from a different context. ====================================================== WARNING: possible circular locking dependency detected 4.20.0+ #297 Not tainted ------------------------------------------------------ syz-executor0/8529 is trying to acquire lock: 000000005e7fb829 (&pgdat->kswapd_wait){....}, at: __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120 but task is already holding lock: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock include/linux/spinlock.h:329 [inline] 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk mm/page_alloc.c:2548 [inline] 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist mm/page_alloc.c:3021 [inline] 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist mm/page_alloc.c:3050 [inline] 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue mm/page_alloc.c:3072 [inline] 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491 It appears to be a false positive in that the only way the lock ordering should be inverted is if kswapd is waking itself and the wakeup allocates debugging objects which should already be allocated if it's kswapd doing the waking. Nevertheless, the possibility exists and so it's best to avoid the problem. This patch flags a zone as needing a kswapd using the, surprisingly, unused zone flag field. The flag is read without the lock held to do the wakeup. It's possible that the flag setting context is not the same as the flag clearing context or for small races to occur. However, each race possibility is harmless and there is no visible degredation in fragmentation treatment. While zone->flag could have continued to be unused, there is potential for moving some existing fields into the flags field instead. Particularly read-mostly ones like zone->initialized and zone->contiguous. Link: http://lkml.kernel.org/r/20190103225712.GJ31517@techsingularity.net Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Reported-by: syzbot+93d94a001cfbce9e60e1@syzkaller.appspotmail.com Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Qian Cai <cai@lca.pw> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08RDMA/mlx5: Delete declaration of already removed functionLeon Romanovsky
The implementation of mlx5_core_page_fault_resume() was removed in commit d5d284b829a6 ("{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA"). This patch removes declaration too. Fixes: d5d284b829a6 ("{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08LSM: Infrastructure management of the ipc security blobCasey Schaufler
Move management of the kern_ipc_perm->security and msg_msg->security blobs out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules the modules tell the infrastructure how much space is required, and the space is allocated there. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> [kees: adjusted for ordered init series] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08LSM: Infrastructure management of the task securityCasey Schaufler
Move management of the task_struct->security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules the modules tell the infrastructure how much space is required, and the space is allocated there. The only user of this blob is AppArmor. The AppArmor use is abstracted to avoid future conflict. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> [kees: adjusted for ordered init series] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08LSM: Infrastructure management of the inode securityCasey Schaufler
Move management of the inode->i_security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules the modules tell the infrastructure how much space is required, and the space is allocated there. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> [kees: adjusted for ordered init series] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08LSM: Infrastructure management of the file securityCasey Schaufler
Move management of the file->f_security blob out of the individual security modules and into the infrastructure. The modules no longer allocate or free the data, instead they tell the infrastructure how much space they require. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> [kees: adjusted for ordered init series] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08Infrastructure management of the cred security blobCasey Schaufler
Move management of the cred security blob out of the security modules and into the security infrastructre. Instead of allocating and freeing space the security modules tell the infrastructure how much space they require. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> [kees: adjusted for ordered init series] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08SELinux: Remove unused selinux_is_enabledCasey Schaufler
There are no longer users of selinux_is_enabled(). Remove it. As selinux_is_enabled() is the only reason for include/linux/selinux.h remove that as well. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08procfs: add smack subdir to attrsCasey Schaufler
Back in 2007 I made what turned out to be a rather serious mistake in the implementation of the Smack security module. The SELinux module used an interface in /proc to manipulate the security context on processes. Rather than use a similar interface, I used the same interface. The AppArmor team did likewise. Now /proc/.../attr/current will tell you the security "context" of the process, but it will be different depending on the security module you're using. This patch provides a subdirectory in /proc/.../attr for Smack. Smack user space can use the "current" file in this subdirectory and never have to worry about getting SELinux attributes by mistake. Programs that use the old interface will continue to work (or fail, as the case may be) as before. The proposed S.A.R.A security module is dependent on the mechanism to create its own attr subdirectory. The original implementation is by Kees Cook. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08capability: Initialize as LSM_ORDER_FIRSTKees Cook
This converts capabilities to use the new LSM_ORDER_FIRST position. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
2019-01-08LSM: Introduce enum lsm_orderKees Cook
In preparation for distinguishing the "capability" LSM from other LSMs, it must be ordered first. This introduces LSM_ORDER_MUTABLE for the general LSMs and LSM_ORDER_FIRST for capability. In the future LSM_ORDER_LAST for could be added for anything that must run last (e.g. Landlock may use this). Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08Yama: Initialize as ordered LSMKees Cook
This converts Yama from being a direct "minor" LSM into an ordered LSM. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
2019-01-08LoadPin: Initialize as ordered LSMKees Cook
This converts LoadPin from being a direct "minor" LSM into an ordered LSM. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
2019-01-08LSM: Separate idea of "major" LSM from "exclusive" LSMKees Cook
In order to both support old "security=" Legacy Major LSM selection, and handling real exclusivity, this creates LSM_FLAG_EXCLUSIVE and updates the selection logic to handle them. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
2019-01-08LSM: Tie enabling logic to presence in ordered listKees Cook
Until now, any LSM without an enable storage variable was considered enabled. This inverts the logic and sets defaults to true only if the LSM gets added to the ordered initialization list. (And an exception continues for the major LSMs until they are integrated into the ordered initialization in a later patch.) Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-08LSM: Lift LSM selection out of individual LSMsKees Cook
As a prerequisite to adjusting LSM selection logic in the future, this moves the selection logic up out of the individual major LSMs, making their init functions only run when actually enabled. This considers all LSMs enabled by default unless they specified an external "enable" variable. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com>
2019-01-08LSM: Plumb visibility into optional "enabled" stateKees Cook
In preparation for lifting the "is this LSM enabled?" logic out of the individual LSMs, pass in any special enabled state tracking (as needed for SELinux, AppArmor, and LoadPin). This should be an "int" to include handling any future cases where "enabled" is exposed via sysctl which has no "bool" type. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com>
2019-01-08LSM: Introduce LSM_FLAG_LEGACY_MAJORKees Cook
This adds a flag for the current "major" LSMs to distinguish them when we have a universal method for ordering all LSMs. It's called "legacy" since the distinction of "major" will go away in the blob-sharing world. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com>
2019-01-08i2c: add suspended flag and accessors for i2c adaptersWolfram Sang
A few drivers open code the handling of suspended adapters. It could be handled by the core, though, to ensure generic handling. This patch adds the flag and accessor functions. The usage of these helpers is optional, though. See the kerneldoc in this patch. Using the new flag, we now reject further transfers if the adapter is already marked suspended. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-01-08dma-mapping: remove dma_zalloc_coherent()Luis Chamberlain
dma_zalloc_coherent() is no longer needed as it has no users because dma_alloc_coherent() already zeroes out memory for us. The Coccinelle grammar rule that used to check for dma_alloc_coherent() + memset() is modified so that it just tells the user that the memset is not needed anymore. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain
We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-07doc: networking: convert offload files into RST and update referencesOtto Sabart
This patch renames offload files. This is necessary for Sphinx. Also update reference to checksum-offloads.rst file. Whole kernel code was grepped for references using: $ grep -r "\(segmentation\|checksum\)-offloads.txt" . There should be no other references to {segmentation,checksum}-offloads.txt files. Signed-off-by: Otto Sabart <ottosabart@seberm.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-01-07libceph: allow setting abort_on_full for rbdDongsheng Yang
Introduce a new option abort_on_full, default to false. Then we can get -ENOSPC when the pool is full, or reaches quota. [ Don't show abort_on_full in /proc/mounts. ] Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-07reset: Add reset_control_get_count()Geert Uytterhoeven
Currently the reset core has internal support for counting the number of resets for a device described in DT. Generalize this to devices using lookup resets, and export it for public use. This will be used by generic drivers that need to be sure a device is controlled by a single, dedicated reset line (e.g. vfio-platform). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> [p.zabel@pengutronix.de: fixed a typo in reset_control_get_count comment] Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-01-07reset: Improve reset controller kernel docsGeert Uytterhoeven
Grammar and indentation fixes. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> [p.zabel@pengutronix.de: dropped "shared among" -> "shared between"] Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-01-07dmaengine: dw: convert to SPDX identifiersAndy Shevchenko
This patch updates license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-01-07dmaengine: dw: Split DW and iDMA 32-bit operationsAndy Shevchenko
Here is a kinda big refactoring that should have been done in the first place, when Intel iDMA 32-bit support appeared. It splits operations which are different to Synopsys DesignWare and Intel iDMA 32-bit controllers. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-01-07dmaengine: dw: Remove unused internal propertyAndy Shevchenko
All known devices, which use DT for configuration, support memory-to-memory transfers. So enable it by default. The rest two cases, i.e. Intel Quark and PPC460ex, instantiate DMA driver and use its channels exclusively for hardware, which means there is no available channel for any other purposes anyway. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-01-07dmaengine: dw: Remove misleading is_private propertyAndy Shevchenko
The commit a9ddb575d6d6 ("dmaengine: dw_dmac: Enhance device tree support") introduces is_private property in uncertain understanding what does it mean. First of all, documentation defines DMA_PRIVATE capability as Documentation/crypto/async-tx-api.txt: The DMA_PRIVATE capability flag is used to tag dma devices that should not be used by the general-purpose allocator. It can be set at initialization time if it is known that a channel will always be private. Alternatively, it is set when dma_request_channel() finds an unused "public" channel. A couple caveats to note when implementing a driver and consumer: 1/ Once a channel has been privately allocated it will no longer be considered by the general-purpose allocator even after a call to dma_release_channel(). 2/ Since capabilities are specified at the device level a dma_device with multiple channels will either have all channels public, or all channels private. Documentation/driver-api/dmaengine/provider.rst: - DMA_PRIVATE The devices only supports slave transfers, and as such isn't available for async transfers. The capability had been introduced by the commit 59b5ec21446b ("dmaengine: introduce dma_request_channel and private channels") and some code didn't changed from that times ever. Taking into consideration above and the fact that on all known platforms Synopsys DesignWare DMA engine is attached to serve slave transfers, the DMA_PRIVATE capability must be enabled for this device unconditionally. Otherwise, as rightfully noticed in drivers/dma/at_xdmac.c: /* * Without DMA_PRIVATE the driver is not able to allocate more than * one channel, second allocation fails in private_candidate. */ because of of a caveats mentioned in above documentation excerpts. So, remove conditional around DMA_PRIVATE followed by removal leftovers. If someone wonders, DMA_PRIVATE can be not used if and only if the all channels of the DMA controller are supposed to serve memory-to-memory like operations. For example, EP93xx has two controllers, one of which can only perform memory-to-memory transfers Note, this change doesn't affect dmatest to be able to test such controllers. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (maintainer:SERIAL DRIVERS) Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-01-06remoteproc: fix kernel-doc comment for parse_fwFabien Dessenne
Fix the kernel-doc comment for "parse_fw" and fix a typo. Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-01-06acpi/nfit, device-dax: Identify differentiated memory with a unique numa-nodeDan Williams
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind process *on* the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06XArray: Honour reserved entries in xa_insertMatthew Wilcox
xa_insert() should treat reserved entries as occupied, not as available. Also, it should treat requests to insert a NULL pointer as a request to reserve the slot. Add xa_insert_bh() and xa_insert_irq() for completeness. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06XArray: Permit storing 2-byte-aligned pointersMatthew Wilcox
On m68k, statically allocated pointers may only be two-byte aligned. This clashes with the XArray's method for tagging internal pointers. Permit storing these pointers in single slots (ie not in multislots). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06XArray: Change xa_for_each iteratorMatthew Wilcox
There were three problems with this API: 1. It took too many arguments; almost all users wanted to iterate over every element in the array rather than a subset. 2. It required that 'index' be initialised before use, and there's no realistic way to make GCC catch that. 3. 'index' and 'entry' were the opposite way round from every other member of the XArray APIs. So split it into three different APIs: xa_for_each(xa, index, entry) xa_for_each_start(xa, index, entry, start) xa_for_each_marked(xa, index, entry, filter) Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06XArray: Turn xa_init_flags into a static inlineMatthew Wilcox
A regular xa_init_flags() put all dynamically-initialised XArrays into the same locking class. That leads to lockdep believing that taking one XArray lock while holding another is a deadlock. It's possible to work around some of these situations with separate locking classes for irq/bh/regular XArrays, and SINGLE_DEPTH_NESTING, but that's ugly, and it doesn't work for all situations (where we have completely unrelated XArrays). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06Merge tag 'kbuild-v4.21-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - improve boolinit.cocci and use_after_iter.cocci semantic patches - fix alignment for kallsyms - move 'asm goto' compiler test to Kconfig and clean up jump_label CONFIG option - generate asm-generic wrappers automatically if arch does not implement mandatory UAPI headers - remove redundant generic-y defines - misc cleanups * tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: rename generated .*conf-cfg to *conf-cfg kbuild: remove unnecessary stubs for archheader and archscripts kbuild: use assignment instead of define ... endef for filechk_* rules arch: remove redundant UAPI generic-y defines kbuild: generate asm-generic wrappers if mandatory headers are missing arch: remove stale comments "UAPI Header export list" riscv: remove redundant kernel-space generic-y kbuild: change filechk to surround the given command with { } kbuild: remove redundant target cleaning on failure kbuild: clean up rule_dtc_dt_yaml kbuild: remove UIMAGE_IN and UIMAGE_OUT jump_label: move 'asm goto' support test to Kconfig kallsyms: lower alignment on ARM scripts: coccinelle: boolinit: drop warnings on named constants scripts: coccinelle: check for redeclaration kconfig: remove unused "file" field of yylval union nds32: remove redundant kernel-space generic-y nios2: remove unneeded HAS_DMA define
2019-01-06Merge tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: "Fix various regressions introduced in this cycles: - fix dma-debug tracking for the map_page / map_single consolidatation - properly stub out DMA mapping symbols for !HAS_DMA builds to avoid link failures - fix AMD Gart direct mappings - setup the dma address for no kernel mappings using the remap allocator" * tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING for remapped allocations x86/amd_gart: fix unmapping of non-GART mappings dma-mapping: remove a few unused exports dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA dma-mapping: remove dmam_{declare,release}_coherent_memory dma-mapping: implement dmam_alloc_coherent using dmam_alloc_attrs dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs
2019-01-06Merge tag 'tag-chrome-platform-for-v4.21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform Pull chrome platform updates from Benson Leung: - Changes for EC_MKBP_EVENT_SENSOR_FIFO handling. - Also, maintainership changes. Olofj out, Enric balletbo in. * tag 'tag-chrome-platform-for-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: MAINTAINERS: add maintainers for ChromeOS EC sub-drivers MAINTAINERS: platform/chrome: Add Enric as a maintainer MAINTAINERS: platform/chrome: remove myself as maintainer platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup platform/chrome: straighten out cros_ec_get_{next,host}_event() error codes
2019-01-05bpf: fix sanitation of alu op with pointer / scalar type from different pathsDaniel Borkmann
While 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") took care of rejecting alu op on pointer when e.g. pointer came from two different map values with different map properties such as value size, Jann reported that a case was not covered yet when a given alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from different branches where we would incorrectly try to sanitize based on the pointer's limit. Catch this corner case and reject the program instead. Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>