2020-09-02 nvme: opencode revalidate_disk in nvme_validate_ns (Christoph Hellwig)
Keep control in the NVMe driver instead of going through an indirect call back into ->revalidate_disk. Also reorder the function a bit to be easier to follow with the additional code. Now that we have removed all callers of revalidate_disk() in the nvme code, ->revalidate_disk is only called from the block device open code when first opening the device. That is of course pointless, as we have a valid size since the initial scan and will get an updated view through the asynchronous notification every time the size changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 block: use revalidate_disk_size in set_capacity_revalidate_and_notify (Christoph Hellwig)
Only virtio_blk and xen-blkfront set the revalidate argument to true, and neither implements the ->revalidate_disk method. So switch to the helper that just updates the size instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 block: add a new revalidate_disk_size helper (Christoph Hellwig)
revalidate_disk is a relatively awkward helper for driver use, as it first calls an optional driver method and then updates the block device size, while most callers either don't need the method call at all or want to keep state between the caller and the called method. Add a revalidate_disk_size helper that just performs the update of the block device size from the gendisk one, and switch all drivers that do not implement ->revalidate_disk to use the new helper instead of revalidate_disk(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
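For orientation, a minimal sketch of what such a size-only helper could look like, assuming the bdget_disk()/bdput() pairing and the check_disk_size_change() routine referenced elsewhere in this log; the exact implementation may differ:

    void revalidate_disk_size(struct gendisk *disk, bool verbose)
    {
            struct block_device *bdev = bdget_disk(disk, 0);

            if (bdev) {
                    /* propagate the gendisk size to the block device */
                    check_disk_size_change(disk, bdev, verbose);
                    bdput(bdev);
            }
    }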
2020-09-02 block: rename bd_invalidated (Christoph Hellwig)
Replace bd_invalidated with a new BDEV_NEED_PART_SCAN flag in a bd_flags variable to better describe the condition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
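Illustratively (flag and field names are from the message above; the bit value and surrounding code are assumed):

    /* was: bdev->bd_invalidated = 1; */
    set_bit(BDEV_NEED_PART_SCAN, &bdev->bd_flags);

    /* was: if (bdev->bd_invalidated) { ... } */
    if (test_and_clear_bit(BDEV_NEED_PART_SCAN, &bdev->bd_flags)) {
            /* initiate the partition scan */
    }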
2020-09-02 block: don't clear bd_invalidated in check_disk_size_change (Christoph Hellwig)
bd_invalidated is set by check_disk_change() or in add_disk() to initiate a partition scan. Move the clearing of the flag from check_disk_size_change(), which is called from both revalidate_disk() and bdev_disk_changed(), to only the latter, as that is what is called from the block device open code (and nbd) to deal with the bd_invalidated event. revalidate_disk() on the other hand is mostly used to propagate a size update from the gendisk to the block device, which is entirely unrelated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 Documentation/filesystems/locking.rst: remove an incorrect sentence (Christoph Hellwig)
unlock_native_capacity is never called from check_disk_change(), and while revalidate_disk can be called from check_disk_change(), it can also be called from two other places at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc (Aneesh Kumar K.V)
The test is broken w.r.t. page table update rules and results in a kernel crash as below. Disable the support until we get the tests updated.

    [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
    cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
        pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
        lr: c0000000005eeeec: pte_update+0x11c/0x190
        sp: c000000c6d1e7950
       msr: 8000000002029033
      current = 0xc000000c6d172c80
      paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
        pid   = 1, comm = swapper/0
    kernel BUG at arch/powerpc/mm/pgtable.c:304!
    [link register   ] c0000000005eeeec pte_update+0x11c/0x190
    [c000000c6d1e7950] 0000000000000001 (unreliable)
    [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
    [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
    [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
    [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
    [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
    [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
    [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled:

    [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
    [   20.530183] Faulting instruction address: 0xc0000000000df330
    cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
        pc: c0000000000df330: memset+0x68/0x104
        lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
        sp: c000000c6d19f990
       msr: 8000000002009033
       dar: 0
      current = 0xc000000c6d177480
      paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
        pid   = 1, comm = swapper/0
    [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
    [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
    [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
    [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
    [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
    [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
    [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
    [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
    33:mon>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902040122.136414-1-aneesh.kumar@linux.ibm.com
2020-09-02 Merge branch 'topic/tasklet-convert' into for-linus (Takashi Iwai)
Pull tasklet API conversions. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ALSA: ua101: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20200902040221.354941-11-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
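All of the conversions in this series follow the same mechanical pattern. A generic sketch with a made-up foo driver (tasklet_setup() and from_tasklet() are the helpers named above; everything else is illustrative):

    struct foo {
            struct tasklet_struct tasklet;
            /* driver state ... */
    };

    /* old style: context smuggled through an opaque unsigned long */
    static void foo_tasklet_fn_old(unsigned long data)
    {
            struct foo *foo = (struct foo *)data;
            /* ... */
    }
    /* tasklet_init(&foo->tasklet, foo_tasklet_fn_old, (unsigned long)foo); */

    /* new style: the callback receives the tasklet pointer and recovers
     * the enclosing structure via from_tasklet(), a container_of() wrapper */
    static void foo_tasklet_fn(struct tasklet_struct *t)
    {
            struct foo *foo = from_tasklet(foo, t, tasklet);
            /* ... */
    }
    /* tasklet_setup(&foo->tasklet, foo_tasklet_fn); */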
2020-09-02 ALSA: usb-audio: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20200902040221.354941-10-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ASoC: txx9: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200902040221.354941-9-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ASoC: siu: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200902040221.354941-8-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ASoC: fsl_esai: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200902040221.354941-7-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ALSA: hdsp: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20200902040221.354941-6-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ALSA: riptide: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20200902040221.354941-5-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ALSA: pci/asihpi: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20200902040221.354941-4-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ALSA: firewire: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20200902040221.354941-3-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 ALSA: core: convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20200902040221.354941-2-allen.lkml@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-02 s390: update defconfigs (Heiko Carstens)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-02 s390: fix GENERIC_LOCKBREAK dependency typo in Kconfig (Eric Farman)
Commit fa686453053b ("sched/rt, s390: Use CONFIG_PREEMPTION") changed a number of uses of CONFIG_PREEMPT to CONFIG_PREEMPTION, except in the Kconfig, where it used two T's. That's the only place in the tree where that spelling exists, so let's fix it. Fixes: fa686453053b ("sched/rt, s390: Use CONFIG_PREEMPTION") Cc: <stable@vger.kernel.org> # 5.6 Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-02 drm/i915: Clear the repeater bit on HDCP disable (Sean Paul)
On HDCP disable, clear the repeater bit. This ensures if we connect a non-repeater sink after a repeater, the bit is in the state we expect. Fixes: ee5e5e7a5e0f ("drm/i915: Add HDCP framework + base implementation") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sean Paul <seanpaul@chromium.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.17+ Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200818153910.27894-3-sean@poorly.run (cherry picked from commit 2cc0c7b520bf8ea20ec42285d4e3d37b467eb7f9) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-09-02 drm/i915: Fix sha_text population code (Sean Paul)
This patch fixes a few bugs:

1- We weren't taking into account sha_leftovers when adding multiple ksvs to sha_text. As such, we were or'ing the end of ksv[j - 1] with the beginning of ksv[j].

2- In the sha_leftovers == 2 and sha_leftovers == 3 cases, bstatus was being placed on the wrong half of sha_text, overlapping the leftover ksv value.

3- In the sha_leftovers == 2 case, we need to manually terminate the byte stream with 0x80 since the hardware doesn't have enough room to add it after writing M0.

The upside is that all of the HDCP supported HDMI repeaters I could find on Amazon just strip HDCP anyways, so it turns out to be _really_ hard to hit any of these cases without an MST hub, which is not (yet) supported. Oh, and the sha_leftovers == 1 case works perfectly!

Fixes: ee5e5e7a5e0f ("drm/i915: Add HDCP framework + base implementation") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sean Paul <seanpaul@chromium.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.17+ Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200818153910.27894-2-sean@poorly.run (cherry picked from commit 1f0882214fd0037b74f245d9be75c31516fed040) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-09-02 drm/i915/display: Ensure that ret is always initialized in icl_combo_phy_verify_state (Nathan Chancellor)
Clang warns:

    drivers/gpu/drm/i915/display/intel_combo_phy.c:268:3: warning: variable 'ret' is uninitialized when used here [-Wuninitialized]
                    ret &= check_phy_reg(dev_priv, phy, ICL_PORT_TX_DW8_LN0(phy),
                    ^~~
    drivers/gpu/drm/i915/display/intel_combo_phy.c:261:10: note: initialize the variable 'ret' to silence this warning
            bool ret;
                     ^
                      = 0
    1 warning generated.

In practice, the bug this warning appears to be concerned with would not actually matter because ret gets initialized to the return value of cnl_verify_procmon_ref_values(). However, that does appear to be a bug since it means the first hunk of the patch this fixes won't actually do anything (since the values of check_phy_reg() won't factor into the final ret value). Initialize ret to true, then make all of the assignments a bitwise AND with itself so that the function always does what it should do. Fixes: 239bef676d8e ("drm/i915/display: Implement new combo phy initialization step") Link: https://github.com/ClangBuiltLinux/linux/issues/1094 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200828202830.7165-1-jose.souza@intel.com Signed-off-by: José Roberto de Souza <jose.souza@intel.com> (cherry picked from commit 2034c2129bc4a91d471815d4dc7a2a69eaa5338d) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
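Reduced to a standalone sketch (check_phy_reg() here is a stand-in for the real helper), the bug and fix look like this:

    #include <stdbool.h>

    static bool check_phy_reg(int reg) { return reg >= 0; } /* stand-in */

    static bool verify_state(void)
    {
            bool ret = true; /* the fix; previously uninitialized */

            /* every check is ANDed in, so any single failure flips the result */
            ret &= check_phy_reg(0);
            ret &= check_phy_reg(1);
            return ret;
    }

With ret uninitialized, the first `ret &=` reads garbage; the later plain assignment from cnl_verify_procmon_ref_values() hid the warning's symptom but also threw the earlier check results away.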
2020-09-02 arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE (Jessica Yu)
In the arm64 module linker script, the section .text.ftrace_trampoline is specified unconditionally regardless of whether CONFIG_DYNAMIC_FTRACE is enabled (this is simply due to the limitation that module linker scripts are not preprocessed like the vmlinux one). Normally, for .plt and .text.ftrace_trampoline, the section flags present in the module binary wouldn't matter since module_frob_arch_sections() would assign them manually anyway. However, the arm64 module loader only sets the section flags for .text.ftrace_trampoline when CONFIG_DYNAMIC_FTRACE=y. That has only become problematic with a recent change in binutils-2.35, where the .text.ftrace_trampoline section (along with the .plt section) is now marked writable and executable (WAX). We no longer allow writable and executable sections to be loaded due to commit 5c3a7db0c7ec ("module: Harden STRICT_MODULE_RWX"), so this is causing all modules linked with binutils-2.35 to be rejected under arm64. Drop the IS_ENABLED(CONFIG_DYNAMIC_FTRACE) check in module_frob_arch_sections() so that the section flags for .text.ftrace_trampoline get properly set to SHF_EXECINSTR|SHF_ALLOC, without SHF_WRITE. Signed-off-by: Jessica Yu <jeyu@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: http://lore.kernel.org/r/20200831094651.GA16385@linux-8ccs Link: https://lore.kernel.org/r/20200901160016.3646-1-jeyu@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
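A hedged fragment of what the unconditional flag fixup in module_frob_arch_sections() amounts to (the iteration context is elided; SHF_* come from the ELF headers):

    /* Don't trust the flags the linker recorded (binutils-2.35 marks the
     * section WAX); force executable + allocated, and never writable. */
    if (!strcmp(secstrings + sechdrs[i].sh_name, ".text.ftrace_trampoline"))
            sechdrs[i].sh_flags = SHF_EXECINSTR | SHF_ALLOC;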
2020-09-02 arm64: Remove exporting cpu_logical_map symbol (Sudeep Holla)
Commit eaecca9e7710 ("arm64: Fix __cpu_logical_map undefined issue") exported cpu_logical_map in order to fix the tegra194-cpufreq module build failure. As this might potentially cause problems while supporting physical CPU hotplug, the tegra194-cpufreq module was reworked to avoid use of cpu_logical_map() via commit 93d0c1ab2328 ("cpufreq: replace cpu_logical_map() with read_cpuid_mpir()"). Since cpu_logical_map was exported only to fix the module build temporarily, let us remove it again before it gains any users. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20200901095229.56793-1-sudeep.holla@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-09-02 drm/virtio: fix unblank (Gerd Hoffmann)
When going through a disable/enable cycle without changing the framebuffer, the optimization added by commit 3954ff10e06e ("drm/virtio: skip set_scanout if framebuffer didn't change") causes the screen to stay blank. Add a bool to force an update to fix that. v2: use drm_atomic_crtc_needs_modeset() (Daniel). Cc: 1882851@bugs.launchpad.net Fixes: 3954ff10e06e ("drm/virtio: skip set_scanout if framebuffer didn't change") Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Jiri Slaby <jirislaby@kernel.org> Tested-by: Diego Viola <diego.viola@gmail.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20200818072511.6745-2-kraxel@redhat.com (cherry picked from commit 1bc371cd0ec907bab870cacb6e898105f9c41dc8)
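A hedged sketch of the v2 shape (drm_atomic_crtc_needs_modeset() is the real DRM helper; the surrounding names are assumed):

    /* A disable/enable cycle shows up as a modeset, so force the scanout
     * update even when the framebuffer pointer is unchanged. */
    bool needs_modeset = drm_atomic_crtc_needs_modeset(crtc->state);

    if (plane->state->fb != old_state->fb || needs_modeset) {
            /* ... issue the set_scanout command as before ... */
    }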
2020-09-01 Merge tag 'perf-tools-fixes-for-v5.9-2020-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux (Linus Torvalds)
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix infinite loop in the TUI for grouped events in 'perf top/record', e.g. when using "perf top -e '{cycles,instructions,cache-misses}'".
 - Fix segfault by skipping side-band event setup if HAVE_LIBBPF_SUPPORT is not set.
 - Fix synthesized branch stacks generated from CoreSight ETM trace and Intel PT hardware traces.
 - Fix error when synthesizing events from ARM SPE hardware trace.
 - The SNOOPX and REMOTE offsets in the data_src bitmask in perf records were both 37; SNOOPX is 38, fix it.
 - Fix use of CPU list with summary option in 'perf sched timehist'.
 - Avoid an uninitialized read when using fake PMUs.
 - Set perf_event_attr.exclude_guest=1 for user-space counting.
 - Don't order events when doing a 'perf report -D' raw dump of perf.data records.
 - Set NULL sentinel in pmu_events table in "Parse and process metrics" 'perf test'.
 - Fix basic bpf filtering 'perf test' on s390x.
 - Fix out of bounds array access in the 'perf stat' print_counters() evlist method.
 - Add mwait_idle_with_hints.constprop.0 to the list of idle symbols.
 - Use %zd for size_t printf formats on 32-bit.
 - Correct the help info of "perf record --no-bpf-event" option.
 - Add entries for CoreSight and Arm SPE tooling to MAINTAINERS.

* tag 'perf-tools-fixes-for-v5.9-2020-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf report: Disable ordered_events for raw dump
  perf tools: Correct SNOOPX field offset
  perf intel-pt: Fix corrupt data after perf inject from
  perf cs-etm: Fix corrupt data after perf inject from
  perf top/report: Fix infinite loop in the TUI for grouped events
  perf parse-events: Avoid an uninitialized read when using fake PMUs
  perf stat: Fix out of bounds array access in the print_counters() evlist method
  perf test: Set NULL sentinel in pmu_events table in "Parse and process metrics" test
  perf parse-events: Set exclude_guest=1 for user-space counting
  perf record: Correct the help info of option "--no-bpf-event"
  perf tools: Use %zd for size_t printf formats on 32-bit
  MAINTAINERS: Add entries for CoreSight and Arm SPE tooling
  perf: arm-spe: Fix check error when synthesizing events
  perf symbols: Add mwait_idle_with_hints.constprop.0 to the list of idle symbols
  perf top: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set
  perf sched timehist: Fix use of CPU list with summary option
  perf test: Fix basic bpf filtering test
2020-09-01 block: Remove a duplicative condition (Baolin Wang)
Remove a duplicative condition to fix the following cppcheck warning: "warning: Redundant condition: sched_allow_merge. '!A || (A && B)' is equivalent to '!A || B' [redundantCondition]" Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
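The simplification is a pure boolean identity, which a throwaway user-space check can confirm exhaustively (standalone C, not kernel code):

    #include <assert.h>

    int main(void)
    {
            for (int a = 0; a <= 1; a++)
                    for (int b = 0; b <= 1; b++)
                            /* !A || (A && B)  ==  !A || B */
                            assert((!a || (a && b)) == (!a || b));
            return 0;
    }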
2020-09-01 scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATA (Luo Jiaxing)
It was discovered that sdparm will fail when attempting to disable write cache on a SATA disk connected via libsas. In the ATA command set, the write cache state is controlled through the SET FEATURES operation. This roughly corresponds to MODE SELECT in SCSI, and the latter command is what is used in the SCSI-ATA translation layer. A subtle difference is that a MODE SELECT carries data whereas SET FEATURES is defined as a non-data command in ATA. Set the DMA data direction to DMA_NONE if the requested ATA command is identified as non-data. [mkp: commit desc] Fixes: fa1c1e8f1ece ("[SCSI] Add SATA support to libsas") Link: https://lore.kernel.org/r/1598426666-54544-1-git-send-email-luojiaxing@huawei.com Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
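A hedged sketch of the resulting direction selection (ata_is_data() is a real libata helper; the exact placement in the qc issue path is assumed):

    /* Non-data ATA commands such as SET FEATURES carry no payload, so
     * the SAS task must not claim a DMA direction for them. */
    if (ata_is_data(qc->tf.protocol))
            task->data_dir = qc->dma_dir;
    else
            task->data_dir = DMA_NONE;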
2020-09-01 block: better deal with the delayed not supported case in blk_cloned_rq_check_limits (Ritika Srivastava)
If a WRITE_ZERO/WRITE_SAME operation is not supported by the storage, blk_cloned_rq_check_limits() will return an IO error which will cause device-mapper to fail the paths. Instead, if the queue limit is set to 0, return BLK_STS_NOTSUPP. BLK_STS_NOTSUPP will be ignored by device-mapper and will not fail the paths. Suggested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
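A hedged sketch of the distinction inside blk_cloned_rq_check_limits() (the helpers are real block-layer APIs; the exact flow is assumed):

    unsigned int max_sectors = blk_queue_get_max_sectors(rq->q, req_op(rq));

    /* A zero limit means the device never supported the operation, so
     * report "not supported" rather than an I/O error; dm-mpath ignores
     * BLK_STS_NOTSUPP and keeps the path alive. */
    if (max_sectors == 0)
            return BLK_STS_NOTSUPP;

    if (blk_rq_sectors(rq) > max_sectors)
            return BLK_STS_IOERR;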
2020-09-01 block: Return blk_status_t instead of errno codes (Ritika Srivastava)
Replace returning legacy errno codes with blk_status_t in blk_cloned_rq_check_limits(). Signed-off-by: Ritika Srivastava <ritika.srivastava@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE (Khazhismel Kumykov)
CAP_SYS_ADMIN is too broad, and ionice fits into CAP_SYS_NICE's grouping. Retain CAP_SYS_ADMIN permission for backwards compatibility. Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
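A hedged sketch of the permission check (capable() and the capability constants are real; the surrounding switch is paraphrased from the ioprio set path):

    switch (class) {
    case IOPRIO_CLASS_RT:
            /* CAP_SYS_NICE now suffices; CAP_SYS_ADMIN is kept so
             * existing setups don't break. */
            if (!capable(CAP_SYS_NICE) && !capable(CAP_SYS_ADMIN))
                    return -EPERM;
            break;
    /* ... other classes ... */
    }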
2020-09-01 blk-iocost: update iocost_monitor.py (Tejun Heo)
iocost went through significant internal changes. Update iocost_monitor.py accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: add three debug stat - cost.wait, indebt and indelay (Tejun Heo)
These are really cheap to collect and can be useful in debugging iocost behavior. Add them as debug stats for now. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: restore inuse update tracepoints (Tejun Heo)
Update and restore the inuse update tracepoints. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: implement vtime loss compensation (Tejun Heo)
When an iocg accumulates too much vtime or gets deactivated, we throw away some vtime, which lowers the overall device utilization. As the exact amount which is being thrown away is known, we can compensate by accelerating the vrate accordingly so that the extra vtime generated in the current period matches what got lost. This significantly improves work conservation when involving high weight cgroups with intermittent and bursty IO patterns. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
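In spirit, the compensation is simple arithmetic; a back-of-the-envelope sketch with entirely hypothetical names:

    /* If 'lost_vtime' was thrown away last period, speed the vrate up so
     * the coming period of length 'period_vtime' regenerates it. */
    u64 comp_pct = div64_u64(lost_vtime * 100, period_vtime);

    vrate = div64_u64(vrate * (100 + comp_pct), 100);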
2020-09-01 blk-iocost: halve debts if device stays idle (Tejun Heo)
A low weight iocg can amass a large amount of debt, for example, when anonymous memory gets reclaimed aggressively. If the system has a lot of memory paired with a slow IO device, the debt can span multiple seconds or more. If there are no other subsequent IO issuers, the in-debt iocg may end up blocked paying its debt while the IO device is idle. This patch implements a mechanism to protect against such pathological cases. If the device has been sufficiently idle for a substantial amount of time, the debts are halved. The criteria are on the conservative side as we want to resolve the rare extreme cases without impacting regular operation by forgiving debts too readily. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: implement delay adjustment hysteresis (Tejun Heo)
Currently, iocost syncs the delay duration to the outstanding debt amount, which seemed enough to protect the system from anon memory hogs. However, that was mostly because the delay calculation was using hweight_inuse, which quickly converges towards zero under debt, often punishing debtors overly harshly for longer than deserved. The previous patch fixed the delay calculation and now the protection against anonymous memory hogs isn't enough because the effect of delay is indirect and non-linear and a huge amount of future debt can accumulate abruptly while unthrottled. This patch implements delay hysteresis so that delay is decayed exponentially over time instead of getting cleared immediately as debt is paid off. While the overall behavior is similar to the blk-cgroup implementation used by blk-iolatency, a lot of the details are different and due to the empirical nature of the mechanism, it's challenging to adapt the mechanism for one controller without negatively impacting the other. As the delay is gradually decayed now, there's no point in running it from its own hrtimer. Periodic updates are now performed from ioc_timer_fn() and the dedicated hrtimer is removed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
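A hedged sketch of exponential decay by right-shifting once per elapsed half-life (the constant name is hypothetical):

    u64 elapsed = now_ns - iocg->delay_at;
    u64 decays  = div64_u64(elapsed, DELAY_HALFLIFE_NS); /* hypothetical */
    u64 delay   = decays < 64 ? iocg->delay >> decays : 0;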
2020-09-01 blk-iocost: revamp debt handling (Tejun Heo)
Debt handling had several issues.

* How much inuse a debtor carries wasn't clearly defined. inuse would be driven down over time from not issuing IOs but it'd be better to clamp it to minimum immediately once in debt.

* How much can be paid off was determined by hweight_inuse. As inuse was driven down, the payment amount would fall together regardless of the debtor's active weight. This means that the debtors were punished harshly.

* ioc_rqos_merge() wasn't calling blkcg_schedule_throttle() after iocg_kick_delay().

This patch revamps debt handling so that

* Debt handling owns inuse for iocgs in debt and keeps them at zero.

* Payment amount is determined by hweight_active. This is more deterministic and safer than hweight_inuse but still far from ideal in that it doesn't factor in possible donations from other iocgs for debt payments. This likely needs further improvements in the future.

* ioc_rqos_merge() now calls blkcg_schedule_throttle() as necessary.

Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: revamp in-period donation snapbacks (Tejun Heo)
When the margin drops below the minimum on a donating iocg, donation is immediately canceled in full. There are a couple of shortcomings with the current behavior.

* It's abrupt. A small temporary budget deficit can lead to a wide swing in weight allocation and a large surplus.

* It's open coded in the issue path but not implemented for the merge path. A series of merges at a low inuse can make the iocg incur debts and stall incorrectly.

This patch reimplements in-period donation snapbacks so that

* inuse adjustment and cost calculations are factored into adjust_inuse_and_calc_cost() which is called from both the issue and merge paths.

* Snapbacks are more gradual. They occur in quarter steps.

* A snapback triggers if the margin goes below the low threshold and is lower than the budget at the time of the last adjustment.

* For the above, __propagate_weights() stores the margin in iocg->saved_margin. Move iocg->last_inuse storing together into __propagate_weights() for consistency.

* Full snapback is guaranteed when there are waiters.

* With precise donation and gradual snapbacks, inuse adjustments are now a lot more effective and the value of scaling inuse on weight changes isn't clear. Removed inuse scaling from weight_update().

Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: revamp donation amount determination (Tejun Heo)
iocost has various safety nets to combat inuse adjustment calculation inaccuracies. With Andy's method implemented in transfer_surpluses(), inuse adjustment calculations are now accurate and we can make donation amount determinations accurate too.

* Stop keeping track of past usage history and using the maximum. Act on the immediate usage information.

* Remove donation constraints defined by SURPLUS_* constants. Donate whatever isn't used.

* Determine the donation amount so that the iocg will end up with MARGIN_TARGET_PCT budget at the end of the coming period assuming the same usage as the previous period. TARGET is set at 50% of period, which is the previous maximum. This provides smooth convergence for most repetitive IO patterns.

* Apply donation logic early at 20% budget. There's no risk in doing so as the calculation is based on the delta between the current budget and the target budget at the end of the coming period.

* Remove preemptive iocg activation for zero cost IOs. As donation can reach near zero now, the mere activation doesn't provide any protection anymore. In the unlikely case that this becomes a problem, the right solution is assigning appropriate costs for such IOs.

This significantly improves the donation determination logic while also simplifying it. Now all donations are immediate, exact and smooth. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: implement Andy's method for donation weight updates (Tejun Heo)
iocost implements work conservation by reducing iocg->inuse and propagating the adjustment upwards proportionally. However, while I knew the target absolute hierarchical proportion - adjusted hweight_inuse - I couldn't figure out how to determine the iocg->inuse adjustment to achieve that and approximated the adjustment by scaling iocg->inuse using the proportion of the needed hweight_inuse changes. When nested, these scalings aren't accurate even when adjusting a single node, as the donating node also receives the benefit of the donated portion. When multiple nodes are donating, as they often do, they can be wildly wrong. iocost employed various safety nets to combat the inaccuracies. There are ample buffers in determining how much to donate, and the adjustments are conservative and gradual. While it can achieve a reasonable level of work conservation in simple scenarios, the inaccuracies can easily add up, leading to significant loss of total work. This in turn makes it difficult to closely cap vrate, as vrate adjustment is needed to compensate for the loss of work. The combination of inaccurate donation calculations and vrate adjustments can lead to wide fluctuations and clunky overall behaviors. Andy Newell devised a method to calculate the needed ->inuse updates to achieve the target hweight_inuse's. The method is compatible with the proportional inuse adjustment propagation, which allows all hot path operations to be local to each iocg. To roughly summarize, Andy's method divides the tree into donating and non-donating parts, calculates a global donation rate which is used to determine the target hweight_inuse for each node, and then derives per-level proportions. There's a non-trivial amount of math involved. Please refer to the following pdfs for detailed descriptions. https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN This patch implements Andy's method in transfer_surpluses(). This makes the donation calculations accurate per cycle and enables further improvements in other parts of the donation logic. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: restructure surplus donation logic (Tejun Heo)
The way the surplus donation logic is structured isn't great. There are two separate paths for starting/increasing donations and decreasing them, making the logic harder to follow and prone to unnecessary behavior differences. In preparation for improved donation handling, this patch restructures the code so that

* All donors - new, increasing and decreasing - are funneled through the same code path.

* The target donation calculation is factored into hweight_after_donation() which is called once from the same spot for all possible donors.

* Actual inuse adjustment is factored into transfer_surpluses().

This change introduces a few behavior differences - e.g. donation amount reduction now uses the max usage of the recent three periods just like new and increasing donations, and inuse now gets adjusted upwards the same way it gets downwards. These differences are unlikely to have severely negative implications and the whole logic will be revamped soon. This patch also removes two tracepoints. The existing TPs don't quite fit the new implementation. A later patch will update and reinstate them. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: decouple vrate adjustment from surplus transfers (Tejun Heo)
Budget donations are inaccurate and could take multiple periods to converge. To prevent triggering vrate adjustments while surplus transfers were catching up, vrate adjustment was suppressed if donations were increasing, which was indicated by non-zero nr_surpluses. This entangling won't be necessary with the scheduled rewrite of donation mechanism which will make it precise and immediate. Let's decouple the two in preparation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: replace iocg->has_surplus with ->surplus_list (Tejun Heo)
Instead of marking iocgs with surplus with a flag and filtering for them while walking all active iocgs, build a surpluses list. This doesn't make much difference now but will help implement improved donation logic which will iterate iocgs with surplus multiple times. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
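A hedged sketch (the list helpers are the real <linux/list.h> API; the predicate and field names are assumed):

    LIST_HEAD(surpluses);

    list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
            if (iocg_has_surplus(iocg)) /* hypothetical predicate */
                    list_add(&iocg->surplus_list, &surpluses);
    }

    /* later passes can iterate just the surplus iocgs, repeatedly */
    list_for_each_entry(iocg, &surpluses, surplus_list) {
            /* donation calculations ... */
    }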
2020-09-01 blk-iocost: calculate iocg->usages[] from iocg->local_stat.usage_us (Tejun Heo)
Currently, iocg->usages[] which are used to guide inuse adjustments are calculated from vtime deltas. This, however, assumes that the hierarchical inuse weight at the time of calculation held for the entire period, which often isn't true and can lead to significant errors. Now that we have absolute usage information collected, we can derive iocg->usages[] from iocg->local_stat.usage_us so that inuse adjustment decisions are made based on actual absolute usage. The calculated usage is clamped between 1 and WEIGHT_ONE and WEIGHT_ONE is also used to signal saturation regardless of the current hierarchical inuse weight. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 blk-iocost: add absolute usage stat (Tejun Heo)
Currently, iocost doesn't collect or expose any statistics, punting off all monitoring duties to the drgn based iocost_monitor.py. While it works for some scenarios, there are some usability and data availability challenges. For example, accurate per-cgroup usage information can't be tracked by vtime progression at all and the numbers available in iocg->usages[] are really short-term snapshots used for control heuristics with possibly significant errors. This patch implements a per-cgroup absolute usage stat counter and exposes it through io.stat along with the current vrate. Usage stat collection and flushing employ the same method as cgroup rstat on the active iocg's and the only hot path overhead is preemption toggling and adding to a percpu counter. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
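A hedged sketch of the hot path (get_cpu_ptr()/put_cpu_ptr() and local64_add() are real kernel primitives; the stat structure and field names are assumed):

    struct iocg_pcpu_stat {
            local64_t abs_vusage; /* assumed field name */
    };

    static void iocg_account_usage(struct ioc_gq *iocg, u64 abs_cost)
    {
            /* get_cpu_ptr() disables preemption - the only hot-path cost */
            struct iocg_pcpu_stat *stat = get_cpu_ptr(iocg->pcpu_stat);

            local64_add(abs_cost, &stat->abs_vusage);
            put_cpu_ptr(iocg->pcpu_stat);
    }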
2020-09-01 blk-iocost: grab ioc->lock for debt handling (Tejun Heo)
Currently, debt handling requires only iocg->waitq.lock. In the future, we want to adjust and propagate inuse changes depending on debt status. Let's grab ioc->lock in debt handling paths in preparation.

* Because ioc->lock nests outside iocg->waitq.lock, the decision to grab ioc->lock needs to be made before entering the critical sections.

* Add and use iocg_[un]lock() which handles the conditional double locking.

* Add @pay_debt to iocg_kick_waitq() so that debt payment happens only when the caller grabbed both locks.

This patch is preparatory and the comments contain references to future changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
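A hedged sketch of the conditional double locking (lock ordering as described above; struct and field names assumed):

    static void iocg_lock(struct ioc_gq *iocg, bool lock_ioc, unsigned long *flags)
    {
            /* ioc->lock nests outside iocg->waitq.lock, so the decision
             * must be made before entering the critical section */
            if (lock_ioc) {
                    spin_lock_irqsave(&iocg->ioc->lock, *flags);
                    spin_lock(&iocg->waitq.lock);
            } else {
                    spin_lock_irqsave(&iocg->waitq.lock, *flags);
            }
    }

    static void iocg_unlock(struct ioc_gq *iocg, bool unlock_ioc, unsigned long *flags)
    {
            if (unlock_ioc) {
                    spin_unlock(&iocg->waitq.lock);
                    spin_unlock_irqrestore(&iocg->ioc->lock, *flags);
            } else {
                    spin_unlock_irqrestore(&iocg->waitq.lock, *flags);
            }
    }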
2020-09-01 blk-iocost: streamline vtime margin and timer slack handling (Tejun Heo)
The margin handling was pretty inconsistent.

* ioc->margin_us and ioc->inuse_margin_vtime were used as vtime margin thresholds. However, the two are in different units, with the former requiring conversion to vtime on use.

* iocg_kick_waitq() was using a quarter of WAITQ_TIMER_MARGIN_PCT of period_us as the timer slack - ~1.2% - while iocg_kick_delay() was using a quarter of ioc->margin_us - ~12.5%. There aren't strong reasons to use different values for the two.

This patch cleans up margin and timer slack handling:

* vtime margins are now recorded in ioc->margins.{min, max} on period duration changes and used consistently.

* Timer slack is now 1% of period_us and recorded in ioc->timer_slack_ns and used consistently for iocg_kick_waitq() and iocg_kick_delay().

The only functional change is the shortening of timer slack. No meaningful visible change is expected. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
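A hedged sketch of the consolidation (margins.{min, max} and the 1% slack come from the message above; the percentage macro names and vtime scale constant are assumed):

    static void ioc_refresh_margins(struct ioc *ioc)
    {
            u64 period_vtime = ioc->period_us * VTIME_PER_USEC;

            ioc->margins.min = div64_u64(period_vtime * MARGIN_MIN_PCT, 100);
            ioc->margins.max = div64_u64(period_vtime * MARGIN_MAX_PCT, 100);
            /* 1% of the period, shared by iocg_kick_waitq() and _delay() */
            ioc->timer_slack_ns = div64_u64(ioc->period_us * NSEC_PER_USEC, 100);
    }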
2020-09-01 blk-iocost: make ioc_now->now and ioc->period_at 64bit (Tejun Heo)
They are in microseconds and wrap in around 1.2 hours with u32. While unlikely, confusions from wraparounds are still possible. We aren't saving anything meaningful by keeping these u32. Let's make them u64. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
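For reference, the wraparound arithmetic: 2^32 µs = 4,294,967,296 µs ≈ 4,295 s ≈ 71.6 minutes ≈ 1.19 hours (the "around 1.2 hours" above), while a u64 microsecond counter would take roughly 585,000 years to wrap.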