summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-12-07blkcg: remove bio_disassociate_task()Dennis Zhou
Now that a bio only holds a blkg reference, so clean up is simply putting back that reference. Remove bio_disassociate_task() as it just calls bio_disassociate_blkg() and call the latter directly. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: remove additional reference to the cssDennis Zhou
The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: remove bio->bi_css and instead use bio->bi_blkgDennis Zhou
Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: associate writeback bios with a blkgDennis Zhou
One of the goals of this series is to remove a separate reference to the css of the bio. This can and should be accessed via bio_blkcg(). In this patch, wbc_init_bio() now requires a bio to have a device associated with it. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: associate a blkg for pages being evicted by swapDennis Zhou
A prior patch in this series added blkg association to bios issued by cgroups. There are two other paths that we want to attribute work back to the appropriate cgroup: swap and writeback. Here we modify the way swap tags bios to include the blkg. Writeback will be tackle in the next patch. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: consolidate bio_issue_init() to be a part of coreDennis Zhou
bio_issue_init among other things initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone). Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: associate blkg when associating a deviceDennis Zhou
Previously, blkg association was handled by controller specific code in blk-throttle and blk-iolatency. However, because a blkg represents a relationship between a blkcg and a request_queue, it makes sense to keep the blkg->q and bio->bi_disk->queue consistent. This patch moves association into the bio_set_dev macro(). This should cover the majority of cases where the device is set/changed keeping the two pointers consistent. Fallback code is added to blkcg_bio_issue_check() to catch any missing paths. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: introduce common blkg association logicDennis Zhou
There are 3 ways blkg association can happen: association with the current css, with the page css (swap), or from the wbc css (writeback). This patch handles how association is done for the first case where we are associating bsaed on the current css. If there is already a blkg associated, the css will be reused and association will be redone as the request_queue may have changed. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: convert blkg_lookup_create() to find closest blkgDennis Zhou
There are several scenarios where blkg_lookup_create() can fail such as the blkcg dying, request_queue is dying, or simply being OOM. Most handle this by simply falling back to the q->root_blkg and calling it a day. This patch implements the notion of closest blkg. During blkg_lookup_create(), if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest() is introduced and used during association so a bio is always attached to a blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: update blkg_lookup_create() to do lockingDennis Zhou
To know when to create a blkg, the general pattern is to do a blkg_lookup() and if that fails, lock and do the lookup again, and if that fails finally create. It doesn't make much sense for everyone who wants to do creation to write this themselves. This changes blkg_lookup_create() to do locking and implement this pattern. The old blkg_lookup_create() is renamed to __blkg_lookup_create(). If a call site wants to do its own error handling or already owns the queue lock, they can use __blkg_lookup_create(). This will be used in upcoming patches. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: fix ref count issue with bio_blkcg() using task_cssDennis Zhou
The bio_blkcg() function turns out to be inconsistent and consequently dangerous to use. The first part returns a blkcg where a reference is owned by the bio meaning it does not need to be rcu protected. However, the third case, the last line, is problematic: return css_to_blkcg(task_css(current, io_cgrp_id)); This can race against task migration and the cgroup dying. It is also semantically different as it must be called rcu protected and is susceptible to failure when trying to get a reference to it. This patch adds association ahead of calling bio_blkcg() rather than after. This makes association a required and explicit step along the code paths for calling bio_blkcg(). In blk-iolatency, association is moved above the bio_blkcg() call to ensure it will not return %NULL. BFQ uses the old bio_blkcg() function, but I do not want to address it in this series due to the complexity. I have created a private version documenting the inconsistency and noting not to use it. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blk-mq: remove QUEUE_FLAG_POLL from default MQ flagsJens Axboe
We only support polling if we have poll queues now, but the flag is being set by default. Remove the default QUEUE_FLAG_POLL setting, we'll set it in blk_mq_init_allocated_queue() if we have poll queues available for this device. Fixes: 6544d229bf43 ("block: enable polling by default if a poll map is initalized") Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04block: remove ->poll_fnChristoph Hellwig
This was intended to support users like nvme multipath, but is just getting in the way and adding another indirect call. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04block: move queues types to the block layerChristoph Hellwig
Having another indirect all in the fast path doesn't really help in our post-spectre world. Also having too many queue type is just going to create confusion, so I'd rather manage them centrally. Note that the queue type naming and ordering changes a bit - the first index now is the default queue for everything not explicitly marked, the optional ones are read and poll queues. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04Merge tag 'v4.20-rc5' into for-4.21/blockJens Axboe
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and also getting the merge fix that went into mainline that users are hitting testing for-4.21/block and/or for-next. * tag 'v4.20-rc5': (664 commits) Linux 4.20-rc5 PCI: Fix incorrect value returned from pcie_get_speed_cap() MAINTAINERS: Update linux-mips mailing list address ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set ...
2018-12-03sbitmap: fix sbitmap_for_each_set()Omar Sandoval
We need to ignore bits in the cleared mask when iterating over all set bits. Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits") Reported-by: Jens Axboe@kernel.dk> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-02Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Volume is a little higher than usual due to a set of gpio fixes for Davinci platforms that's been around a while, still seemed appropriate to not hold off until next merge window. Besides that it's the usual mix of minor fixes, mostly corrections of small stuff in device trees. Major stability-related one is the removal of a regulator from DT on Rock960, since DVFS caused undervoltage. I expect it'll be restored once they figure out the underlying issue" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits) MAINTAINERS: Remove unused Qualcomm SoC mailing list ARM: davinci: dm644x: set the GPIO base to 0 ARM: davinci: da830: set the GPIO base to 0 ARM: davinci: dm355: set the GPIO base to 0 ARM: davinci: dm646x: set the GPIO base to 0 ARM: davinci: dm365: set the GPIO base to 0 ARM: davinci: da850: set the GPIO base to 0 gpio: davinci: restore a way to manually specify the GPIO base ARM: davinci: dm644x: define gpio interrupts as separate resources ARM: davinci: dm355: define gpio interrupts as separate resources ARM: davinci: dm646x: define gpio interrupts as separate resources ARM: davinci: dm365: define gpio interrupts as separate resources ARM: davinci: da8xx: define gpio interrupts as separate resources ARM: dts: at91: sama5d2: use the divided clock for SMC ARM: dts: imx51-zii-rdu1: Remove EEPROM node ARM: dts: rockchip: Remove @0 from the veyron memory node arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou. arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP arm64: dts: sdm845-mtp: Reserve reserved gpios arm64: dts: ti: k3-am654: Fix wakeup_uart reg address ...
2018-12-02Merge tag 'for-linus-4.20a-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A revert of a previous commit as it is no longer necessary and has shown to cause problems in some memory hotplug cases. - Some small fixes and a minor cleanup. - A patch for adding better diagnostic data in a very rare failure case. * tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: pvcalls-front: fixes incorrect error handling Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" xen: xlate_mmu: add missing header to fix 'W=1' warning xen/x86: add diagnostic printout to xen_mc_flush() in case of error x86/xen: cleanup includes in arch/x86/xen/spinlock.c
2018-12-01Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull STIBP fallout fixes from Thomas Gleixner: "The performance destruction department finally got it's act together and came up with a cure for the STIPB regression: - Provide a command line option to control the spectre v2 user space mitigations. Default is either seccomp or prctl (if seccomp is disabled in Kconfig). prctl allows mitigation opt-in, seccomp enables the migitation for sandboxed processes. - Rework the code to handle the conditional STIBP/IBPB control and remove the now unused ptrace_may_access_sched() optimization attempt - Disable STIBP automatically when SMT is disabled - Optimize the switch_to() logic to avoid MSR writes and invocations of __switch_to_xtra(). - Make the asynchronous speculation TIF updates synchronous to prevent stale mitigation state. As a general cleanup this also makes retpoline directly depend on compiler support and removes the 'minimal retpoline' option which just pretended to provide some form of security while providing none" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/speculation: Provide IBPB always command line options x86/speculation: Add seccomp Spectre v2 user space protection mode x86/speculation: Enable prctl mode for spectre_v2_user x86/speculation: Add prctl() control for indirect branch speculation x86/speculation: Prepare arch_smt_update() for PRCTL mode x86/speculation: Prevent stale SPEC_CTRL msr content x86/speculation: Split out TIF update ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS x86/speculation: Prepare for conditional IBPB in switch_mm() x86/speculation: Avoid __switch_to_xtra() calls x86/process: Consolidate and simplify switch_to_xtra() code x86/speculation: Prepare for per task indirect branch speculation control x86/speculation: Add command line control for indirect branch speculation x86/speculation: Unify conditional spectre v2 print functions x86/speculataion: Mark command line parser data __initdata x86/speculation: Mark string arrays const correctly x86/speculation: Reorder the spec_v2 code x86/l1tf: Show actual SMT state x86/speculation: Rework SMT state change sched/smt: Expose sched_smt_present static key ...
2018-11-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "31 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits) ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set userfaultfd: shmem: add i_size checks userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem ...
2018-11-30Merge tag 'fscache-fixes-20181130' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache and cachefiles fixes from David Howells: "Misc fixes: - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a race between a cache object lookup failing and someone attempting to reenable that object, thereby triggering an update of the object's attributes. - Fix an assertion failure at fs/fscache/operation.c:449 caused by a split atomic subtract and atomic read that allows a race to happen. - Fix a leak of backing pages when simultaneously reading the same page from the same object from two or more threads. - Fix a hang due to a race between a cache object being discarded and the corresponding cookie being reenabled. There are also some minor cleanups: - Cast an enum value to a different enum type to prevent clang from generating a warning. This shouldn't cause any sort of change in the emitted code. - Use ktime_get_real_seconds() instead of get_seconds(). This is just used to uniquify a filename for an object to be placed in the graveyard. Objects placed there are deleted by cachfilesd in userspace immediately thereafter. - Remove an initialised, but otherwise unused variable. This should have been entirely optimised away anyway" * tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache, cachefiles: remove redundant variable 'cache' cachefiles: avoid deprecated get_seconds() cachefiles: Explicitly cast enumerated type in put_object fscache: fix race between enablement and dropping of object cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active fscache: Fix race in fscache_op_complete() due to split atomic_sub & read cachefiles: Fix an assertion failure when trying to update a failed object
2018-11-30psi: make disabling/enabling easier for vendor kernelsJohannes Weiner
Mel Gorman reports a hackbench regression with psi that would prohibit shipping the suse kernel with it default-enabled, but he'd still like users to be able to opt in at little to no cost to others. With the current combination of CONFIG_PSI and the psi_disabled bool set from the commandline, this is a challenge. Do the following things to make it easier: 1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros to enable CONFIG_PSI in their kernel but leave the feature disabled unless a user requests it at boot-time. To avoid double negatives, rename psi_disabled= to psi=. 2. Make psi_disabled a static branch to eliminate any branch costs when the feature is disabled. In terms of numbers before and after this patch, Mel says: : The following is a comparision using CONFIG_PSI=n as a baseline against : your patch and a vanilla kernel : : 4.20.0-rc4 4.20.0-rc4 4.20.0-rc4 : kconfigdisable-v1r1 vanilla psidisable-v1r1 : Amean 1 1.3100 ( 0.00%) 1.3923 ( -6.28%) 1.3427 ( -2.49%) : Amean 3 3.8860 ( 0.00%) 4.1230 * -6.10%* 3.8860 ( -0.00%) : Amean 5 6.8847 ( 0.00%) 8.0390 * -16.77%* 6.7727 ( 1.63%) : Amean 7 9.9310 ( 0.00%) 10.8367 * -9.12%* 9.9910 ( -0.60%) : Amean 12 16.6577 ( 0.00%) 18.2363 * -9.48%* 17.1083 ( -2.71%) : Amean 18 26.5133 ( 0.00%) 27.8833 * -5.17%* 25.7663 ( 2.82%) : Amean 24 34.3003 ( 0.00%) 34.6830 ( -1.12%) 32.0450 ( 6.58%) : Amean 30 40.0063 ( 0.00%) 40.5800 ( -1.43%) 41.5087 ( -3.76%) : Amean 32 40.1407 ( 0.00%) 41.2273 ( -2.71%) 39.9417 ( 0.50%) : : It's showing that the vanilla kernel takes a hit (as the bisection : indicated it would) and that disabling PSI by default is reasonably : close in terms of performance for this particular workload on this : particular machine so; Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30sbitmap: optimize wakeup checkJens Axboe
Even if we have no waiters on any of the sbitmap_queue wait states, we still have to loop every entry to check. We do this for every IO, so the cost adds up. Shift a bit of the cost to the slow path, when we actually have waiters. Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maintain an internal count of how many are currently active. Then we can simply check this count in sbq_wake_ptr() and not have to loop if we don't have any sleepers. Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30sbitmap: ammortize cost of clearing bitsJens Axboe
sbitmap maintains a set of words that we use to set and clear bits, with each bit representing a tag for blk-mq. Even though we spread the bits out and maintain a hint cache, one particular bit allocated will end up being cleared in the exact same spot. This introduces batched clearing of bits. Instead of clearing a given bit, the same bit is set in a cleared/free mask instead. If we fail allocating a bit from a given word, then we check the free mask, and batch move those cleared bits at that time. This trades 64 atomic bitops for 2 cmpxchg(). In a threaded poll test case, half the overhead of getting and clearing tags is removed with this change. On another poll test case with a single thread, performance is unchanged. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30Merge tag 'staging-4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging and IIO driver fixes from Greg KH: "Here are some small IIO and staging driver fixes for 4.20-rc5. Nothing major, the IIO fix ended up touching the HID drivers at the same time, but the HID maintainer acked it. The staging fixes are all minor patches for reported issues and regressions, full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'staging-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION staging: mt7621-pinctrl: fix uninitialized variable ngroups staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station staging: most: use format specifier "%s" in snprintf staging: rtl8723bs: Fix incorrect sense of ether_addr_equal staging: mt7621-dma: fix potentially dereferencing uninitialized 'tx_desc' staging: comedi: clarify/unify macros for NI macro-defined terminals drivers: staging: cedrus: find ctx before dereferencing it ctx staging: rtl8723bs: Fix the return value in case of error in 'rtw_wx_read32()' staging: comedi: ni_mio_common: scale ao INSN_CONFIG_GET_CMD_TIMING_CONSTRAINTS iio:st_magn: Fix enable device after trigger
2018-11-30Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - MCE related boot crash fix on certain AMD systems - FPU exception handling fix - FPU handling race fix - revert+rewrite of the RSDP boot protocol extension, use boot_params instead - documentation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE/AMD: Fix the thresholding machinery initialization order x86/fpu: Use the correct exception table macro in the XSTATE_OP wrapper x86/fpu: Disable bottom halves while loading FPU registers x86/acpi, x86/boot: Take RSDP address from boot params if available x86/boot: Mostly revert commit ae7e1238e68f2a ("Add ACPI RSDP address to setup_header") x86/ptrace: Fix documentation for tracehook_report_syscall_entry()
2018-11-30Merge tag 'trace-v4.20-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing fixes from Steven Rostedt: "Two more fixes: - Change idx variable in DO_TRACE macro to __idx to avoid name conflicts. A kvm event had "idx" as a parameter and it confused the macro. - Fix a race where interrupts would be traced when set_graph_function was set. The previous patch set increased a race window that tricked the function graph tracer to think it should trace interrupts when it really should not have. The bug has been there before, but was seldom hit. Only the last patch series made it more common" * tag 'trace-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/fgraph: Fix set_graph_function from showing interrupts tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique
2018-11-30Merge tag 'trace-v4.20-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "While rewriting the function graph tracer, I discovered a design flaw that was introduced by a patch that tried to fix one bug, but by doing so created another bug. As both bugs corrupt the output (but they do not crash the kernel), I decided to fix the design such that it could have both bugs fixed. The original fix, fixed time reporting of the function graph tracer when doing a max_depth of one. This was code that can test how much the kernel interferes with userspace. But in doing so, it could corrupt the time keeping of the function profiler. The issue is that the curr_ret_stack variable was being used for two different meanings. One was to keep track of the stack pointer on the ret_stack (shadow stack used by the function graph tracer), and the other use case was the graph call depth. Although, the two may be closely related, where they got updated was the issue that lead to the two different bugs that required the two use cases to be updated differently. The big issue with this fix is that it requires changing each architecture. The good news is, I was able to remove a lot of code that was duplicated within the architectures and place it into a single location. Then I could make the fix in one place. I pushed this code into linux-next to let it settle over a week, and before doing so, I cross compiled all the affected architectures to make sure that they built fine. In the mean time, I also pulled in a patch that fixes the sched_switch previous tasks state output, that was not actually correct" * tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: sched, trace: Fix prev_state output in sched_switch tracepoint function_graph: Have profiler use curr_ret_stack and not depth function_graph: Reverse the order of pushing the ret_stack and the callback function_graph: Move return callback before update of curr_ret_stack function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack function_graph: Make ftrace_push_return_trace() static sparc/function_graph: Simplify with function_graph_enter() sh/function_graph: Simplify with function_graph_enter() s390/function_graph: Simplify with function_graph_enter() riscv/function_graph: Simplify with function_graph_enter() powerpc/function_graph: Simplify with function_graph_enter() parisc: function_graph: Simplify with function_graph_enter() nds32: function_graph: Simplify with function_graph_enter() MIPS: function_graph: Simplify with function_graph_enter() microblaze: function_graph: Simplify with function_graph_enter() arm64: function_graph: Simplify with function_graph_enter() ARM: function_graph: Simplify with function_graph_enter() x86/function_graph: Simplify with function_graph_enter() function_graph: Create function_graph_enter() to consolidate architecture code
2018-11-30Merge tag 'pstore-v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: "Fix corrupted compression due to unlucky size choice with ECC" * tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Correctly calculate usable PRZ bytes
2018-11-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "This is a bit later than usual for our first -rc but I'm not seeing anything worry-some in the RDMA tree right now. Quiet so far this -rc cycle, only a few internal driver related bugs and a small series fixing ODP bugs found by more advanced testing. A set of small driver and core code fixes: - Small series fixing longtime user triggerable bugs in the ODP processing inside mlx5 and core code - Various small driver malfunctions and crashes (use after, free, error unwind, implementation bugs) - A misfunction of the RDMA GID cache that can be triggered by the administrator" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/mlx5: Initialize return variable in case pagefault was skipped IB/mlx5: Fix page fault handling for MW IB/umem: Set correct address to the invalidation function IB/mlx5: Skip non-ODP MR when handling a page fault RDMA/hns: Bugfix pbl configuration for rereg mr iser: set sector for ambiguous mr status errors RDMA/rdmavt: Fix rvt_create_ah function signature IB/mlx5: Avoid load failure due to unknown link width IB/mlx5: Fix XRC QP support after introducing extended atomic RDMA/bnxt_re: Avoid accessing the device structure after it is freed RDMA/bnxt_re: Fix system hang when registration with L2 driver fails RDMA/core: Add GIDs while changing MAC addr only for registered ndev RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR net/mlx5: Fix XRC SRQ umem valid bits
2018-11-29tracepoint: Use __idx instead of idx in DO_TRACE macro to make it uniqueZenghui Yu
After enabling KVM event tracing, almost all of trace_kvm_exit()'s printk shows "kvm_exit: IRQ: ..." even if the actual exception_type is NOT IRQ. More specifically, trace_kvm_exit() is defined in virt/kvm/arm/trace.h by TRACE_EVENT. This slight problem may have existed after commit e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU"). There are two variables in trace_kvm_exit() and __DO_TRACE() which have the same name, *idx*. Thus the actual value of *idx* will be overwritten when tracing. Fix it by adding a simple prefix. Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Wang Haibin <wanghaibin.wang@huawei.com> Cc: linux-trace-devel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU") Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-29pstore/ram: Correctly calculate usable PRZ bytesKees Cook
The actual number of bytes stored in a PRZ is smaller than the bytes requested by platform data, since there is a header on each PRZ. Additionally, if ECC is enabled, there are trailing bytes used as well. Normally this mismatch doesn't matter since PRZs are circular buffers and the leading "overflow" bytes are just thrown away. However, in the case of a compressed record, this rather badly corrupts the results. This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1". Any stored crashes would not be uncompressable (producing a pstorefs "dmesg-*.enc.z" file), and triggering errors at boot: [ 2.790759] pstore: crypto_comp_decompress failed, ret = -22! Backporting this depends on commit 70ad35db3321 ("pstore: Convert console write to use ->write_buf") Reported-by: Joel Fernandes <joel@joelfernandes.org> Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-11-29Merge tag 'sound-4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "As a usual pattern, we've got relatively large updates at rc5: - A fix for races in ALSA control user elements - ASoC DAPM regression due to component refactoring - A fix in error handling of ASoC iteration macro - ASoC Intel SST Skylake kconfig fix; a new Kconfig will appear as a consequence, but in the end it's a good cleanup - HD-audio and USB-audio quirks as always - Assort of ASoC driver fixes (pcm186x, Intel cht, rockchip, pcm3060, rsnd, omap, wm_adsp, qcom, sunxi, stm32)" * tag 'sound-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (34 commits) ALSA: usb-audio: Add vendor and product name for Dell WD19 Dock ALSA: hda/realtek - Support ALC300 ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops ALSA: control: Fix race between adding and removing a user element ALSA: sparc: Fix invalid snd_free_pages() at error path ALSA: wss: Fix invalid snd_free_pages() at error path ALSA: hda/realtek - fix headset mic detection for MSI MS-B171 ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE ASoC: omap-mcbsp: Fix latency value calculation for pm_qos ASoC: acpi: fix: continue searching when machine is ignored ASoC: Intel: Skylake: fix Kconfigs, make HDaudio codec optional MAINTAINERS: add ASoC maintainers for sound dt-bindings ASoC: pcm186x: Fix device reset-registers trigger value ASoC: dapm: Recalculate audio map forcely when card instantiated ASoC: omap-abe-twl6040: Fix missing audio card caused by deferred probing ASoC: pcm3060: Rename output widgets ...
2018-11-29blk-mq: add mq_ops->commit_rqs()Jens Axboe
blk-mq passes information to the hardware about any given request being the last that we will issue in this sequence. The point is that hardware can defer costly doorbell type writes to the last request. But if we run into errors issuing a sequence of requests, we may never send the request with bd->last == true set. For that case, we need a hook that tells the hardware that nothing else is coming right now. For failures returned by the drivers ->queue_rq() hook, the driver is responsible for flushing pending requests, if it uses bd->last to optimize that part. This works like before, no changes there. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-29block: improve logic around when to sort a plug listJens Axboe
Only do it if we have requests for multiple queues in the same plug. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-29Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"Igor Druzhinin
This reverts commit b3cf8528bb21febb650a7ecbf080d0647be40b9f. That commit unintentionally broke Xen balloon memory hotplug with "hotplug_unpopulated" set to 1. As long as "System RAM" resource got assigned under a new "Unusable memory" resource in IO/Mem tree any attempt to online this memory would fail due to general kernel restrictions on having "System RAM" resources as 1st level only. The original issue that commit has tried to workaround fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") also got amended by the following 03a551734 ("x86/PCI: Move and shrink AMD 64-bit window to avoid conflict") which made the original fix to Xen ballooning unnecessary. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) ARM64 JIT fixes for subprog handling from Daniel Borkmann. 2) Various sparc64 JIT bug fixes (fused branch convergance, frame pointer usage detection logic, PSEODU call argument handling). 3) Fix to use BH locking in nf_conncount, from Taehee Yoo. 4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein. 5) Handle return value of TX NAPI completion properly in lan743x driver, from Bryan Whitehead. 6) MAC filter deletion in i40e driver clears wrong state bit, from Lihong Yang. 7) Fix use after free in rionet driver, from Pan Bian. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits) s390/qeth: fix length check in SNMP processing net: hisilicon: remove unexpected free_netdev rapidio/rionet: do not free skb before reading its length i40e: fix kerneldoc for xsk methods ixgbe: recognize 1000BaseLX SFP modules as 1Gbps i40e: Fix deletion of MAC filters igb: fix uninitialized variables netfilter: nf_tables: deactivate expressions in rule replecement routine lan743x: Enable driver to work with LAN7431 tipc: fix lockdep warning during node delete lan743x: fix return value for lan743x_tx_napi_poll net: via: via-velocity: fix spelling mistake "alignement" -> "alignment" qed: fix spelling mistake "attnetion" -> "attention" net: thunderx: fix NULL pointer dereference in nic_remove sctp: increase sk_wmem_alloc when head->truesize is increased firestream: fix spelling mistake: "Inititing" -> "Initializing" net: phy: add workaround for issue where PHY driver doesn't bind to the device usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 sparc: Adjust bpf JIT prologue for PSEUDO calls. bpf, doc: add entries of who looks over which jits ...
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Disable BH while holding list spinlock in nf_conncount, from Taehee Yoo. 2) List corruption in nf_conncount, also from Taehee. 3) Fix race that results in leaving around an empty list node in nf_conncount, from Taehee Yoo. 4) Proper chain handling for inactive chains from the commit path, from Florian Westphal. This includes a selftest for this. 5) Do duplicate rule handles when replacing rules, also from Florian. 6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee. 7) Possible use-after-free in nft_compat when releasing extensions. From Florian. 8) Memory leak in xt_hashlimit, from Taehee. 9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long. 10) Fix cttimeout with udplite and gre, from Florian. 11) Preserve oif for IPv6 link-local generated traffic from mangle table, from Alin Nastac. 12) Missing error handling in masquerade notifiers, from Taehee Yoo. 13) Use mutex to protect registration/unregistration of masquerade extensions in order to prevent a race, from Taehee. 14) Incorrect condition check in tree_nodes_free(), also from Taehee. 15) Fix chain counter leak in rule replacement path, from Taehee. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28block: use rcu_work instead of call_rcu to avoid sleep in softirqYufen Yu
We recently got a stack by syzkaller like this: BUG: sleeping function called from invalid context at mm/slab.h:361 in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid INFO: lockdep is turned off. CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00 0000000000000000 0000000000000001 0000000000000004 0000000000000000 Call Trace: <IRQ> [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline] <IRQ> [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51 [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675 [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637 [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline] [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline] [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline] [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709 [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline] [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline] [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227 [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374 [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline] [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675 [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline] [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline] [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692 [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237 [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232 [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline] [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline] [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline] [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline] [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957 [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273 [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline] [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391 [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline] [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926 [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746 <EOI> [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180 [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626 [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043 [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline] [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050 [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a In softirq context, we call rcu callback function delete_partition_rcu_cb(), which may allocate memory by kzalloc with GFP_KERNEL flag. If the allocation cannot be satisfied, it may sleep. However, That is not allowed in softirq contex. Although we found this problem on linux 4.4, the latest kernel version seems to have this problem as well. And it is very similar to the previous one: https://lkml.org/lkml/2018/7/9/391 Fix it by using RCU workqueue, which allows sleep. Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-28fscache: Fix race in fscache_op_complete() due to split atomic_sub & readkiran.modukuri
The code in fscache_retrieval_complete is using atomic_sub followed by an atomic_read: atomic_sub(n_pages, &op->n_pages); if (atomic_read(&op->n_pages) <= 0) fscache_op_complete(&op->op, true); This causes two threads doing a decrement of n_pages to race with each other seeing the op->refcount 0 at same time - and they end up calling fscache_op_complete() in both the threads leading to an assertion failure. Fix this by using atomic_sub_return_relaxed() instead of two calls. Note that I'm using 'relaxed' rather than, say, 'release' as there aren't multiple variables that appear to need ordering across the release. The oops looks something like: FS-Cache: Assertion failed FS-Cache: 0 > 0 is false ... kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449! ... Workqueue: fscache_operation fscache_op_work_func [fscache] ... RIP: 0010:[<ffffffffc037eacd>] fscache_op_complete+0x10d/0x180 [fscache] ... Call Trace: [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles] [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache] [<ffffffff81096da0>] process_one_work+0x150/0x3f0 [<ffffffff8109751a>] worker_thread+0x11a/0x470 [<ffffffff81808e59>] ? __schedule+0x359/0x980 [<ffffffff81097400>] ? rescuer_thread+0x310/0x310 [<ffffffff8109cdd6>] kthread+0xd6/0xf0 [<ffffffff8109cd00>] ? kthread_park+0x60/0x60 [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70 [<ffffffff8109cd00>] ? kthread_park+0x60/0x60 This seen this in 4.4.x kernels and the same bug affects fscache in latest upstreams kernels. Fixes: 1bb4b7f98f36 ("FS-Cache: The retrieval remaining-pages counter needs to be atomic_t") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-28x86/speculation: Add prctl() control for indirect branch speculationThomas Gleixner
Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of indirect branch speculation via STIBP and IBPB. Invocations: Check indirect branch speculation status with - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0); Enable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0); Disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0); Force disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0); See Documentation/userspace-api/spec_ctrl.rst. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
2018-11-28ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRSThomas Gleixner
The IBPB control code in x86 removed the usage. Remove the functionality which was introduced for this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
2018-11-28x86/speculation: Rework SMT state changeThomas Gleixner
arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline. To allow finegrained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled. Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
2018-11-28sched/smt: Expose sched_smt_present static keyThomas Gleixner
Make the scheduler's 'sched_smt_present' static key globaly available, so it can be used in the x86 speculation control code. Provide a query function and a stub for the CONFIG_SMP=n case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
2018-11-27sched, trace: Fix prev_state output in sched_switch tracepointPavankumar Kondeti
commit 3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout") tried to fix the problem introduced by a previous commit efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing"). However the prev_state output in sched_switch is still broken. task_state_index() uses fls() which considers the LSB as 1. Left shifting 1 by this value gives an incorrect mapping to the task state. Fix this by decrementing the value returned by __get_task_state() before shifting. Link: http://lkml.kernel.org/r/1540882473-1103-1-git-send-email-pkondeti@codeaurora.org Cc: stable@vger.kernel.org Fixes: 3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout") Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stackSteven Rostedt (VMware)
Currently, the depth of the ret_stack is determined by curr_ret_stack index. The issue is that there's a race between setting of the curr_ret_stack and calling of the callback attached to the return of the function. Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") moved the calling of the callback to after the setting of the curr_ret_stack, even stating that it was safe to do so, when in fact, it was the reason there was a barrier() there (yes, I should have commented that barrier()). Not only does the curr_ret_stack keep track of the current call graph depth, it also keeps the ret_stack content from being overwritten by new data. The function profiler, uses the "subtime" variable of ret_stack structure and by moving the curr_ret_stack, it allows for interrupts to use the same structure it was using, corrupting the data, and breaking the profiler. To fix this, there needs to be two variables to handle the call stack depth and the pointer to where the ret_stack is being used, as they need to change at two different locations. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27function_graph: Make ftrace_push_return_trace() staticSteven Rostedt (VMware)
As all architectures now call function_graph_enter() to do the entry work, no architecture should ever call ftrace_push_return_trace(). Make it static. This is needed to prepare for a fix of a design bug on how the curr_ret_stack is used. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27Merge tag 'asoc-v4.20-rc4' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v4.20 Lots of fixes here, the majority of which are driver specific but there's a couple of core things and one notable driver specific one: - A core fix for a DAPM regression introduced during the component refactoring, we'd lost the code that forced a reevaluation of the DAPM graph after probe (which we suppress during init to save lots of recalcuation) and have now restored it. - A core fix for error handling using the newly added for_each_rtd_codec_dai_rollback() macro. - A fix for the names of widgets in the newly introduced pcm3060 driver, merged as a fix so we don't have a release with legacy names.
2018-11-26bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addrDaniel Borkmann
Make fetching of the BPF call address from ppc64 JIT generic. ppc64 was using a slightly different variant rather than through the insns' imm field encoding as the target address would not fit into that space. Therefore, the target subprog number was encoded into the insns' offset and fetched through fp->aux->func[off]->bpf_func instead. Given there are other JITs with this issue and the mechanism of fetching the address is JIT-generic, move it into the core as a helper instead. On the JIT side, we get information on whether the retrieved address is a fixed one, that is, not changing through JIT passes, or a dynamic one. For the former, JITs can optimize their imm emission because this doesn't change jump offsets throughout JIT process. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-27netfilter: add missing error handling code for register functionsTaehee Yoo
register_{netdevice/inetaddr/inet6addr}_notifier may return an error value, this patch adds the code to handle these error paths. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>