Age | Commit message (Collapse) | Author |
|
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen. This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id >= 64.
Roman said:
The syzcaller test boils down to opening /dev/kvm, creating an
eventfd, and calling a couple of KVM ioctls. None of this requires
superuser. And the result is dereferencing an uninitialized pointer
which is likely a crash. The specific path caught by syzbot is via
KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
other user-triggerable paths, so cc:stable is probably justified.
Matthew added:
We have around 250 calls to idr_remove() in the kernel today. Many of
them pass an ID which is embedded in the object they're removing, so
they're safe. Picking a few likely candidates:
drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
drivers/atm/nicstar.c could be taken down by a handcrafted packet
Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
incorrect bio"
This reverts commit ba16ddfbeb9d ("ocfs2/o2hb: check len for
bio_add_page() to avoid getting incorrect bio").
In my testing, this patch introduces a problem that mkfs can't have
slots more than 16 with 4k block size.
And the original logic is safe actually with the situation it mentions
so revert this commit.
Attach test log:
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
(mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0,bi_sector 8192
(mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
(mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
(mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5
Link: http://lkml.kernel.org/r/SIXPR06MB0461721F398A5A92FC68C39ED5920@SIXPR06MB0461.apcprd06.prod.outlook.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If swapon() fails after incrementing nr_rotate_swap, we don't decrement
it and thus effectively leak it. Make sure we decrement it if we
incremented it.
Link: http://lkml.kernel.org/r/b6fe6b879f17fa68eee6cbd876f459f6e5e33495.1526491581.git.osandov@fb.com
Fixes: 81a0298bdfab ("mm, swap: don't use VMA based swap readahead if HDD is used as swap")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These two functions now trigger a warning when CONFIG_PROC_FS is disabled:
fs/xfs/xfs_stats.c:128:12: error: 'xqmstat_proc_show' defined but not used [-Werror=unused-function]
static int xqmstat_proc_show(struct seq_file *m, void *v)
^~~~~~~~~~~~~~~~~
fs/xfs/xfs_stats.c:118:12: error: 'xqm_proc_show' defined but not used [-Werror=unused-function]
static int xqm_proc_show(struct seq_file *m, void *v)
^~~~~~~~~~~~~
Previously, they were referenced from an unused 'static const' structure,
which is silently dropped by gcc.
We can address the warning by adding the same #ifdef around them that
hides the reference.
Fixes: 3f3942aca6da ("proc: introduce proc_create_single{,_data}")
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into fixes
Qualcomm Fixes for 4.17-rc7
* Fix crash in qcom_scm_call_atomic1()
* tag 'qcom-fixes-for-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1()
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
Allwinner fixes for 4.17
Here is a bunch of fixes for merge issues, typos and wrong clocks being
described for simplefb, resulting in non-working displays.
* tag 'sunxi-fixes-for-4.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
ARM: sun8i: v3s: fix spelling mistake: "disbaled" -> "disabled"
ARM: dts: sun4i: Fix incorrect clocks for displays
ARM: dts: sun8i: h3: Re-enable EMAC on Orange Pi One
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
In its current state, the driver will handle backing device
login in a loop for a certain number of retries while the
device returns a partial success, indicating that the driver
may need to try again using a smaller number of resources.
The variable it checks to continue retrying may change
over the course of operations, resulting in reallocation
of resources but exits without sending the login attempt.
Guard against this by introducing a boolean variable that
will retain the state indicating that the driver needs to
reattempt login with backing device firmware.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The hardware monitoring mailing list should be copied for changes
in hwmon devicetree bindings.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-05-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a bug in the original fix to prevent out of bounds speculation when
multiple tail call maps from different branches or calls end up at the
same tail call helper invocation, from Daniel.
2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
case of missing numa support and another one to update kernel config to
properly support xdp_meta.sh test, from Anders.
...
Would be great if you have a chance to merge net into net-next after that.
The verifier fix would be needed later as a dependency in bpf-next for
upcomig work there. When you do the merge there's a trivial conflict on
BPF side with 849fa50662fb ("bpf/verifier: refine retval R0 state for
bpf_get_stack helper"): Resolution is to keep both functions, the
do_refine_retval_range() and record_func_map().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the hypercall was called from userspace or real mode, KVM injects #UD
and then advances RIP, so it looks like #UD was caused by the following
instruction. This probably won't cause more than confusion, but could
give an unexpected access to guest OS' instruction emulator.
Also, refactor the code to count hv hypercalls that were handled by the
virt userspace.
Fixes: 6356ee0c9602 ("x86: Delay skip of emulated hypercall instruction")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
PMTU tests in pmtu.sh need support for VTI, VTI6 and dummy
interfaces: add them to config file.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- prevent hardif_put call with NULL parameter, by Colin Ian King
- Avoid race in Translation Table allocator, by Sven Eckelmann
- Fix Translation Table sync flags for intermediate Responses,
by Linus Luessing
- prevent sending inconsistent Translation Table TVLVs,
by Marek Lindner
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As per SLIMBus specs Value Elements and Information Elements
address map ranges from 0x000 - 0xFFF.
So allow register addresses up to 16 bits
Fixes: 7d6f7fb053ad ("regmap: add SLIMbus support")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
For some reason the devm variant of slimbus init is not added
into the header eventhough this __devm_regmap_init_slimbus()
is an exported function.
This patch adds this. This also fixes below warning in regmap-slimbus.c
regmap-slimbus.c:65:15: warning: symbol '__devm_regmap_init_slimbus'
was not declared. Should it be static?
regmap-slimbus.c:65:16: warning: no previous prototype for
'__devm_regmap_init_slimbus' [-Wmissing-prototypes]
Fixes: 7d6f7fb053ad ("regmap: add SLIMbus support")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The recent changes in Broadcom's ethernet driver(L2 driver) broke
RoCE functionality in terms of MSIx vector allocation and
de-allocation.
There is a possibility that L2 driver would initiate MSIx vector
reallocation depending upon the requests coming from administrator.
In such cases L2 driver needs to free up all the MSIx vectors
allocated previously and reallocate/initialize those.
If RoCE driver is loaded and reshuffling is attempted, there will be
kernel crashes because RoCE driver would still be holding the MSIx
vectors but L2 driver would attempt to free in-use vectors. Thus
leading to a kernel crash.
Making changes in roce driver to fix crashes described above.
As part of solution L2 driver tells RoCE driver to release
the MSIx vector whenever there is a need. When RoCE driver
get message it sync up with all the running tasklets and IRQ
handlers and releases the vectors. L2 driver send one more
message to RoCE driver to resume the MSIx vectors. L2 driver
guarantees that RoCE vector do not change during reshuffling.
Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.")
Fixes: 08654eb213a8 ("bnxt_en: Change IRQ assignment for RDMA driver.")
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 fixes from Will Deacon:
- fix application of read-only permissions to kernel section mappings
- sanitise reported ESR values for signals delivered on a kernel
address
- ensure tishift GCC helpers are exported to modules
- fix inline asm constraints for some LSE atomics
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Make sure permission updates happen for pmd/pud
arm64: fault: Don't leak data in ESR context for user fault on kernel VA
arm64: export tishift functions to modules
arm64: lse: Add early clobbers to some input/output asm operands
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Just one fix, to make sure the PCR (Processor Compatibility Register)
is reset on boot.
Otherwise if we're running in compat mode in a guest (eg. pretending a
Power9 is a Power8) and the host kernel oopses and kdumps then the
kdump kernel's userspace will be running in Power8 mode, and will
SIGILL if it uses Power9-only instructions.
Thanks to Michael Neuling"
* tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Clear PCR on boot
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Propagate correct error code for RPMB requests
MMC host:
- sdhci-iproc: Drop hard coded cap for 1.8v
- sdhci-iproc: Fix 32bit writes for transfer mode
- sdhci-iproc: Enable SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus"
* tag 'mmc-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
mmc: block: propagate correct returned value in mmc_rpmb_ioctl
|
|
git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Only two sets of drivers fixes: one rcar-du lvds regression fix, and a
group of fixes for vmwgfx"
* tag 'drm-fixes-for-v4.17-rc7' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Schedule an fb dirty update after resume
drm/vmwgfx: Fix host logging / guestinfo reading error paths
drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
drm: rcar-du: lvds: Fix crash in .atomic_check when disabling connector
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Two fixes:
- a timer pause event notification was garbled upon the recent
hardening work; corrected now
- HD-audio runtime PM regression fix due to the incorrect return
type"
* tag 'sound-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix runtime PM
ALSA: timer: Fix pause event notification
|
|
Commit d5c435df4a890 ("intel_th: msu: Use the real device in case of IOMMU
domain allocation") changes dma buffer allocation to use the actual
underlying device, but forgets to change the deallocation path, which leads
to (if you've got CAP_SYS_RAWIO):
> # echo 0,0 > /sys/bus/intel_th/devices/0-msc0/nr_pages
> ------------[ cut here ]------------
> kernel BUG at ../linux/drivers/iommu/intel-iommu.c:3670!
> CPU: 3 PID: 231 Comm: sh Not tainted 4.17.0-rc1+ #2729
> RIP: 0010:intel_unmap+0x11e/0x130
...
> Call Trace:
> intel_free_coherent+0x3e/0x60
> msc_buffer_win_free+0x100/0x160 [intel_th_msu]
This patch fixes the buffer deallocation code to use the correct device.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: d5c435df4a890 ("intel_th: msu: Use the real device in case of IOMMU domain allocation")
Reported-by: Baofeng Tian <baofeng.tian@intel.com>
CC: stable@vger.kernel.org # v4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fengguang is running into a warning from the buddy allocator:
> swapper/0: page allocation failure: order:9, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #262
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> Call Trace:
...
> __kmalloc+0x14b/0x180: ____cache_alloc at mm/slab.c:3127
> stm_register_device+0xf3/0x5c0: stm_register_device at drivers/hwtracing/stm/core.c:695
...
Which is basically a result of the stm class trying to allocate ~512kB
for the dummy_stm with its default parameters. There's no reason, however,
for it not to be vmalloc()ed instead, which is what this patch does.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
CC: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
- The description of 'blocking' is missing in null_blk.txt
- The 'lightnvm' parameter has been removed in null_blk.c
This updates both in null_blk.txt.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If nvme_get_effects_log() failed the 'id' buffer from the previous
nvme_identify_ctrl() call will never be freed.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The host nqn actually is smaller than the space reserved for it,
so we should be using strlcpy to keep KASAN happy.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
if a command contains data to me mapped. This fixes the case where
a struct requests contains LBAs, but no data will actually be send,
e.g. the pending Write Zeroes support.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Todays limit on concurrent LS's is very small - 4 buffers. With large
subsystem counts or large numbers of initiators connecting, the limit
may be exceeded.
Raise the LS buffer count to 256.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This patch adds simple file backed namespace support for NVMeOF target.
The new file io-cmd-file.c is responsible for handling the code for I/O
commands when ns is file backed. Also, we introduce mempools based slow
path using sync I/Os for file backed ns to ensure forward progress under
reclaim.
The old block device based implementation is moved to io-cmd-bdev.c and
use a "nvmet_bdev_" symbol prefix. The enable/disable calls are also
move into the respective files.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: updated changelog, fixed double req->ns lookup in bdev case]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove the duplicate NULL initialization for req->ns. req->ns is always
initialized to NULL in nvmet_req_init(), so there is no need to reset
it later on failures unless we have previously assigned a value to it.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
"nvmet_check_ctrl_status()" is called from admin-cmd.c along
with io-cmd.c, make the error message more generic.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The whole point of the discovery controller is that it can accept
multiple connections. Additionally the cmic field is not even defined for
the discovery controller identify page.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When connecting to the discovery controller we have certain defaults
to observe, so centralize them to avoid inconsistencies due to argument
ordering.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
After creating the nvme controller, nvmf_create_ctrl() validates
the newly created subsysnqn vs the one specified by the options.
In general, this is an unnecessary check as the Connect message
should implicitly ensure this value matches.
With the change to the FC transport to do an asynchronous connect
for the first association create, the transport will return to
nvmf_create_ctrl() before that first association has been established,
thus the subnqn will not yet be set.
Remove the unnecessary validation.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Current code will set DNR if the controller is deleting or there is
an error during controller init. None of this is necessary.
Remove the code that sets DNR
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
For any failure after nvme_rdma_start_queue in
nvme_rdma_configure_admin_queue, the admin queue will be freed with the
NVME_RDMA_Q_LIVE flag still set. Once nvme_rdma_stop_queue is invoked,
that will cause a use-after-free.
BUG: KASAN: use-after-free in rdma_disconnect+0x1f/0xe0 [rdma_cm]
To fix it, call nvme_rdma_stop_queue for all the failed cases after
nvme_rdma_start_queue.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The nvme timeout handling doesn't do anything if the pci channel is
offline, which is the case when recovering from PCI error event, so it
was a bad idea to sync the controller reset in this state. This patch
flushes the reset work in the error_resume callback instead when the
channel is back to online. This keeps AER handling serialized and
can recover from timeouts.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199757
Fixes: cc1d5e749a2e ("nvme/pci: Sync controller reset for AER slot_reset")
Reported-by: Alex Gagniuc <mr.nuke.me@gmail.com>
Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Set cq_vector after alloc cq/sq, otherwise nvme_suspend_queue will invoke
free_irq for it and cause a 'Trying to free already-free IRQ xxx'
warning if the create CQ/SQ command times out.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
[hch: fixed to pass a s16 and clean up the comment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When a system is under memory presure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.
When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.
Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...
With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
However in order to support smaller ICM chunk size, we need to fix
another issue in large size kcalloc allocations.
E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.
The solution is to use kvzalloc to replace kcalloc which will fall back
to vmalloc automatically if kmalloc fails.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Acked-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
On platforms where the Low Power S0 Idle _DSM interface is used,
on wakeup from suspend-to-idle, when it is known that the ACPI SCI
has triggered while suspended, dispatch the EC GPE in order to catch
all EC events that may have triggered the wakeup before carrying out
the noirq phase of device resume.
That is needed to handle power button wakeup on some platforms where
the EC goes into a low-power mode during suspend-to-idle and while in
that mode it will discard events after a timeout. If that timeout is
shorter than the time it takes to complete the noirq resume of
devices, looking for EC events after the latter is too late.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
|
|
Introduce acpi_dispatch_gpe() as a wrapper around acpi_ev_detect_gpe()
for checking if the given GPE (as represented by a GPE device handle
and a GPE number) is currently active and dispatching it (if that's
the case) outside of interrupt context.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add the trivial owner_on_cpu() helper for rwsem_can_spin_on_owner() and
rwsem_spin_on_owner(), it also allows to make rwsem_can_spin_on_owner()
a bit more clear.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Y. Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180518165534.GA22348@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Store user space frame-pointer value (BP register) into the perf trace
on a sample for a process so the value becomes available when
unwinding call stacks for functions gaining event samples.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/311d4a34-f81b-5535-3385-01427ac73b41@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
PERF_EVENT_IOC_MODIFY_ATTRIBUTES
Since pointer size is different in compat, and switching in _perf_ioctl
is done using exact ioctl numbers, all new ioctl numbers that use pointer
should be added to perf_compat_ioctl for _IOC_SIZE fixup before passing
to perf_ioctl routine (this shouldn't be needed if semantics of the size
argument of _IO* macros was honored).
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20180521123420.GA24291@asgard.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
As Miklos reported and suggested:
"This pattern repeats two times in trace_uprobe.c and in
kernel/events/core.c as well:
ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret)
goto fail_address_parse;
inode = igrab(d_inode(path.dentry));
path_put(&path);
And it's wrong. You can only hold a reference to the inode if you
have an active ref to the superblock as well (which is normally
through path.mnt) or holding s_umount.
This way unmounting the containing filesystem while the tracepoint is
active will give you the "VFS: Busy inodes after unmount..." message
and a crash when the inode is finally put.
Solution: store path instead of inode."
This patch fixes the issue in kernel/event/core.c.
Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <kernel-team@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Link: http://lkml.kernel.org/r/20180418062907.3210386-2-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When hw and sw events are mixed in the same group, they are all attached
to the hw perf_event_context. This sometimes requires moving group of
perf_event to a different context.
We found a bug in how the kernel handles this, for example if we do:
perf stat -e '{faults,ref-cycles,faults}' -I 1000
1.005591180 1,297 faults
1.005591180 457,476,576 ref-cycles
1.005591180 <not supported> faults
First, sw event "faults" is attached to the sw context, and becomes the
group leader. Then, hw event "ref-cycles" is attached, so both events
are moved to the hw context. Last, another sw "faults" tries to attach,
but it fails because of mismatch between the new target ctx (from sw
pmu) and the group_leader's ctx (hw context, same as ref-cycles).
The broken condition is:
group_leader is sw event;
group_leader is on hw context;
add a sw event to the group.
Fix this scenario by checking group_leader's context (instead of just
event type). If group_leader is on hw context, use the ->pmu of this
context to look up context for the new event.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <kernel-team@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: b04243ef7006 ("perf: Complete software pmu grouping")
Link: http://lkml.kernel.org/r/20180503194716.162815-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When a task is enqueued the estimated utilization of a CPU is updated
to better support the selection of the required frequency.
However, schedutil is (implicitly) updated by update_load_avg() which
always happens before util_est_{en,de}queue(), thus potentially
introducing a latency between estimated utilization updates and
frequency selections.
Let's update util_est at the beginning of enqueue_task_fair(),
which will ensure that all schedutil updates will see the most
updated estimated utilization value for a CPU.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Link: http://lkml.kernel.org/r/20180524141023.13765-3-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
utilization
Since the refactoring introduced by:
commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
we aggregate FAIR utilization only if this class has runnable tasks.
This was mainly due to avoid the risk to stay on an high frequency just
because of the blocked utilization of a CPU not being properly decayed
while the CPU was idle.
However, since:
commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
the FAIR blocked utilization is properly decayed also for IDLE CPUs.
This allows us to use the FAIR blocked utilization as a safe mechanism
to gracefully reduce the frequency only if no FAIR tasks show up on a
CPU for a reasonable period of time.
Moreover, we also reduce the frequency drops of CPUs running periodic
tasks which, depending on the task periodicity and the time required
for a frequency switch, was increasing the chances to introduce some
undesirable performance variations.
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Link: http://lkml.kernel.org/r/20180524141023.13765-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|