Age | Commit message (Collapse) | Author |
|
Pull NVMe fixes from Christoph:
"Various fixlets all over."
* 'nvme-4.20' of git://git.infradead.org/nvme:
nvme-rdma: fix double freeing of async event data
nvme: flush namespace scanning work just before removing namespaces
nvme: warn when finding multi-port subsystems without multipathing enabled
nvme-pci: fix surprise removal
nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
nvme: Free ctrl device name on init failure
|
|
There are actually two kinds of discard merge:
- one is the normal discard merge, handled just like a normal read/write
request; call it single-range discard
- the other is the multi-range discard, where queue_max_discard_segments(rq->q) > 1
For the former case, queue_max_discard_segments(rq->q) is 1, and we
should handle this kind of discard merge like the normal read/write
request.
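The distinction between the two kinds of discard merge can be sketched in a few lines of userspace C. This is only an illustration, not the kernel code: `struct request_sketch`, `is_multi_range_discard()` and `classify_merge()` are hypothetical names, and the queue limit is stored directly in the struct instead of being read through queue_max_discard_segments().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's request operation types. */
enum req_op { OP_READ, OP_WRITE, OP_DISCARD };

enum merge_kind {
    MERGE_NORMAL,        /* merged like an ordinary read/write request */
    MERGE_MULTI_DISCARD  /* ranges kept as separate discard segments */
};

/* Hypothetical, heavily simplified request. */
struct request_sketch {
    enum req_op op;
    unsigned int max_discard_segments; /* assumed copy of the queue limit */
};

/* Only a discard on a queue that allows more than one segment is multi-range. */
static bool is_multi_range_discard(const struct request_sketch *rq)
{
    return rq->op == OP_DISCARD && rq->max_discard_segments > 1;
}

/* A single-range discard falls through to the normal merge handling. */
static enum merge_kind classify_merge(const struct request_sketch *rq)
{
    return is_multi_range_discard(rq) ? MERGE_MULTI_DISCARD : MERGE_NORMAL;
}
```

With a limit of 1, a discard is classified as MERGE_NORMAL, which is the case this patch routes through the normal read/write merge path.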
This patch fixes the following kernel panic issue[1], which is caused by
not removing the single-range discard request from elevator queue.
Guangwu has one raid discard test case, in which this issue is a bit
easier to trigger, and I verified that this patch can fix the kernel
panic issue in Guangwu's test case.
[1] kernel panic log from Jens's report
BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
PGD 0 P4D 0.
Oops: 0000 [#1] SMP PTI
CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
4.20.0-rc3-00649-ge64d9a554a91-dirty #14 Hardware name: Wiwynn \
Leopard-Orv2/Leopard-DDR BW, BIOS LBM08 03/03/2017 Workqueue: kblockd \
blk_mq_run_work_fn RIP: \
0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 \
10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
f6 87 b0 00 00 00 02 RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246 \
RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
FS: 0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
blk_mq_dispatch_rq_list+0xec/0x480
? elv_rb_del+0x11/0x30
blk_mq_do_dispatch_sched+0x6e/0xf0
blk_mq_sched_dispatch_requests+0xfa/0x170
__blk_mq_run_hw_queue+0x5f/0xe0
process_one_work+0x154/0x350
worker_thread+0x46/0x3c0
kthread+0xf5/0x130
? process_one_work+0x350/0x350
? kthread_destroy_worker+0x50/0x50
ret_from_fork+0x1f/0x30
Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
nvme_core fuse sg loop efivarfs autofs4 CR2: 0000000000000148 \
---[ end trace 340a1fb996df1b9b ]---
RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \
20 72 37 f6 87 b0 00 00 00 02
Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached")
Reported-by: Jens Axboe <axboe@kernel.dk>
Cc: Guangwu Zhang <guazhang@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The stackleak_erase() function is called on the trampoline stack at the
end of syscall. This stack is not big enough for ftrace and kprobes
operations, e.g. it can be exhausted if we use kprobe_events for
stackleak_erase().
So let's disable function tracing and kprobes of stackleak_erase().
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 10e9ae9fabaf ("gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack")
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
"Fix corrupted compression due to unlucky size choice with ECC"
* tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Correctly calculate usable PRZ bytes
|
|
Similar to the atomic helpers, we should enable vblank while we're
waiting for the commit to finish. DPU needs this, MDP5 seems to work
fine without it.
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
Currently the VCO rate in the 10nm PLL driver relies
on the parent rate, which is not configured.
Configure the VCO rate to 19.2 MHz as required by
the 10nm PLL driver.
Signed-off-by: Abhinav Kumar <abhinavk@codeaurora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
Pull rdma fixes from Jason Gunthorpe:
"This is a bit later than usual for our first -rc but I'm not seeing
anything worrisome in the RDMA tree right now. Quiet so far this -rc
cycle, only a few internal driver related bugs and a small series
fixing ODP bugs found by more advanced testing.
A set of small driver and core code fixes:
- Small series fixing longtime user triggerable bugs in the ODP
processing inside mlx5 and core code
- Various small driver malfunctions and crashes (use after free,
error unwind, implementation bugs)
- A malfunction of the RDMA GID cache that can be triggered by the
administrator"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Initialize return variable in case pagefault was skipped
IB/mlx5: Fix page fault handling for MW
IB/umem: Set correct address to the invalidation function
IB/mlx5: Skip non-ODP MR when handling a page fault
RDMA/hns: Bugfix pbl configuration for rereg mr
iser: set sector for ambiguous mr status errors
RDMA/rdmavt: Fix rvt_create_ah function signature
IB/mlx5: Avoid load failure due to unknown link width
IB/mlx5: Fix XRC QP support after introducing extended atomic
RDMA/bnxt_re: Avoid accessing the device structure after it is freed
RDMA/bnxt_re: Fix system hang when registration with L2 driver fails
RDMA/core: Add GIDs while changing MAC addr only for registered ndev
RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR
net/mlx5: Fix XRC SRQ umem valid bits
|
|
Userspace hasn't used submit cmds with submit_offset != 0 for a while,
but this starts cropping up again with cmdstream sub-buffer-allocation
in libdrm_freedreno.
Doesn't do much good to increment the buf ptr before assigning it.
Fixes: 78b8e5b847b4 ("drm/msm: dump a rd GPUADDR header for all buffers in the command")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
|
|
The msm_gpu_open() function should free "show_priv" on error or it
causes static checker warnings.
Fixes: 4f776f4511c7 ("drm/msm/gpu: Convert the GPU show function to use the GPU state")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
|
|
The current recovery code gets a pointer to the task struct and does a
few things all within the rcu_read_lock. This puts constraints on the
types of gfp flags that can be used within the rcu lock. This patch
instead gets a reference to the task within the rcu lock and releases
the lock immediately. This way the task stays alive until we need it and
we also get to use the desired gfp flags.
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
|
|
This patch simply checks first to see if the target can support crash dump
capture before proceeding.
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
|
|
Some error paths in the configuration of the admin queue free the data
buffer associated with the async request SQE without resetting the data
buffer pointer to NULL. This buffer is then freed again if the controller
is shut down or reset.
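The general pattern behind this kind of fix, clearing the pointer at the point of the free so that a later teardown path becomes a no-op, can be sketched as below. This is a userspace illustration under stated assumptions: `struct async_event_sqe` and `free_async_event_data()` are hypothetical names, and plain malloc()/free() stand in for the driver's allocation helpers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the async event request and its data buffer. */
struct async_event_sqe {
    void *data;
};

/*
 * Free the data buffer at most once. Returns 1 if a buffer was released
 * and 0 if there was nothing left to free.
 */
static int free_async_event_data(struct async_event_sqe *sqe)
{
    if (!sqe->data)
        return 0;

    free(sqe->data);
    sqe->data = NULL; /* a second call from a teardown path is now harmless */
    return 1;
}
```

Calling the helper from both the error path and the shutdown path is then safe, because the second call sees a NULL pointer and returns without touching the allocator.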
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Reviewed-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
nvme_stop_ctrl can be called also for reset flow and there is no need to
flush the scan_work as namespaces are not being removed. This can cause
deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
before controller teardown (and specifically I/O cancellation of the
scan_work itself) takes place, but the scan_work will be blocked anyways
so there is no need to flush it.
Instead, move scan_work flush to nvme_remove_namespaces() where it really
needs to flush.
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Without CONFIG_NVME_MULTIPATH enabled a multi-port subsystem might
show up as individual devices and cause problems, so warn about it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
Variable 'cache' is being assigned but is never used hence it is
redundant and can be removed.
Cleans up clang warning:
warning: variable 'cache' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
get_seconds() returns an unsigned long that can overflow on some
architectures and is deprecated because of that. In cachefs, we cast that
number to a 32-bit integer, which will overflow in year 2106 on all
architectures.
As confirmed by David Howells, the overflow probably isn't harmful
in the end, since the timestamps are only used to make the file names
unique, but they don't strictly have to be in monotonically increasing
order since the files only exist in order to be deleted as quickly
as possible.
Moving to ktime_get_real_seconds() avoids the deprecated interface.
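The year 2106 figure follows from simple arithmetic: an unsigned 32-bit counter of seconds since 1970 wraps after 2^32 seconds, which is a little over 136 years. A minimal sketch of that arithmetic, with hypothetical helper names and an average Gregorian year length as the only assumption:

```c
#include <stdint.h>

/* Average Gregorian year (365.2425 days) in seconds; an approximation. */
#define SECONDS_PER_YEAR_APPROX 31556952ULL

/* Model the cast of a 64-bit seconds value to a 32-bit integer. */
static uint32_t truncate_to_u32(uint64_t seconds)
{
    return (uint32_t)seconds; /* keeps only the low 32 bits */
}

/* Calendar year in which a 32-bit count of seconds since 1970 wraps. */
static unsigned int approx_wrap_year(void)
{
    return 1970u + (unsigned int)(4294967296ULL / SECONDS_PER_YEAR_APPROX);
}
```

A value five seconds past the wrap point truncates to 5, so timestamps taken after the wrap compare as older than those taken before it, which is why monotonic ordering cannot be relied on.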
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Clang warns when one enumerated type is implicitly converted to another.
fs/cachefiles/namei.c:247:50: warning: implicit conversion from
enumeration type 'enum cachefiles_obj_ref_trace' to different
enumeration type 'enum fscache_obj_ref_trace' [-Wenum-conversion]
cache->cache.ops->put_object(&xobject->fscache,
cachefiles_obj_put_wait_retry);
Silence this warning by explicitly casting to fscache_obj_ref_trace,
which is also done in put_object.
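The shape of the fix can be shown with two unrelated enum types. The enums and functions below are hypothetical stand-ins rather than the cachefiles and fscache definitions, and their numeric values are assumptions made only for the illustration.

```c
/* Hypothetical enum owned by the lower layer. */
enum lower_ref_trace {
    LOWER_REF_GET = 0,
    LOWER_REF_PUT_WAIT_RETRY = 1
};

/* Hypothetical enum expected by the upper layer's callback. */
enum upper_ref_trace {
    UPPER_REF_GET = 0,
    UPPER_REF_PUT = 1
};

/* Callback that takes the upper layer's enum and reports the raw value. */
static int put_object(enum upper_ref_trace why)
{
    return (int)why;
}

/*
 * Passing LOWER_REF_PUT_WAIT_RETRY directly would trigger clang's
 * -Wenum-conversion; the explicit cast states that the conversion is
 * intended and keeps the numeric value unchanged.
 */
static int put_with_explicit_cast(void)
{
    return put_object((enum upper_ref_trace)LOWER_REF_PUT_WAIT_RETRY);
}
```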
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
It was observed that a process blocked indefinitely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().
At this time, ->backing_objects was empty, which would normally prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page() was entered.
When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared. This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared.
There is some uncertainty in this analysis, but it seems to fit the
observations. Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.
The customer who reported the hang also reports that the hang cannot be
reproduced with this fix.
The backtrace for the blocked process looked like:
PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh"
#0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
#1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
#2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix the AUX_CMD_SEND bit for the ti,sn65dsi86 bridge chip. With the wrong
value, the DPCD AUX transactions with the eDP panel fail.
Signed-off-by: Sandeep Panda <spanda@codeaurora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20181130092745.4219-1-spanda@codeaurora.org
|
|
commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the
generic_write_sync prototype") reworked callers of generic_write_sync(),
and ended up dropping the error return for the directio path. Prior to
that commit, in dio_complete(), an error would be bubbled up the stack,
but after that commit, errors passed on to dio_complete were eaten up.
This was reported on the list earlier, and a fix was proposed in
https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
never followed up with. We recently hit this bug in our testing where
fencing io errors, which were previously erroring out with EIO, were
being returned as success operations after this commit.
The fix proposed on the list earlier was a little short -- it would have
still called generic_write_sync() in case `ret` already contained an
error. This fix ensures generic_write_sync() is only called when there's
no pending error in the write. Additionally, transferred is replaced
with ret to bring this code in line with other callers.
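The control flow described above, syncing only when the write itself succeeded and letting the sync result replace the byte count, can be sketched as follows. This is a userspace illustration: `complete_write()`, `sync_ok()`, `sync_fail()` and `sync_calls` are hypothetical, and the function pointer merely plays the role that generic_write_sync() has in dio_complete().

```c
#include <stdbool.h>

/* Hypothetical sync hook: returns the byte count on success, a negative error otherwise. */
typedef long (*sync_fn)(long bytes_written);

static int sync_calls; /* counts how often a sync hook ran */

static long sync_ok(long bytes_written)
{
    sync_calls++;
    return bytes_written;
}

static long sync_fail(long bytes_written)
{
    (void)bytes_written;
    sync_calls++;
    return -5; /* -EIO on Linux */
}

/*
 * ret is either the number of bytes transferred (> 0) or a negative error.
 * The sync hook runs only when there is no pending error, and its return
 * value is propagated so that a failed sync is not reported as success.
 */
static long complete_write(long ret, bool is_write, sync_fn sync)
{
    if (ret > 0 && is_write)
        ret = sync(ret);
    return ret;
}
```

A pending error skips the sync entirely and is returned unchanged, while a sync failure after a successful transfer replaces the byte count with the error.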
Fixes: e259221763a4 ("fs: simplify the generic_write_sync prototype")
Reported-by: Ravi Nankani <rnankani@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Torsten Mehlan <tomeh@amazon.de>
CC: Uwe Dannowski <uwed@amazon.de>
CC: Amit Shah <aams@amazon.de>
CC: David Woodhouse <dwmw@amazon.co.uk>
CC: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Sending the exact same hotplug event is not great uapi. Luckily the
only already merged implementation of leases (in the -modesetting
driver) doesn't care about what kind of uevent it gets, and
unconditionally processes both hotplug and lease changes. So we can
still adjust the uapi here.
But e.g. weston tries to filter stuff, and I guess others might want
to do that too. Try to make that possible. Cc: stable since it's a uapi
adjustment that we want to roll out everywhere.
Michel Dänzer mentioned on irc that -amdgpu also has lease support. It
has the same code flow as -modesetting though, so we can still go
ahead.
v2: Mention -amdgpu (Michel)
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181129094226.30591-1-daniel.vetter@ffwll.ch
|
|
An affected screen resolution is 1366 x 768, whose width is not
divisible by 8, the default font width. On such screens, when longer
lines are earlyprintk'ed, overflow-to-next-line can never trigger,
because the left-most x-coordinate of the next character is always less
than the screen width. Earlyprintk will loop forever trying to
print the rest of the string but is unable to, because the line is
full.
This patch makes the trigger consider the right-most x-coordinate,
instead of left-most, as the value to compare against the screen
width threshold.
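The arithmetic for a 1366-pixel-wide screen can be checked with a small sketch. The helper names are hypothetical, and the per-line capacity formula in `chars_that_fit()` is an assumption about how the remaining room is computed, used here only to illustrate why no progress is made.

```c
#include <stdbool.h>

#define FONT_WIDTH 8u /* default font width from the description above */

/* Old trigger: wrap only once the next glyph's left edge is off screen. */
static bool wraps_leftmost(unsigned int x, unsigned int screen_width)
{
    return x >= screen_width;
}

/* New trigger: wrap as soon as the next glyph's right edge would not fit. */
static bool wraps_rightmost(unsigned int x, unsigned int screen_width)
{
    return x + FONT_WIDTH > screen_width;
}

/* Assumed capacity calculation: whole glyphs that still fit on this line. */
static unsigned int chars_that_fit(unsigned int x, unsigned int screen_width)
{
    return (screen_width - x) / FONT_WIDTH;
}
```

After 170 characters the cursor sits at x = 1360. The left-most test sees 1360 < 1366 and does not wrap, yet only 6 pixels remain, so zero characters fit and the loop never advances. The right-most test sees 1368 > 1366 and wraps.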
Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Snowberg <eric.snowberg@oracle.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-12-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The following commit:
d64934019f6c ("x86/efi: Use efi_exit_boot_services()")
introduced a regression on systems with large memory maps causing them
to hang on boot. The first "goto get_map" that was removed from
exit_boot() ensured there was enough room for the memory map when
efi_call_early(exit_boot_services) was called. This happens when
(nr_desc > ARRAY_SIZE(params->e820_table)).
Chain of events:
exit_boot()
efi_exit_boot_services()
efi_get_memory_map <- at this point the mm can't grow over 8 desc
priv_func()
exit_boot_func()
allocate_e820ext() <- new mm grows over 8 desc from e820 alloc
efi_call_early(exit_boot_services) <- mm key doesn't match so retry
efi_call_early(get_memory_map) <- not enough room for new mm
system hangs
This patch allocates the e820 buffer before calling efi_exit_boot_services()
and fixes the regression.
[ mingo: minor cleanliness edits. ]
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The tracefs file set_graph_function is used to function-graph trace only the
functions that are listed in that file (or all functions if the file is empty). The
way this is implemented is that the function graph tracer looks at every
function, and if the current depth is zero and the function matches
something in the file then it will trace that function. When other functions
are called, the depth will be greater than zero (because the original
function will be at depth zero), and all functions will be traced where the
depth is greater than zero.
The issue is that when a function is first entered, and the handler that
checks this logic is called, the depth is set to zero. If an interrupt comes
in and a function in the interrupt handler is traced, its depth will be
greater than zero and it will automatically be traced, even if the original
function was not. Because the logic only looks at depth, it may trace
interrupts when it should not.
The recent design change of the function graph tracer to fix other bugs
caused the depth to be zero while the function graph callback handler is
being called for a longer time, widening the race of this happening. This
bug was actually there for a longer time, but because the race window was so
small it seldom happened. The Fixes tag below is for the commit that widened
the race window, because that commit belongs to a series that will also help
fix the original bug.
Cc: stable@kernel.org
Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
After enabling KVM event tracing, almost all of trace_kvm_exit()'s
printk shows
"kvm_exit: IRQ: ..."
even if the actual exception_type is NOT IRQ. More specifically,
trace_kvm_exit() is defined in virt/kvm/arm/trace.h by TRACE_EVENT.
This slight problem may have existed since commit e6753f23d961
("tracepoint: Make rcuidle tracepoint callers use SRCU"). There are
two variables in trace_kvm_exit() and __DO_TRACE() which have the
same name, *idx*. Thus the actual value of *idx* will be overwritten
when tracing. Fix it by adding a simple prefix.
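The name clash can be reproduced in plain C with a macro that declares its own local variable. This is a minimal sketch and not the tracepoint code: `TRACE_SHADOWED`, `TRACE_PREFIXED` and the two wrapper functions are hypothetical, and the macro's local simply holds 0 in place of whatever the real macro stores there.

```c
/* Buggy variant: the macro's local is also called idx. */
#define TRACE_SHADOWED(out, expr) \
    do {                          \
        int idx = 0;              \
        (void)idx;                \
        (out) = (expr);           \
    } while (0)

/* Fixed variant: the macro's local carries a prefix that callers do not use. */
#define TRACE_PREFIXED(out, expr) \
    do {                          \
        int trace_idx_ = 0;       \
        (void)trace_idx_;         \
        (out) = (expr);           \
    } while (0)

/* The caller's idx is evaluated inside the macro and resolves to the macro's idx. */
static int exit_type_shadowed(int idx)
{
    int recorded;

    TRACE_SHADOWED(recorded, idx);
    return recorded;
}

/* With the prefixed local, the caller's idx is the one that gets recorded. */
static int exit_type_prefixed(int idx)
{
    int recorded;

    TRACE_PREFIXED(recorded, idx);
    return recorded;
}
```

In the shadowed variant every caller value is recorded as 0, whatever was passed in. In the prefixed variant the caller's value survives.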
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Wang Haibin <wanghaibin.wang@huawei.com>
Cc: linux-trace-devel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Use d_instantiate() rather than d_add() and don't d_drop() in
afs_vnode_new_inode(). The dentry shouldn't be removed as it's not
changing its name.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and
ERFKILL) that it doesn't handle in its server/address rotation algorithms.
They cause the probing and rotation to abort immediately rather than
rotating.
Fix this by:
(1) Abstracting out the error prioritisation from the VL and FS rotation
algorithms into a common function and expanding its usage into the server
probing code.
When multiple errors are available, this code selects the one we'd
prefer to return.
(2) Adding handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL.
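The idea of a common prioritisation function, given the error seen so far and a new one and keeping whichever is preferred, can be sketched as below. The ranking itself is an assumption made for the illustration and is not taken from the kAFS code; `error_rank()` and `prioritise_error()` are hypothetical names, and positive errno values are used where the kernel would use negative ones.

```c
#include <errno.h>

/* Assumed ranking: a higher rank means the error is preferred for reporting. */
static int error_rank(int err)
{
    switch (err) {
    case 0:
        return 0; /* no error recorded yet */
    case EADDRNOTAVAIL:
        return 1;
    case ENETUNREACH:
        return 2;
    case EHOSTUNREACH:
        return 3;
    case ECONNREFUSED:
        return 4;
    case ETIMEDOUT:
        return 5;
    default:
        return 6; /* any other error takes precedence in this sketch */
    }
}

/* Keep whichever of the current and incoming errors ranks higher. */
static int prioritise_error(int current, int incoming)
{
    return error_rank(incoming) > error_rank(current) ? incoming : current;
}
```

Folding each probe's result through such a function lets the rotation continue past an address-level error while still returning one well-defined error at the end.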
Fixes: 0fafdc9f888b ("afs: Fix file locking")
Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When afs_validate() is called to validate a vnode (inode), there are two
unhandled cases in the fastpath at the top of the function:
(1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break
counters match and the data has expired, then there's an implicit case
in which the vnode needs revalidating.
This has no consequences since the default "valid = false" set at the
top of the function happens to do the right thing.
(2) If the vnode is not promised and it hasn't been deleted
(AFS_VNODE_DELETED is not set) then there's a default case we're not
handling in which the vnode is invalid. If the vnode is invalid, we
need to bring cb_s_break and cb_v_break up to date before we refetch
the status.
As a consequence, once the server loses track of the client
(ie. sufficient time has passed since we last sent it an operation),
it will send us a CB.InitCallBackState* operation when we next try to
talk to it. This calls afs_init_callback_state() which increments
afs_server::cb_s_break, but this then doesn't propagate to the
afs_vnode record.
The result being that every afs_validate() call thereafter sends a
status fetch operation to the server.
Clarify and fix this by:
(A) Setting valid in all the branches rather than initialising it at the
top so that the compiler catches where we've missed.
(B) Restructuring the logic in the 'promised' branch so that we set valid
to false if the callback is due to expire (or has expired) and so that
the final case is that the vnode is still valid.
(C) Adding an else-statement that ups cb_s_break and cb_v_break if the
promised and deleted cases don't match.
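Points (A) to (C) describe a branch structure that can be sketched as below. This is a simplified userspace model, not the afs_validate() code: `struct vnode_sketch`, `vnode_is_valid()` and the 10-second expiry margin are assumptions, and the server and volume counters are passed in as plain arguments.

```c
#include <stdbool.h>

#define EXPIRY_MARGIN 10 /* assumed margin, in seconds, before expiry */

/* Hypothetical, simplified vnode state. */
struct vnode_sketch {
    bool promised;           /* a callback promise is held */
    bool deleted;            /* the vnode has been deleted */
    unsigned int cb_s_break; /* server break counter seen by this vnode */
    unsigned int cb_v_break; /* volume break counter seen by this vnode */
    long cb_expires_at;      /* expiry time of the promise */
};

static bool vnode_is_valid(struct vnode_sketch *v, unsigned int server_s_break,
                           unsigned int volume_v_break, long now)
{
    bool valid; /* (A) set in every branch instead of being initialised here */

    if (v->promised) {
        if (v->cb_s_break != server_s_break ||
            v->cb_v_break != volume_v_break) {
            v->cb_s_break = server_s_break;
            v->cb_v_break = volume_v_break;
            valid = false;
        } else if (v->cb_expires_at - EXPIRY_MARGIN <= now) {
            valid = false; /* (B) due to expire or already expired */
        } else {
            valid = true;  /* (B) final case: still valid */
        }
    } else if (v->deleted) {
        valid = true;
    } else {
        /* (C) neither promised nor deleted: catch up before refetching */
        v->cb_s_break = server_s_break;
        v->cb_v_break = volume_v_break;
        valid = false;
    }

    return valid;
}
```

Because the counters are brought up to date in the final branch, the next validation after a fresh promise compares equal counters and takes the fast path instead of refetching the status again.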
Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Print a debug message for every async FW event forwarded to mlx5
interfaces (mlx5e netdev and mlx5_ib rdma module).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Allow forwarding of SRQ events to mlx5_core interfaces, e.g. mlx5_ib.
Use mlx5_notifier_register/unregister in srq.c in order to allow seamless
transition of srq.c to infiniband subsystem.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Allow forwarding QP and WQ events to mlx5_core interfaces, e.g. mlx5_ib
Use mlx5_notifier_register/unregister in qp.c in order to allow seamless
transition of qp.c to infiniband subsystem.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Before the new mlx5 event notification infrastructure and API,
mlx5_core used to process all events before forwarding them to mlx5
interfaces (mlx5e/mlx5_ib) and used to translate the event type enum
to a software defined enum. This is not needed anymore since it is ok
for mlx5e and mlx5_ib to receive FW events as is, at least the few ones
mlx5 core allows.
mlx5e and mlx5_ib already moved to use the new API and they only handle FW
event types, so it is now safe to remove all equivalent software defined
events and the logic around them.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Handle FW general event rq delay drop as it was received from FW via mlx5
notifiers API, instead of handling the processed software version of that
event. After this patch we can safely remove all software processed FW
events types and definitions.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
FW general event is used by mlx5_ib for RQ delay drop timeout event
handling. This patch allows forwarding the FW general event type to the mlx5
notifiers chain so that mlx5_ib can handle it, and deprecates the software
version of it.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use the FW version of the port change event as forwarded via new mlx5
notifiers API.
After this patch, processed software version of the port change event
will become deprecated and will be totally removed in downstream
patches.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The mlx5_interface->event callback is not used by mlx5e/mlx5_ib anymore.
We totally remove the delayed events logic workaround, since with
the dynamic notifier registration API it is not needed anymore: mlx5_ib
can register its notifier and start receiving events exactly at the moment
it is ready to handle them.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Remove the deprecated mlx5_interface->event mlx5_ib callback and use new
mlx5 notifier API to subscribe for mlx5 events.
For native mlx5_ib devices profiles pf_profile/nic_rep_profile register
the notifier callback mlx5_ib_handle_event which treats the notifier
context as mlx5_ib_dev.
For vport representors, don't register any notifier, same as before, they
didn't receive any mlx5 events.
For slave port (mlx5_ib_multiport_info) register a different notifier
callback mlx5_ib_event_slave_port, which knows that the event is coming
for mlx5_ib_multiport_info and prepares the event job accordingly.
Before this, the event handler work had to ask mlx5_core whether this is
a slave port, via mlx5_core_is_mp_slave(work->dev). Now that is not needed
anymore.
mlx5_ib_multiport_info notifier registration is done on
mlx5_ib_bind_slave_port and de-registration is done on
mlx5_ib_unbind_slave_port.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This is to allow seamless migration to the new notifier chain API, and to
eventually deprecate interfaces dev->event callback.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Remove the deprecated mlx5_interface->event mlx5e callback and use new
mlx5 notifier API to subscribe for mlx5 events, handle port change event
as received from FW rather than handling the mlx5 core processed port
change software version event.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The idea is to allow mlx5 core interfaces (mlx5e/mlx5_ib) to be able to
receive some allowed FW events as is via the new notifier API.
In this patch we allow forwarding port change event to mlx5 core interfaces
(mlx5e/mlx5_ib) as it was received from FW.
Once mlx5e and mlx5_ib start using this event we can safely remove the
redundant software version of it and its translation logic.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use atomic notifier chain to fire events to mlx5 core driver
consumers (mlx5e/mlx5_ib) and provide mlx5 register/unregister notifier
API.
This API will replace the current mlx5_interface->event callback and all
the logic around it, especially the delayed events logic introduced by
commit 97834eba7c19 ("net/mlx5: Delay events till ib registration ends"),
which is not needed anymore with this new API where the mlx5 interface
can dynamically register/unregister its notifier.
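The register/unregister/fire pattern of a notifier chain can be sketched in userspace C as below. This is a single-threaded illustration with hypothetical names (`struct notifier`, `notifier_register()`, `notifier_unregister()`, `notifier_call_chain()`, `count_event()`); it is not the kernel notifier API or the mlx5 API, and it omits the locking that an atomic chain needs.

```c
#include <stddef.h>

/* Hypothetical notifier block: one callback plus a link to the next block. */
struct notifier {
    int (*call)(struct notifier *nb, unsigned long event, void *data);
    struct notifier *next;
};

struct notifier_head {
    struct notifier *first;
};

/* Add a consumer at the head of the chain. */
static void notifier_register(struct notifier_head *head, struct notifier *nb)
{
    nb->next = head->first;
    head->first = nb;
}

/* Unlink a consumer; it stops receiving events from this point on. */
static void notifier_unregister(struct notifier_head *head, struct notifier *nb)
{
    struct notifier **link = &head->first;

    while (*link) {
        if (*link == nb) {
            *link = nb->next;
            nb->next = NULL;
            return;
        }
        link = &(*link)->next;
    }
}

/* Fire an event to every registered consumer; returns how many were called. */
static int notifier_call_chain(struct notifier_head *head, unsigned long event,
                               void *data)
{
    int calls = 0;

    for (struct notifier *nb = head->first; nb; nb = nb->next) {
        nb->call(nb, event, data);
        calls++;
    }
    return calls;
}

/* Example consumer: counts the events it sees through the data pointer. */
static int count_event(struct notifier *nb, unsigned long event, void *data)
{
    (void)nb;
    (void)event;
    (*(int *)data)++;
    return 0;
}
```

A consumer that registers only once it is ready never sees earlier events, which is the property that makes a separate delayed-events mechanism unnecessary.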
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
__qdisc_drop_all() accesses skb->prev to get to the tail of the
segment-list.
With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb->next to NULL and set
the list-poison on skb->prev.
With that change, __qdisc_drop_all() will panic when it tries to
dereference skb->prev.
Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb->prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:
[ 34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[ 34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[ 34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[ 34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[ 34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
[ 34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[ 34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[ 34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[ 34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[ 34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[ 34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[ 34.512082] FS: 0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[ 34.513036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[ 34.514593] Call Trace:
[ 34.514893] <IRQ>
[ 34.515157] napi_gro_receive+0x93/0x150
[ 34.515632] receive_buf+0x893/0x3700
[ 34.516094] ? __netif_receive_skb+0x1f/0x1a0
[ 34.516629] ? virtnet_probe+0x1b40/0x1b40
[ 34.517153] ? __stable_node_chain+0x4d0/0x850
[ 34.517684] ? kfree+0x9a/0x180
[ 34.518067] ? __kasan_slab_free+0x171/0x190
[ 34.518582] ? detach_buf+0x1df/0x650
[ 34.519061] ? lapic_next_event+0x5a/0x90
[ 34.519539] ? virtqueue_get_buf_ctx+0x280/0x7f0
[ 34.520093] virtnet_poll+0x2df/0xd60
[ 34.520533] ? receive_buf+0x3700/0x3700
[ 34.521027] ? qdisc_watchdog_schedule_ns+0xd5/0x140
[ 34.521631] ? htb_dequeue+0x1817/0x25f0
[ 34.522107] ? sch_direct_xmit+0x142/0xf30
[ 34.522595] ? virtqueue_napi_schedule+0x26/0x30
[ 34.523155] net_rx_action+0x2f6/0xc50
[ 34.523601] ? napi_complete_done+0x2f0/0x2f0
[ 34.524126] ? kasan_check_read+0x11/0x20
[ 34.524608] ? _raw_spin_lock+0x7d/0xd0
[ 34.525070] ? _raw_spin_lock_bh+0xd0/0xd0
[ 34.525563] ? kvm_guest_apic_eoi_write+0x6b/0x80
[ 34.526130] ? apic_ack_irq+0x9e/0xe0
[ 34.526567] __do_softirq+0x188/0x4b5
[ 34.527015] irq_exit+0x151/0x180
[ 34.527417] do_IRQ+0xdb/0x150
[ 34.527783] common_interrupt+0xf/0xf
[ 34.528223] </IRQ>
This patch makes sure that skb->prev is set to NULL when entering
netem_enqueue.
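A minimal userspace sketch of the failure mode follows. It assumes a drop
helper that treats a non-NULL prev pointer as the tail of a segment list,
as described above. struct pkt, model_drop_all() and
model_enqueue_prepare() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the next/prev links of an skb. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
};

/*
 * Model of a drop helper that chains a whole segment list onto to_free.
 * A non-NULL prev is trusted to be the tail of the segment list.
 */
static void model_drop_all(struct pkt *p, struct pkt **to_free)
{
	assert(p && to_free);
	if (p->prev)
		p->prev->next = *to_free;	/* writes through prev */
	else
		p->next = *to_free;
	*to_free = p;
}

/* Model of the fix: clear the stale prev pointer on entry to enqueue. */
static void model_enqueue_prepare(struct pkt *p)
{
	p->prev = NULL;
}
```

If prev still points at a list head left over from earlier list handling,
model_drop_all() overwrites that head's next pointer. Clearing prev first
keeps the write confined to the packet itself.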
Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current implementation of create QP requires contiguous memory. Such a
requirement is problematic once the memory is fragmented or the system is
low on memory, because it causes failures in dma_zalloc_coherent().
This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer. This makes the QP creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
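The data-path consequence, that a WQE may straddle two fragments, can be
sketched with a small userspace model. struct frag_buf, frag_buf_at() and
frag_buf_read() are hypothetical names and do not reflect the mlx5_core
fragmented-buffer API; the sketch only shows the offset arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a work-queue buffer built from separate fragments. */
struct frag_buf {
	uint8_t **frags;	/* separately allocated fragments */
	size_t frag_size;	/* bytes per fragment */
	size_t nfrags;		/* number of fragments */
};

/* Translate a linear byte offset into a pointer inside the right fragment. */
static uint8_t *frag_buf_at(const struct frag_buf *b, size_t offset)
{
	size_t idx = offset / b->frag_size;

	assert(idx < b->nfrags);
	return b->frags[idx] + offset % b->frag_size;
}

/* Copy len bytes starting at offset, splitting at fragment boundaries. */
static void frag_buf_read(const struct frag_buf *b, size_t offset,
			  void *dst, size_t len)
{
	uint8_t *out = dst;

	while (len) {
		size_t in_frag = b->frag_size - offset % b->frag_size;
		size_t chunk = len < in_frag ? len : in_frag;

		memcpy(out, frag_buf_at(b, offset), chunk);
		out += chunk;
		offset += chunk;
		len -= chunk;
	}
}
```

A reader that assumed one contiguous buffer would run past the end of the
first fragment. Here the copy is split wherever a fragment ends.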
We also use the opportunity to fix some cosmetic legacy coding-convention
errors within the scope of this feature.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The current implementation of create SRQ requires contiguous memory. Such
a requirement is problematic once the memory is fragmented or the system
is low on memory, because it causes failures in dma_zalloc_coherent().
This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer, and makes the SRQ creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
FRWR memory registration is done with a series of calls and WRs.
1. ULP invokes ib_dma_map_sg()
2. ULP invokes ib_map_mr_sg()
3. ULP posts an IB_WR_REG_MR on the Send queue
Step 2 generates an iova. It is permissible for ULPs to change this
iova (with certain restrictions) between steps 2 and 3.
rxe_map_mr_sg captures the MR's iova but later when rxe processes the
REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova
after step 2 but before step 3, rxe never captures that change.
When the remote sends an RDMA Read targeting that MR, rxe looks up the
R_key, but the altered iova does not match the iova stored in the MR,
causing the RDMA Read request to fail.
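The mismatch can be illustrated with a small userspace model. All names
here (struct ulp_mr, struct provider_mr, the model_*() helpers and the
honor_wr_iova flag) are hypothetical, and copying the iova at REG_MR time
is shown only as one plausible remedy, since the text above does not spell
out the exact change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ULP-visible MR: the ULP may adjust iova after mapping. */
struct ulp_mr {
	uint64_t iova;
	uint64_t length;
};

/* Hypothetical provider-side state consulted on remote access. */
struct provider_mr {
	uint64_t iova;
	uint64_t length;
	int valid;
};

/* Step 2: mapping records the initial iova on both sides. */
static void model_map_mr_sg(struct ulp_mr *mr, struct provider_mr *hw,
			    uint64_t iova, uint64_t length)
{
	assert(mr && hw);
	mr->iova = hw->iova = iova;
	mr->length = hw->length = length;
	hw->valid = 0;
}

/* Step 3: processing the REG_MR WR; optionally pick up the current iova. */
static void model_reg_mr(const struct ulp_mr *mr, struct provider_mr *hw,
			 int honor_wr_iova)
{
	if (honor_wr_iova)
		hw->iova = mr->iova;
	hw->valid = 1;
}

/* Remote RDMA Read check: the range must lie inside the registered MR. */
static int model_rdma_read_ok(const struct provider_mr *hw, uint64_t addr,
			      uint64_t len)
{
	return hw->valid && addr >= hw->iova && len <= hw->length &&
	       addr - hw->iova <= hw->length - len;
}
```

With honor_wr_iova set to 0 the provider keeps the iova captured at map
time, so a read aimed at the altered iova is rejected. With it set to 1 the
same read falls inside the registered range.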
Reported-by: Anna Schumaker <schumaker.anna@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a recent regression in ACPICA related to the Generic Serial Bus
protocol handling and causing it to read or write too little or too
much data in some cases, so incorrect data may be written to hardware
as a result (Hans de Goede)"
* tag 'acpi-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Fix handling of buffer-size in acpi_ex_write_data_to_field()
|
|
Allow a user to attach a DEVX counter via mlx5 raw flow creation. In order
to attach a counter we introduce a new attribute:
MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX
A counter can be attached to multiple flow steering rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two issues in the operating performance points (OPP)
framework.
Specifics:
- Fix the handling of the "operating-points-v2" property to avoid
failures if multiple phandles are present in it, which is legitimate
(Viresh Kumar).
- Drop the unnecessary static initialization of the .owner field in
the ti_opp_supply_driver structure (YueHaibing)"
* tag 'pm-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
OPP: Fix parsing of multiple phandles in "operating-points-v2" property
opp: ti-opp-supply: Fix platform_no_drv_owner.cocci warnings
|
|
The QIB driver was added in 2010 with many BUG_ON()s, most of which were
cleaned out over years of development and usage.
It looks like it is now safe to remove the rest of the BUG_ONs.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
There is a spelling mistake in a usnic_err error message; fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|