Age | Commit message (Collapse) | Author |
|
Using list_move_tail() instead of list_del() + list_add_tail().
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
'qp' is malloced in qedr_create_qp() and should be freed before leaving
from the error handling cases, otherwise it will cause memory leak.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Currently, if pd is null then we hit a null pointer derference
on accessing pd->pd_id. Instead of just printing an error message
we should also return -EINVAL immediately.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
and rename class version define to indicate SM rather than SMP or SMI
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Query the length of the fmb and abort fmb registration if the
size of the associated measurement block is too small.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The ap_qci() inline assembly writes to memory (*config) but misses to
tell the compiler about it. Add the missing memory clobber to fix
this.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Add the missing memory clobber / barrier to dcss_set_subcodes() to
tell the compiler that the inline assembly accesses memory (name
string).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Add missing memory clobbers / barriers or use the Q constraint where
possible to tell the compiler that the inline assemblies actually
access memory and not only pointers to memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
We have a couple of inline assemblies like memchr() and strlen() that
read from memory, but tell the compiler only they need the addresses
of the strings they access.
This allows the compiler to omit the initialization of such strings
and therefore generate broken code. Add the missing memory barrier to
all string related inline assemblies to fix this potential issue. It
looks like the compiler currently does not generate broken code due to
these bugs.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The qsi inline assembly takes an initialized "cc" variable as output
operand but specifies it as write-to operand only instead of
read/write operand. This allows the compiler to omit the
initialization, which in fact it also does (gcc 6.1).
Use the "+" constraint modifier to fix this. In addition also use the
Q constraint to specify the hws_qsi_info_block memory location, so the
compiler can generate slightly better code. Also get rid of the cc
clobber since none of the instructions within the inline assembly
modify the condition code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Two of the messages introduced by the memblock conversion are reworded.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Old gcc versions can't handle a bogus early clobber on a Q constraint:
arch/s390/kernel/early.c: In function 'memmove_early.part.1':
arch/s390/kernel/early.c:432:2: error: '&' constraint used with no register class
Simply remove it to fix this.
Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Fixes: d543a106f96d ("s390: fix initrd corruptions with gcov/kcov instrumented kernels")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
This patch introduces tracepoint definitions and tracepoint
event invocations for the s390 zcrypt device.
Currently there are just two tracepoint events defined.
An s390_zcrypt_req request event occurs as soon as the
request is recognized by the zcrypt ioctl function. This
event may act as some kind of request-processing-starts-now
indication.
As late as possible within the zcrypt ioctl function there
occurs the s390_zcrypt_rep event which may act as the point
in time where the request has been processed by the kernel
and the result is about to be transferred back to userspace.
The glue which binds together request and reply event is the
ptr parameter, which is the local buffer address where the
request from userspace has been stored by the ioctl function.
The main purpose of this zcrypt tracepoint patch is to get
some data for performance measurements together with
information about the kind of request and on which card and
queue the request has been processed. It is not an ffdc
interface as there is already code in the zcrypt device
driver to serve the s390 debug feature interface.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Rework the debug feature calls and initialization. There
are now two debug feature entries used by the zcrypt code.
The first is 'ap' with all the AP bus related stuff and the
second is 'zcrypt' with all the zcrypt and devices and
driver related entries. However, there isn't much traffic on
both debug features. The ap bus code emits only some debug
info and for zcrypt devices on appearance and disappearance
there is an entry written.
The new dbf invocations use the sprintf buffer layout,
whereas the old implementation used the ascii dbf buffer.
There are now 5*8=40 bytes used for each entry, resulting in
5 parameters per call. As the sprintf buffer needs a format
string the first parameter provides this and so up to 4 more
parameters can be used. Alltogehter the new layout should be
much more human readable for customers and test.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Add defines and switch case code to handle the two invalid
domain response codes better. Until now these two response
codes are handled via default resulting in -EAGAIN and
switching the processed queue to offline. So this kind of
malformed request bounced through all suitable queues and
switched them off. Now this kind of malformed request is
just rejected with EINVAL without switching off the queue.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
According to the system architecture the current implementation
requires the presence of the N bit in GR2 in the TAPQ response
field to validate the max. number of domains (Nd).
Older machine types don't have this N bit, hence the max. domain
field was ignored.
Before the N bit was introduced the maximum number of domain was
a constant value of 15. So set this value in case of N bit absence.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
For the older CEX2x and CEX3x cards the function bits returned
by TAPQ do not reflect the functions of the card. Instead the
functionality is implicit by the type of the card. The reworked
zcrypt requires to have the function bits set correct, so this
patch fixes this. The queue selection is not only based on these
function bits but also on function pointers set by the individual
drivers.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Currently the first eligible AP adapter respectively domain will be
selected to service requests. In case of sequential workload, the
very same adapter/domain will be used.
The adapter/domain selection algorithm now considers the completed
transactions per adaper/domain and therefore ensures a homogeneous
utilization.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Introduce new ioctl (ZDEVICESTATUS) to provide detailed
information, like hardware type, domains, status and functionality
of available crypto devices.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Currently the ap infrastructure only supports one domain at a time.
This feature extends the generic cryptographic device driver to
support multiple cryptographic domains simultaneously.
There are now card and queue devices on the AP bus with independent
card and queue drivers. The new /sys layout is as follows:
/sys/bus/ap
devices
<xx>.<yyyy> -> ../../../devices/ap/card<xx>/<xx>.<yyyy>
...
card<xx> -> ../../../devices/ap/card<xx>
...
drivers
<drv>card
card<xx> -> ../../../../devices/ap/card<xx>
<drv>queue
<xx>.<yyyy> -> ../../../../devices/ap/card<xx>/<xx>.<yyyy>
...
/sys/devices/ap
card<xx>
<xx>.<yyyy>
driver -> ../../../../bus/ap/drivers/<zzz>queue
...
driver -> ../../../bus/ap/drivers/<drv>card
...
The two digit <xx> field is the card number, the four digit <yyyy>
field is the queue number and <drv> is the name of the device driver,
e.g. "cex4".
For compatability /sys/bus/ap/card<xx> for the old layout has to exist,
including the attributes that used to reside there.
With additional contributions from Harald Freudenberger and
Martin Schwidefsky.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Crypto requests are very different in complexity and thus runtime.
Also various crypto adapters are differ with regard to the execution
time. Crypto requests can be balanced much better when the request
type and eligible crypto adapters are rated in a more precise
granularity. Therefore, request weights and adapter speed rates for
dedicated requests will be introduced.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The poll thread of the AP bus is burning CPU while waiting for
crypto requests to complete. We can as well burn a few more cycles
in the poll thread to check if there are pending requests and
remove the atomic operations with the ap_poll_requests.
This improves the code if the machine has adapter interrupts.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Move the inline assemblies for the AP bus into a separate header file.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Now that the message type modules are linked with the zcrypt_api
into a single module the zcrypt_ops_list is initialized by
the module init function of the zcyppt.ko module. After that
the list is static and all message types are present.
Drop the zcrypt_ops_list_lock spinlock and the module handling
in regard to the message types.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Move the ap bus into the kernel and make it general available.
Additionally include the message types and the API layer as a
preparation for the workload management facility.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.
I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system. The trace below was collected after the machine stalled,
waiting for the hotplug event completion.
The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO. With this patch, We couldn't hit the
issue anymore.
This should apply on top of Jens's for-next branch cleanly.
Changes since v1:
- Use GFP_NOIO instead of GFP_NOWAIT.
Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Conflicts:
arch/arm/include/asm/unistd.h
arch/arm/include/uapi/asm/unistd.h
arch/arm/kernel/calls.S
|
|
|
|
The new skcipher walk API may crash in the following way. (Interestingly,
the tcrypt boot time tests seem unaffected, while an explicit test using
the module triggers it)
Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[<ffff000008431d84>] __memcpy+0x84/0x180
[<ffff0000083ec0d0>] skcipher_walk_done+0x328/0x340
[<ffff0000080c5c04>] ctr_encrypt+0x84/0x100
[<ffff000008406d60>] simd_skcipher_encrypt+0x88/0x98
[<ffff0000083fa05c>] crypto_rfc3686_crypt+0x8c/0x98
[<ffff0000009b0900>] test_skcipher_speed+0x518/0x820 [tcrypt]
[<ffff0000009b31c0>] do_test+0x1408/0x3b70 [tcrypt]
[<ffff0000009bd050>] tcrypt_mod_init+0x50/0x1000 [tcrypt]
[<ffff0000080838f4>] do_one_initcall+0x44/0x138
[<ffff0000081aee60>] do_init_module+0x68/0x1e0
[<ffff0000081524d0>] load_module+0x1fd0/0x2458
[<ffff000008152c38>] SyS_finit_module+0xe0/0xf0
[<ffff0000080836f0>] el0_svc_naked+0x24/0x28
This is due to the fact that skcipher_done_slow() may be entered with
walk->buffer unset. Since skcipher_walk_done() already deals with the
case where walk->buffer == walk->page, it appears to be the intention
that walk->buffer point to walk->page after skcipher_next_slow(), so
ensure that is the case.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When src and dst both are specified and they point to the same file
the sign-file utility will write only signature to the dst file and
the module (.ko file) body will not be written.
That happens because we open the same file with "rb" and "wb" flags,
from fopen man:
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
...
bm = BIO_new_file(module_name, "rb");
...
bd = BIO_new_file(dest_name, "wb");
...
while ((n = BIO_read(bm, buf, sizeof(buf))),
n > 0) {
ERR(BIO_write(bd, buf, n) < 0, "%s", dest_name);
}
...
Signed-off-by: Alex Yashchenko <alexhoppus111@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In function public_key_verify_signature(), returns variable ret on
error paths. When the call to kmalloc() fails, the value of ret is 0,
and it is not set to an errno before returning. This patch fixes the
bug.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188891
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The comment for ntb_spad_count incorrectly referred to ntb_mw_count.
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Trivial fix to typo "repsonse" to "response" in error message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Commit 4fd06960f120 ("Use the new x86 setup code for i386") introduced a
reference to the make variable LINUX_INCLUDE. That reference got moved
around a bit and copied twice and now there are three references to it.
There has never been a definition of that variable. (Presumably that is
because it started out as a mistyped reference to LINUXINCLUDE.) So this
reference has always been an empty string. Let's remove it before it
spreads any further.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
The build system stopped generating ikconfig.h in v2.6.8. Remove an entry
for it in dontdiff. There's also a reference to it in a small comment.
Remove that comment too, as it is of little help in any case.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
This patch fix spelling typos in printk and kconfig.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
According to `man blockdev':
--getsize
Print device size (32-bit!) in sectors.
Deprecated in favor of the --getsz option.
...
--getsz
Get size in 512-byte sectors.
Hence, occurrences of `--getsize' should be replaced with `--getsz',
which this commit has achieved as follows:
$ cd "$repo"
$ git grep -l -e --getsz
Documentation/device-mapper/delay.txt
Documentation/device-mapper/dm-crypt.txt
Documentation/device-mapper/linear.txt
Documentation/device-mapper/log-writes.txt
Documentation/device-mapper/striped.txt
Documentation/device-mapper/switch.txt
$ cd Documentation/device-mapper
$ sed -i s/getsize/getsz/g *
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
'for-4.10/i2c-hid-nopower', 'for-4.10/intel-ish', 'for-4.10/mayflash', 'for-4.10/microsoft-surface-3', 'for-4.10/multitouch', 'for-4.10/sony', 'for-4.10/udraw-ps3', 'for-4.10/upstream' and 'for-4.10/wacom/generic' into for-linus
|
|
start_cpu() pushes a text address on the stack so that stack traces from
idle tasks will show start_cpu() at the end. But it currently shows the
wrong function offset. It's more correct to show the address
immediately after the 'lretq' instruction.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2cadd9f16c77da7ee7957bfc5e1c67928c23ca48.1481685203.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
start_cpu() pushes a text address on the stack so that stack traces from
idle tasks will show start_cpu() at the end. But it uses a call
instruction to do that, which is rather obtuse. Use a straightforward
push instead.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4d8a1952759721d42d1e62ba9e4a7e3ac5df8574.1481685203.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For sync direct IO, generic_file_direct_write/generic_file_read_iter
will update file access position. Don't duplicate the update in
.direct_IO. This cause my raid array can't assemble.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When debugging nvme controller crashes, it's nice to know whether
the controller died cleanly so that the failure is just reflected in
CSTS, it died and put an error in PCI_STATUS, or whether it died so
badly that it stopped responding to PCI configuration space reads.
I've seen a failure that gives 0xffff in PCI_STATUS on a Samsung
"SM951 NVMe SAMSUNG 256GB" with firmware "BXW75D0Q".
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Fixed up white space and hunk reject.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
bdev->bd_contains is not stable before calling __blkdev_get().
When __blkdev_get() is called on a parition with ->bd_openers == 0
it sets
bdev->bd_contains = bdev;
which is not correct for a partition.
After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
and then ->bd_contains is stable.
When FMODE_EXCL is used, blkdev_get() calls
bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim()
This call happens before __blkdev_get() is called, so ->bd_contains
is not stable. So bd_may_claim() cannot safely use ->bd_contains.
It currently tries to use it, and this can lead to a BUG_ON().
This happens when a whole device is already open with a bd_holder (in
use by dm in my particular example) and two threads race to open a
partition of that device for the first time, one opening with O_EXCL and
one without.
The thread that doesn't use O_EXCL gets through blkdev_get() to
__blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
Immediately thereafter the other thread, using FMODE_EXCL, calls
bd_start_claiming() from blkdev_get(). This should fail because the
whole device has a holder, but because bdev->bd_contains == bdev
bd_may_claim() incorrectly reports success.
This thread continues and blocks on bd_mutex.
The first thread then sets bdev->bd_contains correctly and drops the mutex.
The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
again in:
BUG_ON(!bd_may_claim(bdev, whole, holder));
The BUG_ON fires.
Fix this by removing the dependency on ->bd_contains in
bd_may_claim(). As bd_may_claim() has direct access to the whole
device, it can simply test if the target bdev is the whole device.
Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block")
Cc: stable@vger.kernel.org (v2.6.35+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This reverts commit 6d31e3ba232ea22458b2f36b6d3f2f9f11bf3fa4.
This causes bootup problems for me both on my laptop and my desktop.
What they have in common is that they have NVMe disks with dm-crypt, but
it's not the same controller, so it's not controller-specific.
Jens does not see it on his machine (also NVMe), so it's presumably
something that triggers just on bootup. Possibly related to dm-crypt
and the fact that I mark my luks volume with "allow-discards" in
/etc/crypttab.
It's 100% repeatable for me, which made it fairly straightforward to
bisect the problem to this commit. Small mercies.
So we don't know what the reason is yet, but the revert is needed to get
things going again.
Acked-by: Jens Axboe <axboe@fb.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|