Age | Commit message (Collapse) | Author |
|
... and the glue around it. It is not needed anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See
c0d121720220 ("drivers/edac: add new nmi rescan").
From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.
Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."
So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:
inline int edac_handler_set(void)
{
if (edac_op_state == EDAC_OPSTATE_POLL)
return 0;
return atomic_read(&edac_handlers);
}
So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
|
|
... like the rest of the file.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
calculate_min_delta() may incorrectly access a 4th element of buf2[]
which only has 3 elements. This may trigger undefined behaviour and has
been reported to cause strange crashes in start_kernel() sometime after
timer initialization when built with GCC 5.3, possibly due to
register/stack corruption:
sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 10737418237ns
CPU 0 Unable to handle kernel paging request at virtual address ffffb0aa, epc == 8067daa8, ra == 8067da84
Oops[#1]:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.18 #51
task: 8065e3e0 task.stack: 80644000
$ 0 : 00000000 00000001 00000000 00000000
$ 4 : 8065b4d0 00000000 805d0000 00000010
$ 8 : 00000010 80321400 fffff000 812de408
$12 : 00000000 00000000 00000000 ffffffff
$16 : 00000002 ffffffff 80660000 806a666c
$20 : 806c0000 00000000 00000000 00000000
$24 : 00000000 00000010
$28 : 80644000 80645ed0 00000000 8067da84
Hi : 00000000
Lo : 00000000
epc : 8067daa8 start_kernel+0x33c/0x500
ra : 8067da84 start_kernel+0x318/0x500
Status: 11000402 KERNEL EXL
Cause : 4080040c (ExcCode 03)
BadVA : ffffb0aa
PrId : 0501992c (MIPS 1004Kc)
Modules linked in:
Process swapper/0 (pid: 0, threadinfo=80644000, task=8065e3e0, tls=00000000)
Call Trace:
[<8067daa8>] start_kernel+0x33c/0x500
Code: 24050240 0c0131f9 24849c64 <a200b0a8> 41606020 000000c0 0c1a45e6 00000000 0c1a5f44
UBSAN also detects the same issue:
================================================================
UBSAN: Undefined behaviour in arch/mips/kernel/cevt-r4k.c:85:41
load of address 80647e4c with insufficient space
for an object of type 'unsigned int'
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.18 #47
Call Trace:
[<80028f70>] show_stack+0x88/0xa4
[<80312654>] dump_stack+0x84/0xc0
[<8034163c>] ubsan_epilogue+0x14/0x50
[<803417d8>] __ubsan_handle_type_mismatch+0x160/0x168
[<8002dab0>] r4k_clockevent_init+0x544/0x764
[<80684d34>] time_init+0x18/0x90
[<8067fa5c>] start_kernel+0x2f0/0x500
=================================================================
buf2[] is intentionally only 3 elements so that the last element is the
median once 5 samples have been inserted, so explicitly prevent the
possibility of comparing against the 4th element rather than extending
the array.
Fixes: 1fa405552e33f2 ("MIPS: cevt-r4k: Dynamically calculate min_delta_ns")
Reported-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Tested-by: Rabin Vincent <rabinv@axis.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.7.x-
Patchwork: https://patchwork.linux-mips.org/patch/15892/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
The algif_aead completion function tries to deduce the aead_request
from the crypto_async_request argument. This is broken because
the API does not guarantee that the same request will be pased to
the completion function. Only the value of req->data can be used
in the completion function.
This patch fixes it by storing a pointer to sk in areq and using
that instead of passing in sk through req->data.
Fixes: 83094e5e9e49 ("crypto: af_alg - add async support to...")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The ahash API modifies the request's callback function in order
to clean up after itself in some corner cases (unaligned final
and missing finup).
When the request is complete ahash will restore the original
callback and everything is fine. However, when the request gets
an EBUSY on a full queue, an EINPROGRESS callback is made while
the request is still ongoing.
In this case the ahash API will incorrectly call its own callback.
This patch fixes the problem by creating a temporary request
object on the stack which is used to relay EINPROGRESS back to
the original completion function.
This patch also adds code to preserve the original flags value.
Fixes: ab6bf4e5e5e4 ("crypto: hash - Fix the pointer voodoo in...")
Cc: <stable@vger.kernel.org>
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When we get an EINPROGRESS completion in lrw, we will end up marking
the request as done and freeing it. This then blows up when the
request is really completed as we've already freed the memory.
Fixes: 700cb3f5fe75 ("crypto: lrw - Convert to skcipher")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When we get an EINPROGRESS completion in xts, we will end up marking
the request as done and freeing it. This then blows up when the
request is really completed as we've already freed the memory.
Fixes: f1c131b45410 ("crypto: xts - Convert to skcipher")
Cc: <stable@vger.kernel.org>
Reported-by: Nathan Royce <nroycea+kernel@gmail.com>
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
mipsxx_pmu_handle_shared_irq() calls irq_work_run() while holding the
pmuint_rwlock for read. irq_work_run() can, via perf_pending_event(),
call try_to_wake_up() which can try to take rq->lock.
However, perf can also call perf_pmu_enable() (and thus take the
pmuint_rwlock for write) while holding the rq->lock, from
finish_task_switch() via perf_event_context_sched_in().
This leads to an ABBA deadlock:
PID: 3855 TASK: 8f7ce288 CPU: 2 COMMAND: "process"
#0 [89c39ac8] __delay at 803b5be4
#1 [89c39ac8] do_raw_spin_lock at 8008fdcc
#2 [89c39af8] try_to_wake_up at 8006e47c
#3 [89c39b38] pollwake at 8018eab0
#4 [89c39b68] __wake_up_common at 800879f4
#5 [89c39b98] __wake_up at 800880e4
#6 [89c39bc8] perf_event_wakeup at 8012109c
#7 [89c39be8] perf_pending_event at 80121184
#8 [89c39c08] irq_work_run_list at 801151f0
#9 [89c39c38] irq_work_run at 80115274
#10 [89c39c50] mipsxx_pmu_handle_shared_irq at 8002cc7c
PID: 1481 TASK: 8eaac6a8 CPU: 3 COMMAND: "process"
#0 [8de7f900] do_raw_write_lock at 800900e0
#1 [8de7f918] perf_event_context_sched_in at 80122310
#2 [8de7f938] __perf_event_task_sched_in at 80122608
#3 [8de7f958] finish_task_switch at 8006b8a4
#4 [8de7f998] __schedule at 805e4dc4
#5 [8de7f9f8] schedule at 805e5558
#6 [8de7fa10] schedule_hrtimeout_range_clock at 805e9984
#7 [8de7fa70] poll_schedule_timeout at 8018e8f8
#8 [8de7fa88] do_select at 8018f338
#9 [8de7fd88] core_sys_select at 8018f5cc
#10 [8de7fee0] sys_select at 8018f854
#11 [8de7ff28] syscall_common at 80028fc8
The lock seems to be there to protect the hardware counters so there is
no need to hold it across irq_work_run().
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
Since commit 4cfffcfa5106 ("irqchip/mips-gic: Fix local interrupts"),
the gic driver has been allocating virq's for local interrupts during
its initialisation. Unfortunately on Malta platforms, these are the
first IRQs to be allocated and so are allocated virqs 1-3. The i8259
driver uses a legacy irq domain which expects to map virqs 0-15. Probing
of that driver therefore fails because some of those virqs are already
taken, with the warning:
WARNING: CPU: 0 PID: 0 at kernel/irq/irqdomain.c:344
irq_domain_associate+0x1e8/0x228
error: virq1 is already associated
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc6-00011-g4cfffcfa5106 #368
Stack : 00000000 00000000 807ae03a 0000004d 00000000 806c1010 0000000b ffff0a01
80725467 807258f4 806a64a4 00000000 00000000 807a9acc 00000100 80713e68
806d5598 8017593c 8072bf90 8072bf94 806ac358 00000000 806abb60 80713ce4
00000100 801b22d4 806d5598 8017593c 807ae03a 00000000 80713ce4 80720000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
Call Trace:
[<8010c480>] show_stack+0x88/0xa4
[<80376758>] dump_stack+0x88/0xd0
[<8012c4a8>] __warn+0x104/0x118
[<8012c4ec>] warn_slowpath_fmt+0x30/0x3c
[<8017edfc>] irq_domain_associate+0x1e8/0x228
[<8017efd0>] irq_domain_add_legacy+0x7c/0xb0
[<80764c50>] __init_i8259_irqs+0x64/0xa0
[<80764ca4>] i8259_of_init+0x18/0x74
[<8076ddc0>] of_irq_init+0x19c/0x310
[<80752dd8>] arch_init_irq+0x28/0x19c
[<80750a08>] start_kernel+0x2a8/0x434
Fix this by reserving the required i8259 virqs in malta platform code
before probing any irq chips.
Fixes: 4cfffcfa5106 ("irqchip/mips-gic: Fix local interrupts")
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15919/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
The commit 1259feddd0f8("pinctrl: samsung: Fix the width of
PINCFG_TYPE_DRV bitfields for Exynos5433") already fixed
the different width of PINCFG_TYPE_DRV from previous Exynos SoC.
However wrong merge conflict resolution was chosen in commit
7f36f5d11cda ("Merge tag 'v4.10-rc6' into devel") effectively dropping
the changes for PINCFG_TYPE_DRV. Re-do them here.
The macro EXYNOS_PIN_BANK_EINTW is no longer used so remove it.
Fixes: 7f36f5d11cda ("Merge tag 'v4.10-rc6' into devel")
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
In the (very unlikely) case a passive socket becomes a listener,
we do not want to duplicate its saved SYN headers.
This would lead to double frees, use after free, and please hackers and
various fuzzers
Tested:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 5) = 0
+0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 connect(4, AF_UNSPEC, ...) = 0
+0 close(3) = 0
+0 bind(4, ..., ...) = 0
+0 listen(4, 5) = 0
+0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 257
Fixes: cd8ae85299d5 ("tcp: provide SYN headers for passive connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
both our sqsize and the controller MQES cap are a 0 based value,
so making it 1 based is wrong.
Reported-by: Trapp, Darren <Darren.Trapp@cavium.com>
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
both our sqsize and the controller MQES cap are a 0 based value,
so making it 1 based is wrong.
Reported-by: Trapp, Darren <Darren.Trapp@cavium.com>
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
both our sqsize and the controller MQES cap are a 0 based value,
so making it 1 based is wrong.
Reported-by: Trapp, Darren <Darren.Trapp@cavium.com>
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
Pull CIFS fixes from Steve French:
"This is a set of CIFS/SMB3 fixes for stable.
There is another set of four SMB3 reconnect fixes for stable in
progress but they are still being reviewed/tested, so didn't want to
wait any longer to send these five below"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Reset TreeId to zero on SMB2 TREE_CONNECT
CIFS: Fix build failure with smb2
Introduce cifs_copy_file_range()
SMB3: Rename clone_range to copychunk_range
Handle mismatched open calls
|
|
Pull ARM fixes from Russell King:
"A number of ARM fixes:
- prevent oopses caused by dma_get_sgtable() and declared DMA
coherent memory
- fix boot failure on nommu caused by ID_PFR1 access
- a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
arm: kprobes: Align stack to 8-bytes in test code
arm: kprobes: Fix the return address of multiple kretprobes
arm: kprobes: Skip single-stepping in recursing path if possible
arm: kprobes: Allow to handle reentered kprobe on single-stepping
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are 3 small fixes for 4.11-rc6.
One resolves a reported issue with sysfs files that NeilBrown found,
one is a documenatation fix for the stable kernel rules, and the last
is a small MAINTAINERS file update for kernfs"
* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
MAINTAINERS: separate out kernfs maintainership
sysfs: be careful of error returns from ops->show()
Documentation: stable-kernel-rules: fix stable-tag format
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver rfixes from Greg KH:
"Here are a number of small IIO and staging driver fixes for 4.11-rc6.
Nothing big here, just iio fixes for reported issues, and an ashmem
fix for a very old bug that has been reported by a number of Android
vendors"
* tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
iio: hid-sensor-attributes: Fix sensor property setting failure.
iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
iio: st_pressure: initialize lps22hb bootime
iio: bmg160: reset chip when probing
iio: cros_ec_sensors: Fix return value to get raw and calibbias data.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"statx followup fixes and a fix for stack-smashing on alpha"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
alpha: fix stack smashing in old_adjtimex(2)
statx: Include a mask for stx_attributes in struct statx
statx: Reserve the top bit of the mask for future struct expansion
xfs: report crtime and attribute flags to statx
ext4: Add statx support
statx: optimize copy of struct statx to userspace
statx: remove incorrect part of vfs_statx() comment
statx: reject unknown flags when using NULL path
Documentation/filesystems: fix documentation for ->getattr()
|
|
We should use proper RCU list APIs to manipulate help->expectations,
as we can dump the conntrack's expectations via nfnetlink, i.e. in
ctnetlink_exp_ct_dump_table(), where only rcu_read_lock is acquired.
So for list traversal, use hlist_for_each_entry_rcu; for list add/del,
use hlist_add_head_rcu and hlist_del_rcu.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
For IPCTNL_MSG_EXP_GET, if the CTA_EXPECT_MASTER attr is specified, then
the NLM_F_DUMP request will dump the expectations related to this
connection tracking.
But we forget to check whether the conntrack has nf_conn_help or not,
so if nfct_help(ct) is NULL, oops will happen:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: ctnetlink_exp_ct_dump_table+0xf9/0x1e0 [nf_conntrack_netlink]
Call Trace:
? ctnetlink_exp_ct_dump_table+0x75/0x1e0 [nf_conntrack_netlink]
netlink_dump+0x124/0x2a0
__netlink_dump_start+0x161/0x190
ctnetlink_dump_exp_ct+0x16c/0x1bc [nf_conntrack_netlink]
? ctnetlink_exp_fill_info.constprop.33+0xf0/0xf0 [nf_conntrack_netlink]
? ctnetlink_glue_seqadj+0x20/0x20 [nf_conntrack_netlink]
ctnetlink_get_expect+0x32e/0x370 [nf_conntrack_netlink]
? debug_lockdep_rcu_enabled+0x1d/0x20
nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
[...]
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
inet6_dev->addr_list is protected by inet6_dev->lock, so only using
rcu_read_lock is not enough, we should acquire read_lock_bh(&idev->lock)
before the inet6_dev->addr_list traversal.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
One CPU is doing ctnetlink_change_helper(), while another CPU is doing
unhelp() at the same time. So even if help->helper is not NULL at first,
the later statement strcmp(help->helper->name, ...) may still access
the NULL pointer.
So we must use rcu_read_lock and rcu_dereference to avoid such _bad_
thing happen.
Fixes: f95d7a46bc57 ("netfilter: ctnetlink: Fix regression in CTA_HELP processing")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When invoke __nf_conntrack_helper_find, it needs the rcu lock to
protect the helper module which would not be unloaded.
Now there are two caller nf_conntrack_helper_try_module_get and
ctnetlink_create_expect which don't hold rcu lock. And the other
callers left like ctnetlink_change_helper, ctnetlink_create_conntrack,
and ctnetlink_glue_attach_expect, they already hold the rcu lock
or spin_lock_bh.
Remove the rcu lock in functions nf_ct_helper_expectfn_find_by_name
and nf_ct_helper_expectfn_find_by_symbol. Because they return one pointer
which needs rcu lock, so their caller should hold the rcu lock, not in
these two functions.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Otherwise, creating a new conntrack via nfnetlink:
# conntrack -I -p udp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20
will emit the wrong ct events(where UPDATE should be NEW):
# conntrack -E
[UPDATE] udp 17 10 src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20
[UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Denys provided an awesome KASAN report pointing to an use
after free in xt_TCPMSS
I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible
impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull block fixes from Jens Axboe:
"Here's a pull request for 4.11-rc, fixing a set of issues mostly
centered around the new scheduling framework. These have been brewing
for a while, but split up into what we absolutely need in 4.11, and
what we can defer until 4.12. These are well tested, on both single
queue and multiqueue setups, and with and without shared tags. They
fix several hangs that have happened in testing.
This is obviously larger than I would have preferred at this point in
time, but I don't think we can shave much off this and still get the
desired results.
In detail, this pull request contains:
- a set of five fixes for NVMe, mostly from Christoph and one from
Roland.
- a series from Bart, fixing issues with dm-mq and SCSI shared tags
and scheduling. Note that one of those patches commit messages may
read like an optimization, but it is in fact an important fix for
queue restarts in particular.
- a series from Omar, most importantly fixing a hang with multiple
hardware queues when we fail to get a driver tag. Another important
fix in there is for resizing hardware queues, which nbd does when
handling multiple sockets for one connection.
- fixing an imbalance in putting the ctx for hctx request allocations
from Minchan"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: Restart a single queue if tag sets are shared
dm rq: Avoid that request processing stalls sporadically
scsi: Avoid that SCSI queues get stuck
blk-mq: Introduce blk_mq_delay_run_hw_queue()
blk-mq: remap queues when adding/removing hardware queues
blk-mq-sched: fix crash in switch error path
blk-mq-sched: set up scheduler tags when bringing up new queues
blk-mq-sched: refactor scheduler initialization
blk-mq: use the right hctx when getting a driver tag fails
nvmet: fix byte swap in nvmet_parse_io_cmd
nvmet: fix byte swap in nvmet_execute_write_zeroes
nvmet: add missing byte swap in nvmet_get_smart_log
nvme: add missing byte swap in nvme_setup_discard
nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
block: do not put mq context in blk_mq_alloc_request_hctx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"This late fix for pin control is hopefully the last I send this cycle.
The problem was detected early in the v4.11 release cycle and there
has been some back and forth on how to solve it. Sadly the proper fix
arrives late, but at least not too late.
An issue was detected with pin control on the Freescale i.MX after the
refactorings for more general group and function handling.
We now have the proper fix for this"
* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.11:
Headed to stable:
- disable HFSCR[TM] if TM is not supported, fixes a potential host
kernel crash triggered by a hostile guest, but only in
configurations that no one uses
- don't try to fix up misaligned load-with-reservation instructions
- fix flush_(d|i)cache_range() called from modules on little endian
kernels
- add missing global TLB invalidate if cxl is active
- fix missing preempt_disable() in crc32c-vpmsum
And a fix for selftests build changes that went in this release:
- selftests/powerpc: Fix standalone powerpc build
Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
Paul Mackerras"
* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
powerpc/mm: Add missing global TLB invalidate if cxl is active
powerpc/64: Fix flush_(d|i)cache_range() called from modules
powerpc: Don't try to fix up misaligned load-with-reservation instructions
powerpc: Disable HFSCR[TM] if TM is not supported
selftests/powerpc: Fix standalone powerpc build
|
|
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, inputting the following command will succeed but actually the
value will be truncated:
# echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat
This is not friendly to the user, so instead, we should report error
when the value is larger than UINT_MAX.
Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Separating discards and zeroout operations allows us to remove the LBPRZ
block zeroing constraints from discards and honor the device preferences
for UNMAP commands.
If supported by the device, we'll also choose UNMAP over one of the
WRITE SAME variants for discards.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that zeroout and discards are distinct operations we need to
separate the policy of choosing the appropriate command. Create a
zeroing_mode which can be one of:
write: Zeroout assist not present, use regular WRITE
writesame: Allow WRITE SAME(10/16) with a zeroed payload
writesame_16_unmap: Allow WRITE SAME(16) with UNMAP
writesame_10_unmap: Allow WRITE SAME(10) with UNMAP
The last two are conditional on the device being thin provisioned with
LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively.
Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And
if none of the _unmap variants are supported, regular WRITE SAME will be
used if the device supports it.
The zeroout_mode is exported in sysfs and the detected mode for a given
device can be overridden using the string constants above.
With this change in place we can now issue WRITE SAME(16) with UNMAP set
for block zeroing applications that require hard guarantees and
logical_block_size granularity. And at the same time use the UNMAP
command with the device's preferred granulary and alignment for discard
operations.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It seems like DRBD assumes its on the wire TRIM request always zeroes data.
Use that fact to implement REQ_OP_WRITE_ZEROES.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
drbd always wants its discard wire operations to zero the blocks, so
use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
reinventing it poorly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Now that we have REQ_OP_WRITE_ZEROES implemented for all devices that
support efficient zeroing, we can remove the call to blkdev_issue_discard.
This means we only have two ways of zeroing left and can simplify the
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
mmc only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
rsxx only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
rbd only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It's just a in-driver reimplementation of writing zeroes to the pages,
which fails if the discards aren't page aligned.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It's identical to discard as hole punches will always leave us with
zeroes on reads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Just the same as discard if the block size equals the system page size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
But now for the real NVMe Write Zeroes yet, just to get rid of the
discard abuse for zeroing. Also rename the quirk flag to be a bit
more self-explanatory.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Try to use a write same with unmap bit variant if the device supports it
and the caller allows for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
and prepares for removing the discard_zeroes_data flag.
Also remove a pointless discard support check, which is done in
blkdev_issue_discard already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if
the caller doesn't want them.
Also clean up the convoluted check for the return condition that this
new flag is added to.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|