Age | Commit message (Collapse) | Author |
|
Replace MAX_ADDR_LEN with its numeric value to fix the following
linux/packet_diag.h userspace compilation error:
/usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function)
__u8 pdmc_addr[MAX_ADDR_LEN];
This is not the first case in the UAPI where the numeric value
of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h
already does the same:
$ grep MAX_ADDR_LEN include/uapi/linux/if_link.h
__u8 mac[32]; /* MAX_ADDR_LEN */
There are no UAPI headers besides these two that use MAX_ADDR_LEN.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The gso code of several tunnels type (gre and udp tunnels)
takes for granted that the skb->inner_protocol is properly
initialized and drops the packet elsewhere.
On the forwarding path no one is initializing such field,
so gro encapsulated packets are dropped on forward.
Since commit 38720352412a ("gre: Use inner_proto to obtain
inner header protocol"), this can be reproduced when the
encapsulated packets use gre as the tunneling protocol.
The issue happens also with vxlan and geneve tunnels since
commit 8bce6d7d0d1e ("udp: Generalize skb_udp_segment"), if the
forwarding host's ingress nic has h/w offload for such tunnel
and a vxlan/geneve device is configured on top of it, regardless
of the configured peer address and vni.
To address the issue, this change initialize the inner_protocol
field for encapsulated packets in both ipv4 and ipv6 gro complete
callbacks.
Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In qed_ll2_start_ooo() the ll2_info variable is uninitialized and then
passed to qed_ll2_acquire_connection() where it is copied into a new
memory space.
This shouldn't cause any issue as long as non of the copied memory is
every read.
But the potential for a bug being introduced by reading this memory
is real.
Detected by CoverityScan, CID#1399632 ("Uninitialized scalar variable")
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sunil Goutham says:
====================
net: thunderx: Miscellaneous fixes
This patch set fixes multiple issues such as IOMMU
translation faults when kernel is booted with IOMMU enabled
on host, incorrect MAC ID reading from ACPI tables and IPv6
UDP packet drop due to failure of checksum validation.
Changes from v1:
- As suggested by David Miller, got rid of conditional
calling of DMA map/unmap APIs. Also updated commit message
in 'IOMMU translation faults' patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Do not consider IPv6 frames with zero UDP checksum as frames
with bad checksum and drop them.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When booted with ACPI, random mac addresses are being
assigned to node1 interfaces due to mismatch of bgx_id
in BGX driver and ACPI tables.
This patch fixes this issue by setting maximum BGX devices
per node based on platform/soc instead of a macro. This
change will set the bgx_id appropriately.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When BGX/LMACs are in QSGMII mode, for some LMACs, mode info is
not being printed. This patch will fix that. With changes already
done to not do any sort of serdes 2 lane mapping config calculation
in kernel driver, we can get rid of this logic.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ACPI support has been added to ARM IOMMU driver in 4.10 kernel
and that has resulted in VNIC interfaces throwing translation
faults when kernel is booted with ACPI as driver was not using
DMA API. This patch fixes the issue by using DMA API which inturn
will create translation tables when IOMMU is enabled.
Also VNIC doesn't have a seperate receive buffer ring per receive
queue, so there is no 1:1 descriptor index matching between CQE_RX
and the index in buffer ring from where a buffer has been used for
DMA'ing. Unlike other NICs, here it's not possible to maintain dma
address to virt address mappings within the driver. This leaves us
no other choice but to use IOMMU's IOVA address conversion API to
get buffer's virtual address which can be given to network stack
for processing.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the function rds_ib_setup_qp, the error handle is missing. When some
error occurs, it is possible that memory leak occurs. As such, error
handle is added.
Cc: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Improve UDP TX performance by:
* reducing the ring size from 2K to 512
* replacing the numerous streaming DMA allocations for info buffers and
gather lists with one large consistent DMA allocation per ring
BQL is not effective here. We reduced the ring size because there is heavy
overhead with dma_map_single every so often. With iommu=on, dma_map_single
in PF Tx data path was taking longer time (~700usec) for every ~250
packets. Debugged intel_iommu code, and found that PF driver is utilizing
too many static IO virtual address mapping entries (for gather list entries
and info buffers): about 100K entries for two PF's each using 8 rings.
Also, finding an empty entry (in rbtree of device domain's iova mapping in
kernel) during Tx path becomes a bottleneck every so often; the loop to
find the empty entry goes through over 40K iterations; this is too costly
and was the major overhead. Overhead is low when this loop quits quickly.
Netperf benchmark numbers before and after patch:
PF UDP TX
+--------+--------+------------+------------+---------+
| | | Before | After | |
| Number | | Patch | Patch | |
| of | Packet | Throughput | Throughput | Percent |
| Flows | Size | (Gbps) | (Gbps) | Change |
+--------+--------+------------+------------+---------+
| | 360 | 0.52 | 0.93 | +78.9 |
| 1 | 1024 | 1.62 | 2.84 | +75.3 |
| | 1518 | 2.44 | 4.21 | +72.5 |
+--------+--------+------------+------------+---------+
| | 360 | 0.45 | 1.59 | +253.3 |
| 4 | 1024 | 1.34 | 5.48 | +308.9 |
| | 1518 | 2.27 | 8.31 | +266.1 |
+--------+--------+------------+------------+---------+
| | 360 | 0.40 | 1.61 | +302.5 |
| 8 | 1024 | 1.64 | 4.24 | +158.5 |
| | 1518 | 2.87 | 6.52 | +127.2 |
+--------+--------+------------+------------+---------+
VF UDP TX
+--------+--------+------------+------------+---------+
| | | Before | After | |
| Number | | Patch | Patch | |
| of | Packet | Throughput | Throughput | Percent |
| Flows | Size | (Gbps) | (Gbps) | Change |
+--------+--------+------------+------------+---------+
| | 360 | 1.28 | 1.49 | +16.4 |
| 1 | 1024 | 4.44 | 4.39 | -1.1 |
| | 1518 | 6.08 | 6.51 | +7.1 |
+--------+--------+------------+------------+---------+
| | 360 | 2.35 | 2.35 | 0.0 |
| 4 | 1024 | 6.41 | 8.07 | +25.9 |
| | 1518 | 9.56 | 9.54 | -0.2 |
+--------+--------+------------+------------+---------+
| | 360 | 3.41 | 3.65 | +7.0 |
| 8 | 1024 | 9.35 | 9.34 | -0.1 |
| | 1518 | 9.56 | 9.57 | +0.1 |
+--------+--------+------------+------------+---------+
Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dinesh reported that RTA_MULTIPATH nexthops are 8-bytes larger with IPv6
than IPv4. The recent refactoring for multipath support in netlink
messages does discriminate between non-multipath which needs the OIF
and multipath which adds a rtnexthop struct for each hop making the
RTA_OIF attribute redundant. Resolve by adding a flag to the info
function to skip the oif for multipath.
Fixes: beb1afac518d ("net: ipv6: Add support to dump multipath routes
via RTA_MULTIPATH attribute")
Reported-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix and cleanup from Juergen Gross:
"This contains one fix for MSIX handling under Xen and a trivial
cleanup patch"
* tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Remove duplicate inclusion of linux/init.h
xen: do not re-use pirq number cached in pci device msi msg data
|
|
The xprt for backchannel is created separately, not in TCP/UDP code. It
needs the XPT_CONG_CTRL flag set on it too--otherwise requests on the
NFSv4.1 backchannel are rjected in svc_process_common():
1191 if (versp->vs_need_cong_ctrl &&
1192 !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags))
1193 goto err_bad_vers;
Fixes: 5283b03ee5 ("nfs/nfsd/sunrpc: enforce transport...")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
For full 5-level paging we need a helper to allocate p4d page table.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Convert all non-architecture-specific code to 5-level paging.
It's mostly mechanical adding handling one more page table level in
places where we deal with pud_t.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Like with pgtable-nopud.h for 4-level paging, this new header is base
for converting an architectures to properly folded p4d_t level.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If an architecture uses 4level-fixup.h we don't need to do anything as
it includes 5level-fixup.h.
If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK
before inclusion of the header. It makes asm-generic code to use
5level-fixup.h.
If an architecture has 4-level paging or folds levels on its own,
include 5level-fixup.h directly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We are going to introduce <asm-generic/pgtable-nop4d.h> to provide
abstraction for properly (in opposite to 5level-fixup.h hack) folded
p4d level. The new header will be included from pgtable-nopud.h.
If an architecture uses <asm-generic/nop*d.h>, we cannot use
5level-fixup.h directly to quickly convert the architecture to 5-level
paging as it would conflict with pgtable-nop4d.h.
With this patch an architecture can define __ARCH_USE_5LEVEL_HACK before
inclusion <asm-genenric/nop*d.h> to use 5level-fixup.h.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We are going to switch core MM to 5-level paging abstraction.
This is preparation step which adds <asm-generic/5level-fixup.h>
As with 4level-fixup.h, the new header allows quickly make all
architectures compatible with 5-level paging in core MM.
In long run we would like to switch architectures to properly folded p4d
level by using <asm-generic/pgtable-nop4d.h>, but it requires more
changes to arch-specific code.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Look for 'la57' in /proc/cpuinfo to see if your machine supports 5-level
paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Also convert the fw_update_type and fw_timeout variables to
unsigned and update the printf specifier to %u.
The FW_MGMT_IOC_SET_TIMEOUT_MS ioctl takes an unsigned int
and checkpatch was complaining about not checking the sscanf
return values.
Signed-off-by: Michael Sartain <mikesart@fastmail.com>
Acked-by: Viresh Kumar <viresh.kumar at linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fixes checkpatch warning:
type 'long long unsigned int' should be specified
in [[un]signed] [short|int|long|long long] order
Signed-off-by: Michael Sartain <mikesart@fastmail.com>
Acked-by: Viresh Kumar <viresh.kumar at linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fixes checkpatch warning:
braces {} are not necessary for any arm of this statement
Signed-off-by: Michael Sartain <mikesart@fastmail.com>
Acked-by: Viresh Kumar <viresh.kumar at linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fixes checkpatch warning:
macros should not use a trailing semicolon
Signed-off-by: Michael Sartain <mikesart@fastmail.com>
Acked-by: Viresh Kumar <viresh.kumar at linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Refactor conditionals to reduce one level of indentation and improve
code readability.
Signed-off-by: Narcisa Ana Maria Vasile <narcisaanamaria12@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Remove 'Out of memory' message because kmalloc already prints a message
in case of error.
Signed-off-by: Narcisa Ana Maria Vasile <narcisaanamaria12@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commenting Code Is a Bad Idea.
Comments are their to explain the code and how the code achieves its
goal and as codes in the comments does not explain what the code is
doing so there is no use of commenting them.
So in this patch codes in the comments are removed.
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Variable dac_data is already declared as of type u8. Again explicit type
casting of dac_data to u8, is not required. Hence this patch removes it
by using the following coccinelle script.
@@
type T;
T *ptr;
T p;
@@
(
- (T *)(&p)
+ &p
|
- (T *)ptr
+ ptr
|
- (T *)(ptr)
+ ptr
|
- (T)(p)
+ p
)
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use ARRAY_SIZE to calculate the size of an array.
The semantic patch used can be found here:
https://github.com/coccinelle/coccinellery/blob/master/arraysize/array.cocci
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fixes the following sparse warnings around line 3514 in hfa384x_usb.c:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] version
got restricted __be32 [usertype] <noident>
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is no need to include types.h and vmalloc.h twice.
This issue has been found by make includecheck.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Some functions like kmalloc/kzalloc return NULL on failure.
When NULL represents failure, !x is commonly used.
This was done using Coccinelle:
@@
expression *e;
identifier l1;
@@
e = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\)(...);
...
- e == NULL
+ !e
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Removed parentheses on the right hand side of assignment, as they are
not required. The following coccinelle script was used to fix this
issue:
@@
local idexpression id;
expression e;
@@
id =
-(
e
-)
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The following Coccinelle script was used to detect this:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T*)x)->f
|
- (T*)
e
)
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use macro min() to get the minimum of two values for
brevity and readability. The macro MUX_TX_MAX_SIZE
has a value of 2048 which is well within the integer
limits. This check was done manually.
Found using Coccinelle:
@@ type T; T x; T y; @@
(
- x < y ? x : y
+ min(x,y)
|
- x > y ? x : y
+ max(x,y)
)
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The host bridge memory window resource is inserted into the iomem_resource
tree and cannot be deallocated until the host bridge itself is removed.
Previously, the window was on the stack, which meant the iomem_resource
entry pointed into the stack and was corrupted as soon as the probe
function returned, which caused memory corruption and errors like this:
pcie_iproc_bcma bcma0:8: resource collision: [mem 0x40000000-0x47ffffff] conflicts with PCIe MEM space [mem 0x40000000-0x47ffffff]
Move the memory window resource from the stack into struct iproc_pcie so
its lifetime matches that of the host bridge.
Fixes: c3245a566400 ("PCI: iproc: Request host bridge window resources")
Reported-and-tested-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.8+
|
|
Fix these warnings:
drivers/staging//rtl8192u/r8192U_dm.c:2307:49: warning: cast from restricted __le16
drivers/staging//rtl8192u/r8192U_dm.c:2308:44: warning: cast from restricted __le16
drivers/staging//rtl8192u/r8192U_dm.c:2309:44: warning: cast from restricted __le16
Signed-off-by: Matthieu Simon <gmatthsim@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Rename macro definition and usage to represent correct spelling of DEFAULT:
ODM_REG_RX_DEFUALT_A_11N => ODM_REG_RX_DEFAULT_A_11N
ODM_REG_RX_DEFUALT_B_11N => ODM_REG_RX_DEFAULT_B_11N
Signed-off-by: Sebastian Haas <sehaas@deebas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix typos reported by checkpatch.pl
Signed-off-by: Sebastian Haas <sehaas@deebas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Some functions like kmalloc/kzalloc return NULL on failure.
When NULL represents failure, !x is commonly used.
This was done using Coccinelle:
@@
expression *e;
identifier l1;
@@
e = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\)(...);
...
- e == NULL
+ !e
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch fixes two coding style errors, reported by the checkpatch
script.
Signed-off-by: Elia Geretto <elia.f.geretto@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Indent the code in order to follow the rules and to
increase the readability of the code.
Signed-off-by: Georgiana Rodica Chelu <georgiana.chelu93@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Some type conversions like casting a pointer/non-pointer to a pointer of same type,
casting to the original type using addressof(&) operator, etc. are not needed.
Therefore, remove them. Done using coccinelle:
@@
type t;
t *p;
t a;
@@
(
- (t)(a)
+ a
|
- (t *)(p)
+ p
|
- (t *)(&a)
+ &a
)
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Removed redundant code marked with #if 0 to fix the following
checkpath.pl issue: CHECK: if this code is redundant consider
removing it.
Signed-off-by: Tamara Diaconita <diaconita.tamara@gmail.com>
Reviewed-by: Daniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Changed the structure tag 'struct cxd' with the variable '*ci' to fix the
following checkpath.pl issue: CHECK: Prefer kzalloc(sizeof(*ci)...) over
kzalloc(sizeof(struct cxd)).
Signed-off-by: Tamara Diaconita <diaconita.tamara@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Removed multiple blank lines to fix the checkpath.pl issue:
CHECK: Please don't use multiple blank lines.
Signed-off-by: Tamara Diaconita <diaconita.tamara@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Added spaces around multiple arithmetical operators
to fix the checkpatch.pl issue: CHECK: spaces preferred around that 'operator'.
Signed-off-by: Tamara Diaconita <diaconita.tamara@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Neil Brown pointed out a potential deadlock in raid 10 code with
bio_split/chain. The raid1 code could have the same issue, but recent
barrier rework makes it less likely to happen. The deadlock happens in
below sequence:
1. generic_make_request(bio), this will set current->bio_list
2. raid10_make_request will split bio to bio1 and bio2
3. __make_request(bio1), wait_barrer, add underlayer disk bio to
current->bio_list
4. __make_request(bio2), wait_barrer
If raise_barrier happens between 3 & 4, since wait_barrier runs at 3,
raise_barrier waits for IO completion from 3. And since raise_barrier
sets barrier, 4 waits for raise_barrier. But IO from 3 can't be
dispatched because raid10_make_request() doesn't finished yet.
The solution is to adjust the IO ordering. Quotes from Neil:
"
It is much safer to:
if (need to split) {
split = bio_split(bio, ...)
bio_chain(...)
make_request_fn(split);
generic_make_request(bio);
} else
make_request_fn(mddev, bio);
This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices. These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first. Then it will process the remainder
of the original bio once the first section has been fully processed.
"
Note, this only happens in read path. In write path, the bio is flushed to
underlaying disks either by blk flush (from schedule) or offladed to raid1/10d.
It's queued in current->bio_list.
Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org (v3.14+, only the raid10 part)
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
|
|
These arrays, created with "mdadm --build" don't benefit from a limit.
The default will be used, which is '0' and is interpreted as "don't
impose a limit".
Reported-by: ian_bruce@mail.ru
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
|
|
raid1_resize and raid5_resize should also check the
mddev->queue if run underneath dm-raid.
And both set_capacity and revalidate_disk are used in
pers->resize such as raid1, raid10 and raid5. So
move them from personality file to common code.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
|