Age | Commit message (Collapse) | Author |
|
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is missing a zeroing of the high bits of flags, and is also not
correct for big endian machines. Properly zero extend the 32 bit flags
into the 64 bit stack variable.
Reported-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Fixes: bccd06223f21 ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
|
|
Commit 33679a50370d ("MIPS: uasm: Remove needless ISA abstraction")
removed use of the MIPS_ISA preprocessor macro, but left a couple of
unused definitions of it behind.
Remove the dead code.
Signed-off-by: Paul Burton <paul.burton@mips.com>
|
|
Change t4fw_version.h to update latest firmware version
number to 1.20.8.0.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The definition of static_key_slow_inc() has cpus_read_lock in place. In the
virtio_net driver, XPS queues are initialized after setting the queue:cpu
affinity in virtnet_set_affinity() which is already protected within
cpus_read_lock. Lockdep prints a warning when we are trying to acquire
cpus_read_lock when it is already held.
This patch adds an ability to call __netif_set_xps_queue under
cpus_read_lock().
Acked-by: Jason Wang <jasowang@redhat.com>
============================================
WARNING: possible recursive locking detected
4.18.0-rc3-next-20180703+ #1 Not tainted
--------------------------------------------
swapper/0/1 is trying to acquire lock:
00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20
but task is already holding lock:
00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock.rw_sem);
lock(cpu_hotplug_lock.rw_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by swapper/0/1:
#0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110
#1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
#2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60
v2: move cpus_read_lock() out of __netif_set_xps_queue()
Cc: "Nambiar, Amritha" <amritha.nambiar@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When specifying PCI devices on the kernel command line using a
bus/device/function address, bus numbers can change when adding or
replacing a device, changing motherboard firmware, or applying kernel
parameters like "pci=assign-buses". When bus numbers change, it's likely
the command line tweak will be applied to the wrong device.
Therefore, it is useful to be able to specify devices with a base bus
number and the path of devfns needed to get to it, similar to the "device
scope" structure in the Intel VT-d spec, Section 8.3.1.
Thus, we add an option to specify devices in the following format:
[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'. When specified this way, it is
less likely that a renumbered bus will result in a valid device
specification and the tweak won't be applied to the wrong device.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
|
|
Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper function.
While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).
Additionally, make the analogous change to the kernel parameter
documentation: Separate the description of how to specify a PCI device
into its own section at the head of the "pci=" parameter.
This patch should have no functional alterations.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
|
|
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Move declarations for these functions:
pci_dev_specific_acs_enabled()
pci_dev_specific_enable_acs()
from include/linux/pci.h to drivers/pci/pci.h because nothing outside the
PCI core needs to use them.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Convert the state numbers, device state, etc from numbers to strings
when printing debug messages.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Array msi_tgt_status is defined but never used, hence it is
redundant and can be removed.
Cleans up clang warning:
warning: 'msi_tgt_status' defined but not used [-Wunused-const-variable=]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The totally undocumented IO mode needs to be set to enumerator
0 to enable port 4 also known as WAN in most configurations,
for ordinary traffic. The 3 bits in the register come up as
010 after reset, but need to be set to 000.
The Realtek source code contains a name for these bits, but
no explanation of what the 8 different IO modes may be.
Set it to zero for the time being and drop a comment so
people know what is going on if they run into trouble. This
"mode zero" works fine with the D-Link DIR-685 with
RTL8366RB.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix ptr_ret.cocci warnings:
drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c:543:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the refcnt is never decremented in case the value is not 1.
Fix it by adding decrement in case the refcnt is not 1.
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following sparse warning:
net/decnet/dn_route.c:407:30: warning: Using plain integer as NULL pointer
net/decnet/dn_route.c:1923:22: warning: Using plain integer as NULL pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following sparse warning:
./include/linux/skbuff.h:2365:58: warning: Using plain integer as NULL pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add flags to enable/disable supported chips in be2net.
With disable support are removed coresponding PCI IDs and
also codepaths with [BE2|BE3|BEx|lancer|skyhawk]_chip checks.
Disable chip will reduce module size by:
BE2 ~2kb
BE3 ~3kb
Lancer ~10kb
Skyhawk ~9kb
When enable skyhawk only it will reduce module size by ~20kb
New help style in Kconfig
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv6 GRO over GRE tap is not working while GRO is not set
over the native interface.
gro_list_prepare function updates the same_flow variable
of existing sessions to 1 if their mac headers match the one
of the incoming packet.
same_flow is used to filter out non-matching sessions and keep
potential ones for aggregation.
The number of bytes to compare should be the number of bytes
in the mac headers. In gro_list_prepare this number is set to
be skb->dev->hard_header_len. For GRE interfaces this hard_header_len
should be as it is set in the initialization process (when GRE is
created), it should not be overridden. But currently it is being overridden
by the value that is actually supposed to represent the needed_headroom.
Therefore, the number of bytes compared in order to decide whether the
the mac headers are the same is greater than the length of the headers.
As it's documented in netdevice.h, hard_header_len is the maximum
hardware header length, and needed_headroom is the extra headroom
the hardware may need.
hard_header_len is basically all the bytes received by the physical
till layer 3 header of the packet received by the interface.
For example, if the interface is a GRE tap then the needed_headroom
should be the total length of the following headers:
IP header of the physical, GRE header, mac header of GRE.
It is often used to calculate the MTU of the created interface.
This patch removes the override of the hard_header_len, and
assigns the calculated value to needed_headroom.
This way, the comparison in gro_list_prepare is really of
the mac headers, and if the packets have the same mac headers
the same_flow will be set to 1.
Performance testing: 45% higher bandwidth.
Measuring bandwidth of single-stream IPv4 TCP traffic over IPv6
GRE tap while GRO is not set on the native.
NIC: ConnectX-4LX
Before (GRO not working) : 7.2 Gbits/sec
After (GRO working): 10.5 Gbits/sec
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Manish Chopra says:
====================
qed*: Enhancements
This patch series adds following support in drivers -
1. Egress mqprio offload.
2. Add destination IP based flow profile.
3. Ingress flower offload (for drop action).
Please consider applying this series to "net-next".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The main motive of this patch is to lay down driver's
tc offload infrastructure in place.
With these changes tc can offload various supported flow
profiles (4 tuples, src-ip, dst-ip, l4 port) for the drop
action. Dropped flows statistic is a global counter for
all the offloaded flows for drop action and is populated
in ethtool statistics as common "gft_filter_drop".
Examples -
tc qdisc add dev p4p1 ingress
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
skip_sw ip_proto tcp dst_ip 192.168.40.200 action drop
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
skip_sw ip_proto udp src_ip 192.168.40.100 action drop
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
skip_sw ip_proto tcp src_ip 192.168.40.100 dst_ip 192.168.40.200 \
src_port 453 dst_port 876 action drop
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
skip_sw ip_proto tcp dst_port 98 action drop
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for dropping and redirecting
the flows based on destination IP in the packet.
This also moves the profile mode settings in their own
functions which can be used through tc flows in successive
patch.
For example -
ethtool -N p5p1 flow-type tcp4 dst-ip 192.168.40.100 action -1
ethtool -N p5p1 flow-type udp4 dst-ip 192.168.50.100 action 1
ethtool -N p5p1 flow-type tcp4 dst-ip 192.168.60.100 action 0x100000000
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for tc mqprio offload,
using this different traffic classes on the adapter
can be utilized based on configured priority to tc map.
For example -
tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3
This will cause SKBs with priority 0,1,2,3 to transmit
over tc 0,1,2,3 hardware queues respectively.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Julian Wiedmann says:
====================
s390/qeth: updates 2018-08-09
one more set of patches for net-next. This is all sorts of cleanups and
more refactoring on the way to using netdev_priv. Please apply.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return statements in functions returning bool should use true or false
instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allocating the main qeth_card struct with GFP_DMA blocks us from moving
it into netdev_priv(). But the only reason why we need DMA memory is the
ccw1 structs embedded into each ccw channel. So extract those into
separate allocations, like we already do for the cmd buffers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The qeth_card struct is kzalloc-ed, so remove all the redundant
0-initializations. While at it, split up what's left of
qeth_determine_card_type().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The data channel currently doesn't need a setup operation, because we
don't use pre-allocated cmd buffers for its IO. But subsequent changes
will introduce further setup that also applies to the data channel.
This refactors things a bit, so that the new stuff can then be
automatically applied to all channels.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Re-work the helper a little bit, so that it can be used for all CCWs
that qeth issues.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Where possible use accessor macros and local pointers to access the ccw
channels. This makes it less likely to miss a spot.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just a little code deduplication.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sparse complains:
drivers/block/null_blk_main.c:816:24: sparse: context imbalance in 'null_insert_page' - unexpected unlock
Fix it by adding the necessary annotations to the function.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a device-specific reset for Intel DC P3700 NVMe device which exhibits a
timeout failure in drivers waiting for the ready status to update after
NVMe enable if the driver interacts with the device too soon after FLR. As
this has been observed in device assignment scenarios, resolve this with a
device-specific reset quirk to add an additional, heuristically determined,
delay after the FLR completes.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1592654
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The Samsung SM961/PM961 (960 EVO) sometimes fails to return from FLR with
the PCI config space reading back as -1. A reproducible instance of this
behavior is resolved by clearing the enable bit in the NVMe configuration
register and waiting for the ready status to clear (disabling the NVMe
controller) prior to FLR.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1542494
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pcie_flr() suggests pcie_has_flr() to ensure that PCIe FLR support is
present prior to calling. pcie_flr() is exported while pcie_has_flr()
is not. Resolve this.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The ret is not modified after initalization, So just remove the variable
and return 0.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Return statements in functions returning bool should use true or false
instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
write_op[] is never modified, so make it 'const'.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
READ_BUF(8);
dummy = be32_to_cpup(p++);
dummy = be32_to_cpup(p++);
...
READ_BUF(4);
dummy = be32_to_cpup(p++);
Assigning value to "dummy" here, but that stored value
is overwritten before it can be used.
At the same time READ_BUF() will re-update the pointer p.
delete invalid assignment statements
Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
I've given up on the idea of zero-copy handling of SYMLINK on the
server side. This is because the Linux VFS symlink API requires the
symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
NUL-termination is going to be problematic (watching out for
landing on a page boundary and dealing with a 4096-byte pathname).
I don't believe that SYMLINK creation is on a performance path or is
requested frequently enough that it will cause noticeable CPU cache
pollution due to data copies.
There will be two places where a transport callout will be necessary
to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
helper that is used by NFSv2 and NFSv3, and the other will be in
nfsd4_decode_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.
svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.
The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Simplify the error handling at the tail of recv_read_chunk() by
re-arranging rq_pages[] housekeeping and documenting it properly.
NB: In this path, svc_rdma_recvfrom returns zero. Therefore no
subsequent reply processing is done on the svc_rqstp, and thus the
rq_respages field does not need to be updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
svc_xprt_release() invokes svc_free_res_pages(), which releases
pages between rq_respages and rq_next_page.
Historically, the RPC/RDMA transport has set these two pointers to
be different by one, which means:
- one page gets released when svc_recv returns 0. This normally
happens whenever one or more RDMA Reads need to be dispatched to
complete construction of an RPC Call.
- one page gets released after every call to svc_send.
In both cases, this released page is immediately refilled by
svc_alloc_arg. There does not seem to be a reason for releasing this
page.
To avoid this unnecessary memory allocator traffic, set rq_next_page
more carefully.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Warning level 2 was used: -Wimplicit-fallthrough=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Variables 'checksumlen','blocksize' and 'data' are being assigned,
but are never used, hence they are redundant and can be removed.
Fix the following warning:
net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable]
net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable]
net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
nfsd and lockd call vfs_lock_file() to lock/unlock the inode
returned by locks_inode(file).
Many places in nfsd/lockd code use the inode returned by
file_inode(file) for lock manipulation. With Overlayfs, file_inode()
(the underlying inode) is not the same object as locks_inode() (the
overlay inode). This can result in "Leaked POSIX lock" messages
and eventually to a kernel crash as reported by Eddie Horng:
https://marc.info/?l=linux-unionfs&m=153086643202072&w=2
Fix all the call sites in nfsd/lockd that should use locks_inode().
This is a correctness bug that manifested when overlayfs gained
NFS export support in v4.16.
Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Fixes: 8383f1748829 ("ovl: wire up NFS export operations")
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
All announced Threadripper 29xx models have a temperature offset of
27 degrees C. Simplify temperature offset table to match all 29xx
Threadripper models with a single entry. Also simplify the table to match
all 19xx Threadripper models with a single entry. This effectively drops
entries for Threadripper 1910/1920/1950 which never saw the light of day.
Cc: Michael Larabel <Michael@phoronix.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Jesper Dangaard Brouer says:
====================
Removing entries from cpumap and devmap, goes through a number of
syncronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance, that xdp_frames remains which have not
been flushed/processed yet. Flushing these during teardown, happens
from RCU context and not as usual under RX NAPI context.
The optimization introduced in commt 389ab7f01af9 ("xdp: introduce
xdp_return_frame_rx_napi"), missed that the flush operation can also
be called from RCU context. Thus, we cannot always use the
xdp_return_frame_rx_napi call, which take advantage of the protection
provided by XDP RX running under NAPI protection.
The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
adjusted to easier reproduce (verified by Red Hat QA).
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Like cpumap teardown, the devmap teardown code also flush remaining
xdp_frames, via bq_xmit_all() in case map entry is removed. The code
can call xdp_return_frame_rx_napi, from the the wrong context, in-case
ndo_xdp_xmit() fails.
Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The teardown race in cpumap is really hard to reproduce. These changes
makes it easier to reproduce, for QA.
The --stress-mode now have a case of a very small queue size of 8, that helps
to trigger teardown flush to encounter a full queue, which results in calling
xdp_return_frame API, in a non-NAPI protect context.
Also increase MAX_CPUS, as my QA department have larger machines than me.
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When removing a cpumap entry, a number of syncronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.
The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issues is that the teardown code is not running in the RX NAPI
code path. Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.
This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu. It is hard to trigger,
because the ptr_ring have to be full and cpumap bulk queue max
contains 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.
Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|