Age | Commit message (Collapse) | Author |
|
This patch introduces wrappers for accessing in/out streams indirectly.
This will enable to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.
Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Updated README.
Added config file that contains the minimum required features enabled to
run the tests currently present in the kernel.
This must be updated when new unittests are created and require their own
modules.
Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guillaume Nault says:
====================
l2tp: rework pppol2tp ioctl handling
The current ioctl() handling code can be simplified. It tests for
non-relevant conditions and uselessly holds sockets. Once useless
code is removed, it becomes even simpler to let pppol2tp_ioctl() handle
commands directly, rather than dispatch them to pppol2tp_tunnel_ioctl()
or pppol2tp_session_ioctl(). That is the approach taken by this series.
Patch #1 and #2 define helper functions aimed at simplifying the rest
of the patch set.
Patch #3 drops useless tests in pppol2p_ioctl() and avoid holding a
refcount on the socket.
Patches #4, #5 and #6 are the core of the series. They let
pppol2tp_ioctl() handle all ioctls and drop the tunnel and session
specific functions.
Then patch #6 brings a little bit of consolidation.
Finally, patch #7 takes advantage of the simplified code to make
pppol2tp sockets compatible with dev_ioctl(). Certainly not a killer
feature, but it is trivial and it is always nice to see l2tp getting
better integration with the rest of the stack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return -ENOIOCTLCMD for unknown ioctl commands. This lets dev_ioctl()
handle generic socket ioctls like SIOCGIFNAME or SIOCGIFINDEX.
PF_PPPOX/PX_PROTO_OL2TP was one of the few socket types not honouring
this mechanism.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Integrate memset(0) in pppol2tp_copy_stats() to avoid calling it
manually every time.
While there, constify 'stats'.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pppol2tp_ioctl() has everything in place for handling PPPIOCGL2TPSTATS
on session sockets. We just need to copy the stats and set ->session_id.
As a side effect of sharing session and tunnel code, ->using_ipsec is
properly set even when the request was made using a session socket.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Handle PPPIOCGL2TPSTATS in pppol2tp_ioctl() if the socket represents a
tunnel. This one is a bit special because the caller may use the tunnel
socket to retrieve statistics of one of its sessions. If the session_id
is set, the corresponding session's statistics are returned, instead of
those of the tunnel. This is handled by the new
pppol2tp_tunnel_copy_stats() helper function.
Set ->tunnel_id and ->using_ipsec out of the conditional, so
that it can be used by the 'else' branch in the following patch.
We cannot do that for ->session_id, because tunnel sockets have to
report the value that was originally passed in 'stats.session_id',
while session sockets have to report their own session_id.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Let pppol2tp_ioctl() handle ioctl commands directly. It still relies on
pppol2tp_{session,tunnel}_ioctl() for PPPIOCGL2TPSTATS.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* Drop test on 'sk': sock->sk cannot be NULL, or pppox_ioctl() could
not have called us.
* Drop test on 'SOCK_DEAD' state: if this flag was set, the socket
would be in the process of being released and no ioctl could be
running anymore.
* Drop test on 'PPPOX_*' state: we depend on ->sk_user_data to get
the session structure. If it is non-NULL, then the socket is
connected. Testing for PPPOX_* is redundant.
* Retrieve session using ->sk_user_data directly, instead of going
through pppol2tp_sock_to_session(). This avoids grabbing a useless
reference on the socket.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
l2tp_session_get() is used for two different purposes. If 'tunnel' is
NULL, the session is searched globally in the supplied network
namespace. Otherwise it is searched exclusively in the tunnel context.
Callers always know the context in which they need to search the
session. But some of them do provide both a namespace and a tunnel,
making the semantic of the call unclear.
This patch defines l2tp_tunnel_get_session() for lookups done in a
tunnel and restricts l2tp_session_get() to namespace searches.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use helper function to figure out if a tunnel is using ipsec.
Also, avoid accessing ->sk_policy directly since it's RCU protected.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ilias Apalodimas says:
====================
netsec driver improvements
This patchset introduces some improvements on socionext netsec driver.
- patch 1/2, avoids unneeded MMIO reads on the Rx path
- patch 2/2, is adjusting the numbers of descriptors used
Changes since v1:
- Move dma_rmb() to protect descriptor accesses until the device
has updated the NETSEC_RX_PKT_OWN_FIELD bit
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Increasing descriptors to 256 from 128 and adjusting the NAPI weight
to 64 increases performace on Rx by ~20% on 64byte packets
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MMIO reads for remaining packets in queue occur (at least)twice per
invocation of netsec_process_rx(). We can use the packet descriptor to
identify if it's owned by the hardware and break out, avoiding the more
expensive MMIO read operations. This has a ~2% increase on the pps of the
Rx path when tested with 64byte packets
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/neterion/vxge/vxge-config.c:1097:6: warning:
variable 'ret' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/neterion/vxge/vxge-config.c:2263:6: warning:
variable 'req_out' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/neterion/vxge/vxge-config.c:2262:22: warning:
variable 'status' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/neterion/vxge/vxge-config.c:2360:22: warning:
variable 'status' set but not used [-Wunused-but-set-variable]
enum vxge_hw_status status = VXGE_HW_OK;
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Caleb Raitto says:
====================
virtio_net: Expand affinity to arbitrary numbers of cpu and vq
Virtio-net tries to pin each virtual queue rx and tx interrupt to a cpu if
there are as many queues as cpus.
Expand this heuristic to configure a reasonable affinity setting also
when the number of cpus != the number of virtual queues.
Patch 1 allows vqs to take an affinity mask with more than 1 cpu.
Patch 2 generalizes the algorithm in virtnet_set_affinity beyond
the case where #cpus == #vqs.
v2 changes:
Renamed "virtio_net: Make vp_set_vq_affinity() take a mask." to
"virtio: Make vp_set_vq_affinity() take a mask."
Tested:
[InstanceSetup]
set_multiqueue = false
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done
0001
0002
0004
0008
0010
0020
0040
0080
0100
0200
0400
0800
1000
2000
4000
8000
$ sudo ethtool -L eth0 combined 15
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0-1
0-1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
15
15
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 14` ; do sudo grep ".*" tx-$i/xps_cpus; done
0003
0004
0008
0010
0020
0040
0080
0100
0200
0400
0800
1000
2000
4000
8000
$ sudo ethtool -L eth0 combined 8
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0-1
0-1
2-3
2-3
4-5
4-5
6-7
6-7
8-9
8-9
10-11
10-11
12-13
12-13
14-15
14-15
9
9
10
10
11
11
12
12
13
13
14
14
15
15
15
15
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 7` ; do sudo grep ".*" tx-$i/xps_cpus; done
0003
000c
0030
00c0
0300
0c00
3000
c000
$ sudo ethtool -L eth0 combined 16
$ sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu15/online"
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
0
0
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done
0001
0002
0004
0008
0010
0020
0040
0080
0100
0200
0400
0800
1000
2000
4000
0001
$ for i in `seq 8 15`; \
do sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu$i/online"; done
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done
0001
0002
0004
0008
0010
0020
0040
0080
0001
0002
0004
0008
0010
0020
0040
0080
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Always set the affinity hint, even if #cpu != #vq.
Handle the case where #cpu > #vq (including when #cpu % #vq != 0) and
when #vq > #cpu (including when #vq % #cpu != 0).
Signed-off-by: Caleb Raitto <caraitto@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jon Olson <jonolson@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make vp_set_vq_affinity() take a cpumask instead of taking a single CPU.
If there are fewer queues than cores, queue affinity should be able to
map to multiple cores.
Link: https://patchwork.ozlabs.org/patch/948149/
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Caleb Raitto <caraitto@google.com>
Acked-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PTP support includes:
Ingress, and egress timestamping.
One step timestamping available.
PTP clock support.
Periodic output support.
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yuchung Cheng says:
====================
tcp: new mechanism to ACK immediately
This patch is a follow-up feature improvement to the recent fixes on
the performance issues in ECN (delayed) ACKs. Many of the fixes use
tcp_enter_quickack_mode routine to force immediate ACKs. However the
routine also reset tracking interactive session. This is not ideal
because these immediate ACKs are required by protocol specifics
unrelated to the interactiveness nature of the application.
This patch set introduces a new flag to send a one-time immediate ACK
without changing the status of interactive session tracking. With this
patch set the immediate ACKs are generated upon these protocol states:
1) When a hole is repaired
2) When CE status changes between subsequent data packets received
3) When a data packet carries CWR flag
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously commit 9aee40006190 ("tcp: ack immediately when a cwr
packet arrives") calls tcp_enter_quickack_mode to force sending
two immediate ACKs upon receiving a packet w/ CWR flag. The side
effect is it'll also reset the delayed ACK timer and interactive
session tracking. This patch removes that side effect by using the
new ACK_NOW flag to force an immmediate ACK.
Packetdrill to demonstrate:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.1 < [ect0] . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 < [ect0] . 1:1001(1000) ack 1 win 257
+0 > [ect01] . 1:1(0) ack 1001
+0 write(4, ..., 1) = 1
+0 > [ect01] P. 1:2(1) ack 1001
+0 < [ect0] . 1001:2001(1000) ack 2 win 257
+0 write(4, ..., 1) = 1
+0 > [ect01] P. 2:3(1) ack 2001
+0 < [ect0] . 2001:3001(1000) ack 3 win 257
+0 < [ect0] . 3001:4001(1000) ack 3 win 257
// Ack delayed ...
+.01 < [ce] P. 4001:4501(500) ack 3 win 257
+0 > [ect01] . 3:3(0) ack 4001
+0 > [ect01] E. 3:3(0) ack 4501
+.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501 win 100
+.01 < [ect0] W. 4501:5501(1000) ack 4 win 257
// No delayed ACK on CWR flag
+0 > [ect01] . 4:4(0) ack 5501
+.31 < [ect0] . 5501:6501(1000) ack 4 win 257
+0 > [ect01] . 4:4(0) ack 6501
Fixes: 9aee40006190 ("tcp: ack immediately when a cwr packet arrives")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 5681 sec 4.2:
To provide feedback to senders recovering from losses, the receiver
SHOULD send an immediate ACK when it receives a data segment that
fills in all or part of a gap in the sequence space.
When a gap is partially filled, __tcp_ack_snd_check already checks
the out-of-order queue and correctly send an immediate ACK. However
when a gap is fully filled, the previous implementation only resets
pingpong mode which does not guarantee an immediate ACK because the
quick ACK counter may be zero. This patch addresses this issue by
marking the one-time immediate ACK flag instead.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The recent fix of acking immediately in DCTCP on CE status change
has an undesirable side-effect: it also resets TCP ack timer and
disables pingpong mode (interactive session). But the CE status
change has nothing to do with them. This patch addresses that by
using the new one-time immediate ACK flag instead of calling
tcp_enter_quickack_mode().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new flag to indicate a one-time immediate ACK. This flag is
occasionaly set under specific TCP protocol states in addition to
the more common quickack mechanism for interactive application.
In several cases in the TCP code we want to force an immediate ACK
but do not want to call tcp_enter_quickack_mode() because we do
not want to forget the icsk_ack.pingpong or icsk_ack.ato state.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I placed the "fall through"
annotation at the bottom of the case, which is what GCC is expecting
to find.
Addresses-Coverity-ID: 115075 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I placed the "fall through"
annotation at the bottom of the case, which is what GCC is expecting
to find.
Addresses-Coverity-ID: 1369529 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I replaced the code comment at the
top of the switch statement with a proper "fall through" annotation for
each case, which is what GCC is expecting to find.
Addresses-Coverity-ID: 1056542 ("Missing break in switch")
Addresses-Coverity-ID: 1339579 ("Missing break in switch")
Addresses-Coverity-ID: 1369526 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The static int 'zero' is defined but is never used hence it is
redundant and can be removed. The use of this variable was removed
with commit a158bdd3247b ("rxrpc: Fix call timeouts").
Cleans up clang warning:
warning: 'zero' defined but not used [-Wunused-const-variable=]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rtl8152_system_suspend
rtl8152_system_suspend defines the variable "ret", but it is not modified
after initialization. So just remove it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"Last bit of straggler fixes...
1) Fix btf library licensing to LGPL, from Martin KaFai lau.
2) Fix error handling in bpf sockmap code, from Daniel Borkmann.
3) XDP cpumap teardown handling wrt. execution contexts, from Jesper
Dangaard Brouer.
4) Fix loss of runtime PM on failed vlan add/del, from Ivan
Khoronzhuk.
5) xen-netfront caches skb_shinfo(skb) across a __pskb_pull_tail()
call, which potentially changes the skb's data buffer, and thus
skb_shinfo(). Fix from Juergen Gross"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
xen/netfront: don't cache skb_shinfo()
net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan
net: ethernet: ti: cpsw: clear all entries when delete vid
xdp: fix bug in devmap teardown code path
samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
xdp: fix bug in cpumap teardown code path
bpf, sockmap: fix cork timeout for select due to epipe
bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
bpf: btf: Change tools/lib/bpf/btf to LGPL
|
|
skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Grygorii Strashko says:
====================
net: ethernet: ti: cpsw: fix runtime pm while add/del reserved vid
Here 2 not critical fixes for:
- vlan ale table leak while error if deleting vlan (simplifies next fix)
- runtime pm while try to set reserved vlan
====================
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's exclusive with normal behaviour but if try to set vlan to one of
the reserved values is made, the cpsw runtime pm is broken.
Fixes: a6c5d14f5136 ("drivers: net: cpsw: ndev: fix accessing to suspended device")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In cases if some of the entries were not found in forwarding table
while killing vlan, the rest not needed entries still left in the
table. No need to stop, as entry was deleted anyway. So fix this by
returning error only after all was cleaned. To implement this, return
-ENOENT in cpsw_ale_del_mcast() as it's supposed to be.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver uses genalloc functions. Select GENERIC_ALLOCATOR to prevent
build errors when selected through COMPILE_TEST.
Fixes: 88a40e7dca00 ("mtd: rawnand: atmel: Allow selection of this driver when COMPILE_TEST=y")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
|
|
Pull SPI NOR updates from Boris Brezillon:
"
Core changes:
- Apply reset hacks only when reset is explicitly marked as broken in
the DT
Driver changes:
- Minor cleanup/fixes in the m25p80 driver
- Release flash_np in the nxp-spifi driver
- Add suspend/resume hooks to the atmel-quadspi driver
- Include gpio/consumer.h instead of gpio.h in the atmel-quadspi driver
- Use %pK instead of %p in the stm32-quadspi driver
- Improve timeout handling in the cadence-quadspi driver
- Use mtd_device_register() instead of mtd_device_parse_register() in
the intel-spi driver
"
|
|
Pull NAND updates from Miquel Raynal:
"
NAND core changes:
- Add the SPI-NAND framework.
- Create a helper to find the best ECC configuration.
- Create NAND controller operations.
- Allocate dynamically ONFI parameters structure.
- Add defines for ONFI version bits.
- Add manufacturer fixup for ONFI parameter page.
- Add an option to specify NAND chip as a boot device.
- Add Reed-Solomon error correction algorithm.
- Better name for the controller structure.
- Remove unused caller_is_module() definition.
- Make subop helpers return unsigned values.
- Expose _notsupp() helpers for raw page accessors.
- Add default values for dynamic timings.
- Kill the chip->scan_bbt() hook.
- Rename nand_default_bbt() into nand_create_bbt().
- Start to clean the nand_chip structure.
- Remove stale prototype from rawnand.h.
Raw NAND controllers drivers changes:
- Qcom: structuring cleanup.
- Denali: use core helper to find the best ECC configuration.
- Possible build of almost all drivers by adding a dependency on
COMPILE_TEST for almost all of them in Kconfig, implies various
fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
changes in sparc64 and ia64 architectures.
- Clean the ->probe() functions error path of a lot of drivers.
- Migrate all drivers to use nand_scan() instead of
nand_scan_ident()/nand_scan_tail() pair.
- Use mtd_device_register() where applicable to simplify the code.
- Marvell:
* Handle on-die ECC.
* Better clocks handling.
* Remove bogus comment.
* Add suspend and resume support.
- Tegra: add NAND controller driver.
- Atmel:
* Add module param to avoid using dma.
* Drop Wenyou Yang from MAINTAINERS.
- Denali: optimize timings handling.
- FSMC: Stop using chip->read_buf().
- FSL:
* Switch to SPDX license tag identifiers.
* Fix qualifiers in MXC init functions.
Raw NAND chip drivers changes:
- Micron:
* Add fixup for ONFI revision.
* Update ecc_stats.corrected.
* Make ECC activation stateful.
* Avoid enabling/disabling ECC when it can't be disabled.
* Get the actual number of bitflips.
* Allow forced on-die ECC.
* Support 8/512 on-die ECC.
* Fix on-die ECC detection logic.
- Hynix:
* Fix decoding the OOB size on H27UCG8T2BTR.
* Use ->exec_op() in hynix_nand_reg_write_op().
"
|
|
Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
The old code would hold the userns_state_mutex indefinitely if
memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
moving the memdup_user_nul in front of the mutex_lock().
Note: This changes the error precedence of invalid buf/count/*ppos vs
map already written / capabilities missing.
Fixes: 22d917d80e84 ("userns: Rework the user_namespace adding uid/gid...")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
The code in cap_inode_getsecurity(), introduced by commit 8db6c34f1dbc
("Introduce v3 namespaced file capabilities"), should use
d_find_any_alias() instead of d_find_alias() do handle unhashed dentry
correctly. This is needed, for example, if execveat() is called with an
open but unlinked overlayfs file, because overlayfs unhashes dentry on
unlink.
This is a regression of real life application, first reported at
https://www.spinics.net/lists/linux-unionfs/msg05363.html
Below reproducer and setup can reproduce the case.
const char* exec="echo";
const char *newargv[] = { "echo", "hello", NULL};
const char *newenviron[] = { NULL };
int fd, err;
fd = open(exec, O_PATH);
unlink(exec);
err = syscall(322/*SYS_execveat*/, fd, "", newargv, newenviron,
AT_EMPTY_PATH);
if(err<0)
fprintf(stderr, "execveat: %s\n", strerror(errno));
gcc compile into ~/test/a.out
mount -t overlay -orw,lowerdir=/mnt/l,upperdir=/mnt/u,workdir=/mnt/w
none /mnt/m
cd /mnt/m
cp /bin/echo .
~/test/a.out
Expected result:
hello
Actually result:
execveat: Invalid argument
dmesg:
Invalid argument reading file caps for /dev/fd/3
The 2nd reproducer and setup emulates similar case but for
regular filesystem:
const char* exec="echo";
int fd, err;
char buf[256];
fd = open(exec, O_RDONLY);
unlink(exec);
err = fgetxattr(fd, "security.capability", buf, 256);
if(err<0)
fprintf(stderr, "fgetxattr: %s\n", strerror(errno));
gcc compile into ~/test_fgetxattr
cd /tmp
cp /bin/echo .
~/test_fgetxattr
Result:
fgetxattr: Invalid argument
On regular filesystem, for example, ext4 read xattr from
disk and return to execveat(), will not trigger this issue, however,
the overlay attr handler pass real dentry to vfs_getxattr() will.
This reproducer calls fgetxattr() with an unlinked fd, involkes
vfs_getxattr() then reproduced the case that d_find_alias() in
cap_inode_getsecurity() can't find the unlinked dentry.
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities")
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Eddie Horng <eddie.horng@mediatek.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
If zram supports writeback feature, it's no longer a
BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations
for incompressible pages.
Do not pretend to be synchronous IO device. It makes the system very
sluggish due to waiting for IO completion from upper layers.
Furthermore, it causes a user-after-free problem because swap thinks the
opearion is done when the IO functions returns so it can free the page
(e.g., lock_page_or_retry and goto out_release in do_swap_page) but in
fact, IO is asynchronous so the driver could access a just freed page
afterward.
This patch fixes the problem.
BUG: Bad page state in process qemu-system-x86 pfn:3dfab21
page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
flags: 0x17fffc000000008(uptodate)
raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
bad because of flags: 0x8(uptodate)
CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
Call Trace:
dump_stack+0x5c/0x7b
bad_page+0xba/0x120
get_page_from_freelist+0x1016/0x1250
__alloc_pages_nodemask+0xfa/0x250
alloc_pages_vma+0x7c/0x1c0
do_swap_page+0x347/0x920
__handle_mm_fault+0x7b4/0x1110
handle_mm_fault+0xfc/0x1f0
__get_user_pages+0x12f/0x690
get_user_pages_unlocked+0x148/0x1f0
__gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
try_async_pf+0x87/0x230 [kvm]
tdp_page_fault+0x132/0x290 [kvm]
kvm_mmu_page_fault+0x74/0x570 [kvm]
kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
do_vfs_ioctl+0xa2/0x630
ksys_ioctl+0x70/0x80
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
[minchan@kernel.org: fix changelog, add comment]
Link: https://lore.kernel.org/lkml/0516ae2d-b0fd-92c5-aa92-112ba7bd32fc@contabo.de/
Link: http://lkml.kernel.org/r/20180802051112.86174-1-minchan@kernel.org
Link: http://lkml.kernel.org/r/20180805233722.217347-1-minchan@kernel.org
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Tino Lehnig <tino.lehnig@contabo.de>
Tested-by: Tino Lehnig <tino.lehnig@contabo.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ioremap_prot() can return NULL which could lead to an oops.
Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.com
Signed-off-by: chen jie <chenjie6@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: chenjie <chenjie6@huawei.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With gcc-8 fsanitize=null become very noisy. GCC started to complain
about things like &a->b, where 'a' is NULL pointer. There is no NULL
dereference, we just calculate address to struct member. It's
technically undefined behavior so UBSAN is correct to report it. But as
long as there is no real NULL-dereference, I think, we should be fine.
-fno-delete-null-pointer-checks compiler flag should protect us from any
consequences. So let's just no use -fsanitize=null as it's not useful
for us. If there is a real NULL-deref we will see crash. Even if
userspace mapped something at NULL (root can do this), with things like
SMAP should catch the issue.
Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This entry was created with my personal e-mail address. Update this entry
to my open-source kernel.org account.
Link: http://lkml.kernel.org/r/20180806143904.4716-4-kieran.bingham@ideasonboard.com
Signed-off-by: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch fixes following smatch warnings:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2826 bnxt_fill_coredump_seg_hdr() error: strcpy() '"sEgM"' too large for 'seg_hdr->signature' (5 vs 4)
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2858 bnxt_fill_coredump_record() error: strcpy() '"cOrE"' too large for 'record->signature' (5 vs 4)
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2879 bnxt_fill_coredump_record() error: strcpy() 'utsname()->sysname' too large for 'record->os_name' (65 vs 32)
Fixes: 6c5657d085ae ("bnxt_en: Add support for ethtool get dump.")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since at least the beginning of the git era we've declared our TLB
exception handling functions inconsistently. They're actually functions,
but we declare them as arrays of u32 where each u32 is an encoded
instruction. This has always been the case for arch/mips/mm/tlbex.c, and
has also been true for arch/mips/kernel/traps.c since commit
86a1708a9d54 ("MIPS: Make tlb exception handler definitions and
declarations match.") which aimed for consistency but did so by
consistently making the our C code inconsistent with our assembly.
This is all usually harmless, but when using GCC 7 or newer to build a
kernel targeting microMIPS (ie. CONFIG_CPU_MICROMIPS=y) it becomes
problematic. With microMIPS bit 0 of the program counter indicates the
ISA mode. When bit 0 is zero instructions are decoded using the standard
MIPS32 or MIPS64 ISA. When bit 0 is one instructions are decoded using
microMIPS. This means that function pointers become odd - their least
significant bit is one for microMIPS code. We work around this in cases
where we need to access code using loads & stores with our
msk_isa16_mode() macro which simply clears bit 0 of the value it is
given:
#define msk_isa16_mode(x) ((x) & ~0x1)
For example we do this for our TLB load handler in
build_r4000_tlb_load_handler():
u32 *p = (u32 *)msk_isa16_mode((ulong)handle_tlbl);
We then write code to p, expecting it to be suitably aligned (our LEAF
macro aligns functions on 4 byte boundaries, so (ulong)handle_tlbl will
give a value one greater than a multiple of 4 - ie. the start of a
function on a 4 byte boundary, with the ISA mode bit 0 set).
This worked fine up to GCC 6, but GCC 7 & onwards is smart enough to
presume that handle_tlbl which we declared as an array of u32s must be
aligned sufficiently that bit 0 of its address will never be set, and as
a result optimize out msk_isa16_mode(). This leads to p having an
address with bit 0 set, and when we go on to attempt to store code at
that address we take an address error exception due to the unaligned
memory access.
This leads to an exception prior to the kernel having configured its own
exception handlers, so we jump to whatever handlers the bootloader
configured. In the case of QEMU this results in a silent hang, since it
has no useful general exception vector.
Fix this by consistently declaring our TLB-related functions as
functions. For handle_tlbl(), handle_tlbs() & handle_tlbm() we do this
in asm/tlbex.h & we make use of the existing declaration of
tlbmiss_handler_setup_pgd() in asm/mmu_context.h. Our TLB handler
generation code in arch/mips/mm/tlbex.c is adjusted to deal with these
definitions, in most cases simply by casting the function pointers to
u32 pointers.
This allows us to include asm/mmu_context.h in arch/mips/mm/tlbex.c to
get the definitions of tlbmiss_handler_setup_pgd & pgd_current, removing
some needless duplication. Consistently using msk_isa16_mode() on
function pointers means we no longer need the
tlbmiss_handler_setup_pgd_start symbol so that is removed entirely.
Now that we're declaring our functions as functions GCC stops optimizing
out msk_isa16_mode() & a microMIPS kernel built with either GCC 7.3.0 or
8.1.0 boots successfully.
Signed-off-by: Paul Burton <paul.burton@mips.com>
|
|
We export tlbmiss_handler_setup_pgd in arch/mips/mm/tlbex.c close to a
declaration of it, rather than close to its definition as is standard.
We've supported exporting symbols in assembly code since commit
22823ab419d8 ("EXPORT_SYMBOL() for asm"), so move the export to follow
the function's (stub) definition.
Signed-off-by: Paul Burton <paul.burton@mips.com>
|
|
Martin KaFai Lau says:
====================
This series introduces a new map type "BPF_MAP_TYPE_REUSEPORT_SOCKARRAY"
and a new prog type BPF_PROG_TYPE_SK_REUSEPORT.
Here is a snippet from a commit message:
"To unleash the full potential of a bpf prog, it is essential for the
userspace to be capable of directly setting up a bpf map which can then
be consumed by the bpf prog to make decision. In this case, decide which
SO_REUSEPORT sk to serve the incoming request.
By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control
and visibility on where a SO_REUSEPORT sk should be located in a bpf map.
The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that
the bpf prog can directly select a sk from the bpf map. That will
raise the programmability of the bpf prog attached to a reuseport
group (a group of sk serving the same IP:PORT).
For example, in UDP, the bpf prog can peek into the payload (e.g.
through the "data" pointer introduced in the later patch) to learn
the application level's connection information and then decide which sk
to pick from a bpf map. The userspace can tightly couple the sk's location
in a bpf map with the application logic in generating the UDP payload's
connection information. This connection info contact/API stays within the
userspace.
Also, when used with map-in-map, the userspace can switch the
old-server-process's inner map to a new-server-process's inner map
in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)".
The bpf prog will then direct incoming requests to the new process instead
of the old process. The old process can finish draining the pending
requests (e.g. by "accept()") before closing the old-fds. [Note that
deleting a fd from a bpf map does not necessary mean the fd is closed]"
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch add tests for the new BPF_PROG_TYPE_SK_REUSEPORT.
The tests cover:
- IPv4/IPv6 + TCP/UDP
- TCP syncookie
- TCP fastopen
- Cases when the bpf_sk_select_reuseport() returning errors
- Cases when the bpf prog returns SK_DROP
- Values from sk_reuseport_md
- outer_map => reuseport_array
The test depends on
commit 3eee1f75f2b9 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds tests for the new BPF_MAP_TYPE_REUSEPORT_SOCKARRAY.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|