Age | Commit message (Collapse) | Author |
|
This patch adds no functional change and cleans up some functions
that the following patches touch around so that we make them tidy
and easy to review/revert. The change is mainly to keep reverse
christmas tree order.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].
Fix a total of 73 warnings like these:
drivers/net/wireless/intersil/orinoco/wext.c:1379:27: warning: cast from 'int (*)(struct net_device *, struct iw_request_info *, struct iw_param *, char *)' to 'iw_handler' (aka 'int (*)(struct net_device *, struct iw_request_info *, union iwreq_data *, char *)') converts to incompatible function type [-Wcast-function-type-strict]
IW_HANDLER(SIOCGIWPOWER, (iw_handler)orinoco_ioctl_getpower),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../net/wireless/wext-compat.c:1607:33: warning: cast from 'int (*)(struct net_device *, struct iw_request_info *, struct iw_point *, char *)' to 'iw_handler' (aka 'int (*)(struct net_device *, struct iw_request_info *, union iwreq_data *, char *)') converts to incompatible function type [-Wcast-function-type-strict]
[IW_IOCTL_IDX(SIOCSIWGENIE)] = (iw_handler) cfg80211_wext_siwgenie,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/net/wireless/intersil/orinoco/wext.c:1390:27: error: incompatible function pointer types initializing 'const iw_handler' (aka 'int (*const)(struct net_device *, struct iw_request_info *, union iwreq_data *, char *)') with an expression of type 'int (struct net_device *, struct iw_request_info *, struct iw_param *, char *)' [-Wincompatible-function-pointer-types]
IW_HANDLER(SIOCGIWRETRY, cfg80211_wext_giwretry),
^~~~~~~~~~~~~~~~~~~~~~
The cfg80211 Wireless Extension handler callbacks (iw_handler) use a
union for the data argument. Actually use the union and perform explicit
member selection in the function body instead of having a function
prototype mismatch. There are no resulting binary differences
before/after changes.
These changes were made partly manually and partly with the help of
Coccinelle.
Link: https://github.com/KSPP/linux/issues/234
Link: https://reviews.llvm.org/D134831 [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/a68822bf8dd587988131bb6a295280cb4293f05d.1667934775.git.gustavoars@kernel.org
|
|
Fix network address validation on entry to uapi functions such as connect()
for AF_RXRPC. The check for address compatibility with the transport
socket isn't correct and allows an AF_INET6 address to be given to an
AF_INET socket, resulting in an oops now that rxrpc is calling
udp_sendmsg() directly.
Sample program:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
static unsigned char ctrl[256] =
"\x18\x00\x00\x00\x00\x00\x00\x00\x10\x01\x00\x00\x01";
int main(void)
{
struct sockaddr_rxrpc srx = {
.srx_family = AF_RXRPC,
.transport_type = SOCK_DGRAM,
.transport_len = 28,
.transport.sin6.sin6_family = AF_INET6,
};
struct mmsghdr vec = {
.msg_hdr.msg_control = ctrl,
.msg_hdr.msg_controllen = 0x18,
};
int s;
s = socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
if (s < 0) {
perror("socket");
exit(1);
}
if (connect(s, (struct sockaddr *)&srx, sizeof(srx)) < 0) {
perror("connect");
exit(1);
}
if (sendmmsg(s, &vec, 1, MSG_NOSIGNAL | MSG_MORE) < 0) {
perror("sendmmsg");
exit(1);
}
return 0;
}
If working properly, connect() should fail with EAFNOSUPPORT.
Fixes: ed472b0c8783 ("rxrpc: Call udp_sendmsg() directly")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
If rxrpc sees an IPv6 address, it assumes it can call udpv6_sendmsg() on it
- even if it got it on an IPv4 socket. Fix do_udp_sendmsg() to give an
error in such a case.
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
...
RIP: 0010:ipv6_addr_v4mapped include/net/ipv6.h:749 [inline]
RIP: 0010:udpv6_sendmsg+0xd0a/0x2c70 net/ipv6/udp.c:1361
...
Call Trace:
do_udp_sendmsg net/rxrpc/output.c:27 [inline]
do_udp_sendmsg net/rxrpc/output.c:21 [inline]
rxrpc_send_abort_packet+0x73b/0x860 net/rxrpc/output.c:367
rxrpc_release_calls_on_socket+0x211/0x300 net/rxrpc/call_object.c:595
rxrpc_release_sock net/rxrpc/af_rxrpc.c:886 [inline]
rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:917
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x18/0x20 net/socket.c:1365
__fput+0x27c/0xa90 fs/file_table.c:320
task_work_run+0x16b/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xb35/0x2a20 kernel/exit.c:820
do_group_exit+0xd0/0x2a0 kernel/exit.c:950
__do_sys_exit_group kernel/exit.c:961 [inline]
__se_sys_exit_group kernel/exit.c:959 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:959
Fixes: ed472b0c8783 ("rxrpc: Call udp_sendmsg() directly")
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
In the initial commit dc452a471dba ("net: dsa: introduce tagger-owned
storage for private and shared data"), we had a call to
tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at
tree teardown time.
There were problems with connecting to a switch tree as a whole, so this
got reworked to connecting to individual switches within the tree. In
this process, tag_ops->disconnect(ds) was made to be called only from
switch.c (cross-chip notifiers emitted as a result of dynamic tag proto
changes), but the normal driver teardown code path wasn't replaced with
anything.
Solve this problem by adding a function that does the opposite of
dsa_switch_setup_tag_protocol(), which is called from the equivalent
spot in dsa_switch_teardown(). The positioning here also ensures that we
won't have any use-after-free in tagging protocol (*rcv) ops, since the
teardown sequence is as follows:
dsa_tree_teardown
-> dsa_tree_teardown_master
-> dsa_master_teardown
-> unsets master->dsa_ptr, making no further packets match the
ETH_P_XDSA packet type handler
-> dsa_tree_teardown_ports
-> dsa_port_teardown
-> dsa_slave_destroy
-> unregisters DSA net devices, there is even a synchronize_net()
in unregister_netdevice_many()
-> dsa_tree_teardown_switches
-> dsa_switch_teardown
-> dsa_switch_teardown_tag_protocol
-> finally frees the tagger-owned storage
Fixes: 7f2973149c22 ("net: dsa: make tagging protocols connect to individual switches from a tree")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As of now, no DSA driver uses a custom link mode validation procedure
anymore. So remove this DSA operation and let phylink determine what is
supported based on config->mac_capabilities (if provided by the driver).
Leave a comment why we left the code that we did, and that there is more
work to do.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
1) Fix sparse warning in the new nft_inner expression, reported
by Jakub Kicinski.
2) Incorrect vlan header check in nft_inner, from Peng Wu.
3) Two patches to pass reset boolean to expression dump operation,
in preparation for allowing to reset stateful expressions in rules.
This adds a new NFT_MSG_GETRULE_RESET command. From Phil Sutter.
4) Inconsistent indentation in nft_fib, from Jiapeng Chong.
5) Speed up siphash calculation in conntrack, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: conntrack: use siphash_4u64
netfilter: rpfilter/fib: clean up some inconsistent indenting
netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET
netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters
netfilter: nft_inner: fix return value check in nft_inner_parse_l2l3()
netfilter: nft_payload: use __be16 to store gre version
====================
Link: https://lore.kernel.org/r/20221115095922.139954-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
x25_lapb_receive_frame() using skb_copy() to get a private copy of
skb, the new skb should be freed in the undersized/fragmented skb
error handling path. Otherwise there is a memory leak.
Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20221114110519.514538-1-weiyongjun@huaweicloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a metadata dst is present with the type METADATA_HW_PORT_MUX on a dsa cpu
port netdev, assume that it carries the port number and that there is no DSA
tag present in the skb data.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For queueing packets in XDP we want to add a new redirect map type with
support for 64-bit indexes. To prepare fore this, expand the width of the
'key' argument to the bpf_redirect_map() helper. Since BPF registers are
always 64-bit, this should be safe to do after the fact.
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221108140601.149971-3-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch fixes a frame size warning, reported by kernel test robot.
>> net/dcb/dcbnl.c:1230:1: warning: the frame size of 1244 bytes is
>> larger than 1024 bytes [-Wframe-larger-than=]
The getapptrust part of dcbnl_ieee_fill is moved to a separate function,
and the selector array is now dynamically allocated, instead of stack
allocated.
Tested on microchip sparx5 driver.
Fixes: 6182d5875c33 ("net: dcb: add new apptrust attribute")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20221114092950.2490451-1-daniel.machon@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The bridge driver can offload VLANs to the underlying hardware either
via switchdev or the 8021q driver. When the former is used, the VLAN is
marked in the bridge driver with the 'BR_VLFLAG_ADDED_BY_SWITCHDEV'
private flag.
To avoid the memory leaks mentioned in the cited commit, the bridge
driver will try to delete a VLAN via the 8021q driver if the VLAN is not
marked with the previously mentioned flag.
When the VLAN protocol of the bridge changes, switchdev drivers are
notified via the 'SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL' attribute, but
the 8021q driver is also called to add the existing VLANs with the new
protocol and delete them with the old protocol.
In case the VLANs were offloaded via switchdev, the above behavior is
both redundant and buggy. Redundant because the VLANs are already
programmed in hardware and drivers that support VLAN protocol change
(currently only mlx5) change the protocol upon the switchdev attribute
notification. Buggy because the 8021q driver is called despite these
VLANs being marked with 'BR_VLFLAG_ADDED_BY_SWITCHDEV'. This leads to
memory leaks [1] when the VLANs are deleted.
Fix by not calling the 8021q driver for VLANs that were already
programmed via switchdev.
[1]
unreferenced object 0xffff8881f6771200 (size 256):
comm "ip", pid 446855, jiffies 4298238841 (age 55.240s)
hex dump (first 32 bytes):
00 00 7f 0e 83 88 ff ff 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000012819ac>] vlan_vid_add+0x437/0x750
[<00000000f2281fad>] __br_vlan_set_proto+0x289/0x920
[<000000000632b56f>] br_changelink+0x3d6/0x13f0
[<0000000089d25f04>] __rtnl_newlink+0x8ae/0x14c0
[<00000000f6276baf>] rtnl_newlink+0x5f/0x90
[<00000000746dc902>] rtnetlink_rcv_msg+0x336/0xa00
[<000000001c2241c0>] netlink_rcv_skb+0x11d/0x340
[<0000000010588814>] netlink_unicast+0x438/0x710
[<00000000e1a4cd5c>] netlink_sendmsg+0x788/0xc40
[<00000000e8992d4e>] sock_sendmsg+0xb0/0xe0
[<00000000621b8f91>] ____sys_sendmsg+0x4ff/0x6d0
[<000000000ea26996>] ___sys_sendmsg+0x12e/0x1b0
[<00000000684f7e25>] __sys_sendmsg+0xab/0x130
[<000000004538b104>] do_syscall_64+0x3d/0x90
[<0000000091ed9678>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 279737939a81 ("net: bridge: Fix VLANs memory leak")
Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20221114084509.860831-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk->sk_receive_queue is protected by skb queue lock, but for KCM
sockets its RX path takes mux->rx_lock to protect more than just
skb queue. However, kcm_recvmsg() still only grabs the skb queue
lock, so race conditions still exist.
We can teach kcm_recvmsg() to grab mux->rx_lock too but this would
introduce a potential performance regression as struct kcm_mux can
be shared by multiple KCM sockets.
So we have to enforce skb queue lock in requeue_rx_msgs() and handle
skb peek case carefully in kcm_wait_data(). Fortunately,
skb_recv_datagram() already handles it nicely and is widely used by
other sockets, we can just switch to skb_recv_datagram() after
getting rid of the unnecessary sock lock in kcm_recvmsg() and
kcm_splice_read(). Side note: SOCK_DONE is not used by KCM sockets,
so it is safe to get rid of this check too.
I ran the original syzbot reproducer for 30 min without seeing any
issue.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+278279efdd2730dd14bf@syzkaller.appspotmail.com
Reported-by: shaozhengchao <shaozhengchao@huawei.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20221114005119.597905-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This function is used for every packet, siphash_4u64 is noticeably faster
than using local buffer + siphash:
Before:
1.23% kpktgend_0 [kernel.vmlinux] [k] __siphash_unaligned
0.14% kpktgend_0 [nf_conntrack] [k] hash_conntrack_raw
After:
0.79% kpktgend_0 [kernel.vmlinux] [k] siphash_4u64
0.15% kpktgend_0 [nf_conntrack] [k] hash_conntrack_raw
In the pktgen test this gives about ~2.4% performance improvement.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
No functional modification involved.
net/ipv4/netfilter/nft_fib_ipv4.c:141 nft_fib4_eval() warn: inconsistent indenting.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2733
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Analogous to NFT_MSG_GETOBJ_RESET, but for rules: Reset stateful
expressions like counters or quotas. The latter two are the only
consumers, adjust their 'dump' callbacks to respect the parameter
introduced earlier.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add a 'reset' flag just like with nft_object_ops::dump. This will be
useful to reset "anonymous stateful objects", e.g. simple rule counters.
No functional change intended.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Instead of having to pass multiple arguments that describe the register,
pass the bpf_reg_state into the btf_struct_access callback. Currently,
all call sites simply reuse the btf and btf_id of the reg they want to
check the access of. The only exception to this pattern is the callsite
in check_ptr_to_map_access, hence for that case create a dummy reg to
simulate PTR_TO_BTF_ID access.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The SYN flood message prints the listening port number, but with many
processes bound to the same port on different IPs, it's impossible to
tell which socket is the problem.
Add the listen IP address to the SYN flood message.
For IPv6 use "[IP]:port" as per RFC-5952 and to provide ease of
copy-paste to "ss" filters. For IPv4 use "IP:port" to match.
Each protcol's "any" address and a host address now look like:
Possible SYN flooding on port 0.0.0.0:9001.
Possible SYN flooding on port 127.0.0.1:9001.
Possible SYN flooding on port [::]:9001.
Possible SYN flooding on port [fc00::1]:9001.
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/4fedab7ce54a389aeadbdc639f6b4f4988e9d2d7.1668386107.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are multi-generational drivers like mv88e6xxx which have code like
this:
int mv88e6xxx_port_hwtstamp_get(struct dsa_switch *ds, int port,
struct ifreq *ifr)
{
if (!chip->info->ptp_support)
return -EOPNOTSUPP;
...
}
DSA wants to deny PTP timestamping on the master if the switch supports
timestamping too. However it currently relies on the presence of the
port_hwtstamp_get() callback to determine PTP capability, and this
clearly does not work in that case (method is present but returns
-EOPNOTSUPP).
We should not deny PTP on the DSA master for those switches which truly
do not support hardware timestamping.
Create a dsa_port_supports_hwtstamp() method which actually probes for
support by calling port_hwtstamp_get() and seeing whether that returned
-EOPNOTSUPP or not.
Fixes: f685e609a301 ("net: dsa: Deny PTP on master if switch supports it")
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20221110124345.3901389-1-festevam@gmail.com/
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds a new flow_rule_match_arp function that allows drivers
to be able to dissect ARP frames.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The value of 'st->state' has been verified as "TCP_SEQ_STATE_LISTENING",
it's unnecessary to assign TCP_SEQ_STATE_LISTENING to it, so we can remove it.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When connecting to client timeout, disconnect client for twice in
chnl_net_open(). Remove one. Compile tested only.
Fixes: 2aa40aef9deb ("caif: Use link layer MTU instead of fixed MTU")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix rxrpc_encap_err_rcv() to make the call to ipv6_icmp_error conditional
on IPV6 support being enabled.
Fixes: b6c66c4324e7 ("rxrpc: Use the core ICMP/ICMP6 parsers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
|
|
Use try_cmpxchg() (instead of cmpxchg()) in a more readable way.
oval = smp_load_acquire(&sk->sk_tsq_flags);
do {
...
} while (!try_cmpxchg(&sk->sk_tsq_flags, &oval, nval));
Reduce indentation level.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221110190239.3531280-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
try_cmpxchg() is slighly more efficient (at least on x86),
and smp_load_acquire(&sk->sk_tsq_flags) could avoid a KCSAN report.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221110174829.3403442-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No changes in generated code.
Reported-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221110085422.521059-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We kept getting initial patches from new contributors to remove a
duplicate 'the' (since grammar checking scripts flag it), but submitters
never followed up after code review.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use '(struct sock *)msk' to get 'sk' from 'msk' in a more direct way
instead of using '&msk->sk.icsk_inet.sk'.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The function mptcp_subflow_process_delegated() uses the input ssk first,
while __mptcp_check_push() invokes the packet scheduler first.
So this patch adds a new parameter named 'first' for the function
__mptcp_subflow_push_pending() to deal with these two cases separately.
With this change, the code that invokes the packet scheduler in the
function __mptcp_check_push() can be removed, and replaced by invoking
__mptcp_subflow_push_pending() directly.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use msk instead of mptcp_sk(sk) in the functions where the variable
"msk = mptcp_sk(sk)" has been defined.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'aead_req' and 'aead_send' is allocated but not freed in default switch
case. This commit fixes the potential memory leak by freeing them under
the situation.
Note that the default cases here should never be reached as they'd
mean we allowed offloading an unsupported algorithm.
Fixes: ea7a9d88ba21 ("net/tls: Use cipher sizes structs")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20221110090329.2036382-1-liaoyu15@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
====================
bpf-next 2022-11-11
We've added 49 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 3592 insertions(+), 1371 deletions(-).
The main changes are:
1) Veristat tool improvements to support custom filtering, sorting, and replay
of results, from Andrii Nakryiko.
2) BPF verifier precision tracking fixes and improvements,
from Andrii Nakryiko.
3) Lots of new BPF documentation for various BPF maps, from Dave Tucker,
Donald Hunter, Maryam Tahhan, Bagas Sanjaya.
4) BTF dedup improvements and libbpf's hashmap interface clean ups, from
Eduard Zingerman.
5) Fix veth driver panic if XDP program is attached before veth_open, from
John Fastabend.
6) BPF verifier clean ups and fixes in preparation for follow up features,
from Kumar Kartikeya Dwivedi.
7) Add access to hwtstamp field from BPF sockops programs,
from Martin KaFai Lau.
8) Various fixes for BPF selftests and samples, from Artem Savkov,
Domenico Cerasuolo, Kang Minchul, Rong Tao, Yang Jihong.
9) Fix redirection to tunneling device logic, preventing skb->len == 0, from
Stanislav Fomichev.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
selftests/bpf: fix veristat's singular file-or-prog filter
selftests/bpf: Test skops->skb_hwtstamp
selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test
bpf: Add hwtstamp field for the sockops prog
selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch
bpf, docs: Document BPF_MAP_TYPE_ARRAY
docs/bpf: Document BPF map types QUEUE and STACK
docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS
docs/bpf: Document BPF_MAP_TYPE_CPUMAP map
docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map
libbpf: Hashmap.h update to fix build issues using LLVM14
bpf: veth driver panics when xdp prog attached before veth_open
selftests: Fix test group SKIPPED result
selftests/bpf: Tests for btf_dedup_resolve_fwds
libbpf: Resolve unambigous forward declarations
libbpf: Hashmap interface update to allow both long and void* keys/values
samples/bpf: Fix sockex3 error: Missing BPF prog type
selftests/bpf: Fix u32 variable compared with less than zero
Documentation: bpf: Escape underscore in BPF type name prefix
selftests/bpf: Use consistent build-id type for liburandom_read.so
...
====================
Link: https://lore.kernel.org/r/20221111233733.1088228-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Andrii Nakryiko says:
====================
bpf 2022-11-11
We've added 11 non-merge commits during the last 8 day(s) which contain
a total of 11 files changed, 83 insertions(+), 74 deletions(-).
The main changes are:
1) Fix strncpy_from_kernel_nofault() to prevent out-of-bounds writes,
from Alban Crequy.
2) Fix for bpf_prog_test_run_skb() to prevent wrong alignment,
from Baisong Zhong.
3) Switch BPF_DISPATCHER to static_call() instead of ftrace infra, with
a small build fix on top, from Peter Zijlstra and Nathan Chancellor.
4) Fix memory leak in BPF verifier in some error cases, from Wang Yufen.
5) 32-bit compilation error fixes for BPF selftests, from Pu Lehui and
Yang Jihong.
6) Ensure even distribution of per-CPU free list elements, from Xu Kuohai.
7) Fix copy_map_value() to track special zeroed out areas properly,
from Xu Kuohai.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix offset calculation error in __copy_map_value and zero_map_value
bpf: Initialize same number of free nodes for each pcpu_freelist
selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT
maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault()
selftests/bpf: Fix test_progs compilation failure in 32-bit arch
selftests/bpf: Fix casting error when cross-compiling test_verifier for 32-bit platforms
bpf: Fix memory leaks in __check_func_call
bpf: Add explicit cast to 'void *' for __BPF_DISPATCHER_UPDATE()
bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")
bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()
====================
Link: https://lore.kernel.org/r/20221111231624.938829-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can remove a conditional test in gro_list_prepare()
by comparing vlan_all fields of the two skbs.
Notes:
While comparing the vlan_proto is not strictly needed,
because part of the following compare_ether_header() call,
using 32bit word is actually faster than using 16bit values.
napi_reuse_skb() makes sure to clear skb->vlan_all,
as it already calls __vlan_hwaccel_clear_tag()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skb->vlan_present seems redundant.
We can instead derive it from this boolean expression:
vlan_present = skb->vlan_proto != 0 || skb->vlan_tci != 0
Add a new union, to access both fields in a single load/store
when possible.
union {
u32 vlan_all;
struct {
__be16 vlan_proto;
__u16 vlan_tci;
};
};
This allows following patch to remove a conditional test in GRO stack.
Note:
We move remcsum_offload to keep TC_AT_INGRESS_MASK
and SKB_MONO_DELIVERY_TIME_MASK unchanged.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bpf-tc prog has already been able to access the
skb_hwtstamps(skb)->hwtstamp. This patch extends the same hwtstamp
access to the sockops prog.
In sockops, the skb is also available to the bpf prog during
the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case
that the hwtstamp will be useful to the sockops prog to better
measure the one-way-delay when the sender has put the tx
timestamp in the tcp header option.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
|
|
drivers/net/can/pch_can.c
ae64438be192 ("can: dev: fix skb drop check")
1dd1b521be85 ("can: remove obsolete PCH CAN driver")
https://lore.kernel.org/all/20221110102509.1f7d63cc@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jonathan reports crashes when running net-next in Meta's fleet.
Stats collection uses ethtool -I which does a per-op policy dump
to check if stats are supported. We don't initialize the dumpit
information if doit succeeds due to evaluation short-circuiting.
The crash may look like this:
BUG: kernel NULL pointer dereference, address: 0000000000000cc0
RIP: 0010:netlink_policy_dump_add_policy+0x174/0x2a0
ctrl_dumppolicy_start+0x19f/0x2f0
genl_start+0xe7/0x140
Or we may trigger a warning:
WARNING: CPU: 1 PID: 785 at net/netlink/policy.c:87 netlink_policy_dump_get_policy_idx+0x79/0x80
RIP: 0010:netlink_policy_dump_get_policy_idx+0x79/0x80
ctrl_dumppolicy_put_op+0x214/0x360
depending on what garbage we pick up from the stack.
Reported-by: Jonathan Lemon <bsd@meta.com>
Fixes: 26588edbef60 ("genetlink: support split policies in ctrl_dumppolicy_put_op()")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221109183254.554051-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a devlink port is unregistered, its type is expected to be unset or
otherwise a WARNING is generated [1]. This was supposed to be handled by
cited commit by clearing the type upon 'NETDEV_PRE_UNINIT'.
The assumption was that no other events can be generated for the netdev
after this event, but this proved to be wrong. After the event is
generated, netdev_wait_allrefs_any() will rebroadcast a
'NETDEV_UNREGISTER' until the netdev's reference count drops to 1. This
causes devlink to set the port type back to Ethernet.
Fix by only setting and clearing the port type upon 'NETDEV_POST_INIT'
and 'NETDEV_PRE_UNINIT', respectively. For all other events, preserve
the port type.
[1]
WARNING: CPU: 0 PID: 11 at net/core/devlink.c:9998 devl_port_unregister+0x2f6/0x390 net/core/devlink.c:9998
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u4:1 Not tainted 6.1.0-rc3-next-20221107-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: netns cleanup_net
RIP: 0010:devl_port_unregister+0x2f6/0x390 net/core/devlink.c:9998
[...]
Call Trace:
<TASK>
__nsim_dev_port_del+0x1bb/0x240 drivers/net/netdevsim/dev.c:1433
nsim_dev_port_del_all drivers/net/netdevsim/dev.c:1443 [inline]
nsim_dev_reload_destroy+0x171/0x510 drivers/net/netdevsim/dev.c:1660
nsim_dev_reload_down+0x6b/0xd0 drivers/net/netdevsim/dev.c:968
devlink_reload+0x1c2/0x6b0 net/core/devlink.c:4501
devlink_pernet_pre_exit+0x104/0x1c0 net/core/devlink.c:12609
ops_pre_exit_list net/core/net_namespace.c:159 [inline]
cleanup_net+0x451/0xb10 net/core/net_namespace.c:594
process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
Fixes: 02a68a47eade ("net: devlink: track netdev with devlink_port assigned")
Reported-by: syzbot+85e47e1a08b3e159b159@syzkaller.appspotmail.com
Reported-by: syzbot+c2ca18f0fccdd1f09c66@syzkaller.appspotmail.com
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221110085150.520800-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If ethtool_ops::get_drvinfo() callback isn't set,
ethtool_get_drvinfo() will fill the ethtool_drvinfo::name and
ethtool_drvinfo::bus_info fields.
However, if the driver provides the callback function, those two
fields are not touched. This means that the driver has to fill these
itself.
Allow the driver to leave those two fields empty and populate them in
such case. This way, the driver can rely on the default values for the
name and the bus_info. If the driver provides values, do nothing.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20221108035754.2143-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After searching for a protocol handler in dev_gro_receive, checking for
failure is redundant. Skip the failure code after finding the
corresponding handler.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221108123320.GA59373@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If mctp_neigh_init() return error, the routes resources should
be released in the error handling path. Otherwise some resources
leak.
Fixes: 4d8b9319282a ("mctp: Add neighbour implementation")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20221108095517.620115-1-weiyongjun@huaweicloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add packet traps for 802.1X operation. The "eapol" control trap is used
to trap EAPOL packets and is required for the correct operation of the
control plane. The "locked_port" drop trap can be enabled to gain
visibility into packets that were dropped by the device due to the
locked bridge port check.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reflect the 'BR_PORT_MAB' flag to device drivers so that:
* Drivers that support MAB could act upon the flag being toggled.
* Drivers that do not support MAB will prevent MAB from being enabled.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the bridge is offloaded to hardware, FDB entries are learned and
aged-out by the hardware. Some device drivers synchronize the hardware
and software FDBs by generating switchdev events towards the bridge.
When a port is locked, the hardware must not learn autonomously, as
otherwise any host will blindly gain authorization. Instead, the
hardware should generate events regarding hosts that are trying to gain
authorization and their MAC addresses should be notified by the device
driver as locked FDB entries towards the bridge driver.
Allow device drivers to notify the bridge driver about such entries by
extending the 'switchdev_notifier_fdb_info' structure with the 'locked'
bit. The bit can only be set by device drivers and not by the bridge
driver.
Prevent a locked entry from being installed if MAB is not enabled on the
bridge port.
If an entry already exists in the bridge driver, reject the locked entry
if the current entry does not have the "locked" flag set or if it points
to a different port. The same semantics are implemented in the software
data path.
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, FDB entries that are notified to the bridge via
'SWITCHDEV_FDB_ADD_TO_BRIDGE' are always marked as offloaded. With MAB
enabled, this will no longer be universally true. Device drivers will
report locked FDB entries to the bridge to let it know that the
corresponding hosts required authorization, but it does not mean that
these entries are necessarily programmed in the underlying hardware.
Solve this by determining the offload indication based of the
'offloaded' bit in the FDB notification.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The notifier block tracking netdev changes in devlink is registered
during devlink_alloc() per-net, it is then unregistered
in devlink_free(). When devlink moves from net namespace to another one,
the notifier block needs to move along.
Fix this by adding forgotten call to move the block.
Reported-by: Ido Schimmel <idosch@idosch.org>
Fixes: 02a68a47eade ("net: devlink: track netdev with devlink_port assigned")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, net_dev() netdev notifier variant follows the netdev with
per-net notifier from namespace to namespace. This is implemented
by move_netdevice_notifiers_dev_net() helper.
For devlink it is needed to re-register per-net notifier during
devlink reload. Introduce a new helper called
move_netdevice_notifier_net() and share the unregister/register code
with existing move_netdevice_notifiers_dev_net() helper.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The return value from genl_op_iter_init() only tells us if
there are any policies but to begin the iteration (and therefore
load the first entry) we need to call genl_op_iter_next().
Note that it's safe to call genl_op_iter_next() on a family
with no ops, it will just return false.
This may lead to various crashes, a warning in
netlink_policy_dump_get_policy_idx() when policy is not found
or.. no problem at all if the kmalloc'ed memory happens to be
zeroed.
Fixes: b502b3185cd6 ("genetlink: use iterator in the op to policy map dumping")
Link: https://lore.kernel.org/r/20221108204128.330287-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|