summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-07-29mctp: Add neighbour implementationMatt Johnston
Add an initial neighbour table implementation, to be used in the route output path. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add netlink route managementMatt Johnston
This change adds RTM_GETROUTE, RTM_NEWROUTE & RTM_DELROUTE handlers, allowing management of the MCTP route table. Includes changes from Jeremy Kerr <jk@codeconstruct.com.au>. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add initial routing frameworkJeremy Kerr
Add a simple routing table, and a couple of route output handlers, and the mctp packet_type & handler. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add device handling and netlink interfaceJeremy Kerr
This change adds the infrastructure for managing MCTP netdevices; we add a pointer to the AF_MCTP-specific data to struct netdevice, and hook up the rtnetlink operations for adding and removing addresses. Includes changes from Matt Johnston <matt@codeconstruct.com.au>. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add base socket/protocol definitionsJeremy Kerr
Add an empty socket implementation, plus initialisation/destruction handlers. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29mctp: Add MCTP baseJeremy Kerr
Add basic Kconfig, an initial (empty) af_mctp source object, and {AF,PF}_MCTP definitions, and the required definitions for a new protocol type. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29Bluetooth: skip invalid hci_sync_conn_complete_evtDesmond Cheong Zhi Xi
Syzbot reported a corrupted list in kobject_add_internal [1]. This happens when multiple HCI_EV_SYNC_CONN_COMPLETE event packets with status 0 are sent for the same HCI connection. This causes us to register the device more than once which corrupts the kset list. As this is forbidden behavior, we add a check for whether we're trying to process the same HCI_EV_SYNC_CONN_COMPLETE event multiple times for one connection. If that's the case, the event is invalid, so we report an error that the device is misbehaving, and ignore the packet. Link: https://syzkaller.appspot.com/bug?extid=66264bf2fd0476be7e6c [1] Reported-by: syzbot+66264bf2fd0476be7e6c@syzkaller.appspotmail.com Tested-by: syzbot+66264bf2fd0476be7e6c@syzkaller.appspotmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-07-29skbuff: allow 'slow_gro' for skb carring sock referencePaolo Abeni
This change leverages the infrastructure introduced by the previous patches to allow soft devices passing to the GRO engine owned skbs without impacting the fast-path. It's up to the GRO caller ensuring the slow_gro bit validity before invoking the GRO engine. The new helper skb_prepare_for_gro() is introduced for that goal. On slow_gro, skbs are aggregated only with equal sk. Additionally, skb truesize on GRO recycle and free is correctly updated so that sk wmem is not changed by the GRO processing. rfc-> v1: - fixed bad truesize on dev_gro_receive NAPI_FREE - use the existing state bit Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: optimize GRO for the common case.Paolo Abeni
After the previous patches, at GRO time, skb->slow_gro is usually 0, unless the packets comes from some H/W offload slowpath or tunnel. We can optimize the GRO code assuming !skb->slow_gro is likely. This remove multiple conditionals in the most common path, at the price of an additional one when we hit the above "slow-paths". Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29sk_buff: track extension status in slow_groPaolo Abeni
Similar to the previous one, but tracking the active_extensions field status. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: xfrm: fix shift-out-of-bouncePavel Skripkin
We need to check up->dirmask to avoid shift-out-of-bounce bug, since up->dirmask comes from userspace. Also, added XFRM_USERPOLICY_DIRMASK_MAX constant to uapi to inform user-space that up->dirmask has maximum possible value Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Reported-and-tested-by: syzbot+9cd5837a045bbee5b810@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-07-29 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 14 day(s) which contain a total of 20 files changed, 446 insertions(+), 138 deletions(-). The main changes are: 1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer. 2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann, Piotr Krysiuk and Benedict Schlueter. 3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28Bluetooth: mgmt: Fix wrong opcode in the response for add_adv cmdTedd Ho-Jeong An
This patch fixes the MGMT add_advertising command repsones with the wrong opcode when it is trying to return the not supported error. Fixes: cbbdfa6f33198 ("Bluetooth: Enable controller RPA resolution using Experimental feature") Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2021-07-28Bluetooth: defer cleanup of resources in hci_unregister_dev()Tetsuo Handa
syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to calling lock_sock() with rw spinlock held [1]. It seems that history of this locking problem is a trial and error. Commit b40df5743ee8aed8 ("[PATCH] bluetooth: fix socket locking in hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock() as an attempt to fix lockdep warning. Then, commit 4ce61d1c7a8ef4c1 ("[BLUETOOTH]: Fix locking in hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to local_bh_disable() + bh_lock_sock_nested() as an attempt to fix sleep in atomic context warning. Then, commit 4b5dd696f81b210c ("Bluetooth: Remove local_bh_disable() from hci_sock.c") in 3.3-rc1 removed local_bh_disable(). Then, commit e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to lock_sock() as an attempt to fix CVE-2021-3573. This difficulty comes from current implementation that hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all references from sockets because hci_unregister_dev() immediately reclaims resources as soon as returning from hci_sock_dev_event(HCI_DEV_UNREG). But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not doing what it should do. Therefore, instead of trying to detach sockets from device, let's accept not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG), by moving actual cleanup of resources from hci_unregister_dev() to hci_release_dev() which is called by bt_host_release when all references to this unregistered device (which is a kobject) are gone. Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1] Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2021-07-28net: bridge: switchdev: treat local FDBs the same as entries towards the bridgeVladimir Oltean
Currently the following script: 1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up 2. ip link set swp2 up && ip link set swp2 master br0 3. ip link set swp3 up && ip link set swp3 master br0 4. ip link set swp4 up && ip link set swp4 master br0 5. bridge vlan del dev swp2 vid 1 6. bridge vlan del dev swp3 vid 1 7. ip link set swp4 nomaster 8. ip link set swp3 nomaster produces the following output: [ 641.010738] sja1105 spi0.1: port 2 failed to delete 00:1f:7b:63:02:48 vid 1 from fdb: -2 [ swp2, swp3 and br0 all have the same MAC address, the one listed above ] In short, this happens because the number of FDB entry additions notified to switchdev is unbalanced with the number of deletions. At step 1, the bridge has a random MAC address. At step 2, the br_fdb_replay of swp2 receives this initial MAC address. Then the bridge inherits the MAC address of swp2 via br_fdb_change_mac_address(), and it notifies switchdev (only swp2 at this point) of the deletion of the random MAC address and the addition of 00:1f:7b:63:02:48 as a local FDB entry with fdb->dst == swp2, in VLANs 0 and the default_pvid (1). During step 7: del_nbp -> br_fdb_delete_by_port(br, p, vid=0, do_all=1); -> fdb_delete_local(br, p, f); br_fdb_delete_by_port() deletes all entries towards the ports, regardless of vid, because do_all is 1. fdb_delete_local() has logic to migrate local FDB entries deleted from one port to another port which shares the same MAC address and is in the same VLAN, or to the bridge device itself. This migration happens without notifying switchdev of the deletion on the old port and the addition on the new one, just fdb->dst is changed and the added_by_user flag is cleared. In the example above, the del_nbp(swp4) causes the "addr 00:1f:7b:63:02:48 vid 1" local FDB entry with fdb->dst == swp4 that existed up until then to be migrated directly towards the bridge (fdb->dst == NULL). This is because it cannot be migrated to any of the other ports (swp2 and swp3 are not in VLAN 1). After the migration to br0 takes place, swp4 requests a deletion replay of all FDB entries. Since the "addr 00:1f:7b:63:02:48 vid 1" entry now point towards the bridge, a deletion of it is replayed. There was just a prior addition of this address, so the switchdev driver deletes this entry. Then, the del_nbp(swp3) at step 8 triggers another br_fdb_replay, and switchdev is notified again to delete "addr 00:1f:7b:63:02:48 vid 1". But it can't because it no longer has it, so it returns -ENOENT. There are other possibilities to trigger this issue, but this is by far the simplest to explain. To fix this, we must avoid the situation where the addition of an FDB entry is notified to switchdev as a local entry on a port, and the deletion is notified on the bridge itself. Considering that the 2 types of FDB entries are completely equivalent and we cannot have the same MAC address as a local entry on 2 bridge ports, or on a bridge port and pointing towards the bridge at the same time, it makes sense to hide away from switchdev completely the fact that a local FDB entry is associated with a given bridge port at all. Just say that it points towards the bridge, it should make no difference whatsoever to the switchdev driver and should even lead to a simpler overall implementation, will less cases to handle. This also avoids any modification at all to the core bridge driver, just what is reported to switchdev changes. With the local/permanent entries on bridge ports being already reported to user space, it is hard to believe that the bridge behavior can change in any backwards-incompatible way such as making all local FDB entries point towards the bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28net: bridge: switchdev: replay the entire FDB for each portVladimir Oltean
Currently when a switchdev port joins a bridge, we replay all FDB entries pointing towards that port or towards the bridge. However, this is insufficient in certain situations: (a) DSA, through its assisted_learning_on_cpu_port logic, snoops dynamically learned FDB entries on foreign interfaces. These are FDB entries that are pointing neither towards the newly joined switchdev port, nor towards the bridge. So these addresses would be missed when joining a bridge where a foreign interface has already learned some addresses, and they would also linger on if the DSA port leaves the bridge before the foreign interface forgets them. None of this happens if we replay the entire FDB when the port joins. (b) There is a desire to treat local FDB entries on a port (i.e. the port's termination MAC address) identically to FDB entries pointing towards the bridge itself. More details on the reason behind this in the next patch. The point is that this cannot be done given the current structure of br_fdb_replay() in this situation: ip link set swp0 master br0 # br0 inherits its MAC address from swp0 ip link set swp1 master br0 What is desirable is that when swp1 joins the bridge, br_fdb_replay() also notifies swp1 of br0's MAC address, but this won't in fact happen because the MAC address of br0 does not have fdb->dst == NULL (it doesn't point towards the bridge), but it has fdb->dst == swp0. So our current logic makes it impossible for that address to be replayed. But if we dump the entire FDB instead of just the entries with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC address of br0 will be replayed too, which is what we need. A natural question arises: say there is an FDB entry to be replayed, like a MAC address dynamically learned on a foreign interface that belongs to a bridge where no switchdev port has joined yet. If 10 switchdev ports belonging to the same driver join this bridge, one by one, won't every port get notified 10 times of the foreign FDB entry, amounting to a total of 100 notifications for this FDB entry in the switchdev driver? Well, yes, but this is where the "void *ctx" argument for br_fdb_replay is useful: every port of the switchdev driver is notified whenever any other port requests an FDB replay, but because the replay was initiated by a different port, its context is different from the initiating port's context, so it ignores those replays. So the foreign FDB entry will be installed only 10 times, once per port. This is done so that the following 4 code paths are always well balanced: (a) addition of foreign FDB entry is replayed when port joins bridge (b) deletion of foreign FDB entry is replayed when port leaves bridge (c) addition of foreign FDB entry is notified to all ports currently in bridge (c) deletion of foreign FDB entry is notified to all ports currently in bridge Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28af_unix: fix garbage collect vs MSG_PEEKMiklos Szeredi
unix_gc() assumes that candidate sockets can never gain an external reference (i.e. be installed into an fd) while the unix_gc_lock is held. Except for MSG_PEEK this is guaranteed by modifying inflight count under the unix_gc_lock. MSG_PEEK does not touch any variable protected by unix_gc_lock (file count is not), yet it needs to be serialized with garbage collection. Do this by locking/unlocking unix_gc_lock: 1) increment file count 2) lock/unlock barrier to make sure incremented file count is visible to garbage collection 3) install file into fd This is a lock barrier (unlike smp_mb()) that ensures that garbage collection is run completely before or completely after the barrier. Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-28net/sched: act_skbmod: Add SKBMOD_F_ECN option supportPeilin Ye
Currently, when doing rate limiting using the tc-police(8) action, the easiest way is to simply drop the packets which exceed or conform the configured bandwidth limit. Add a new option to tc-skbmod(8), so that users may use the ECN [1] extension to explicitly inform the receiver about the congestion instead of dropping packets "on the floor". The 2 least significant bits of the Traffic Class field in IPv4 and IPv6 headers are used to represent different ECN states [2]: 0b00: "Non ECN-Capable Transport", Non-ECT 0b10: "ECN Capable Transport", ECT(0) 0b01: "ECN Capable Transport", ECT(1) 0b11: "Congestion Encountered", CE As an example: $ tc filter add dev eth0 parent 1: protocol ip prio 10 \ matchall action skbmod ecn Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT affect Non-ECT or non-IP packets. In the tc-police scenario mentioned above, users may pipe a tc-police action and a tc-skbmod "ecn" action together to achieve ECN-based rate limiting. For TCP connections, upon receiving a CE packet, the receiver will respond with an ECE packet, asking the sender to reduce their congestion window. However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and our implementation does not touch or care about L4 headers. The updated tc-skbmod SYNOPSIS looks like the following: tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ... Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod command. Trying to use more than one of them at a time is considered undefined behavior; pipe multiple tc-skbmod commands together instead. "set" and "swap" only affect Ethernet packets, while "ecn" only affects IPv{4,6} packets. It is also worth mentioning that, in theory, the same effect could be achieved by piping a "police" action and a "bpf" action using the bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the user, thus impractical. Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets". [1] https://datatracker.ietf.org/doc/html/rfc3168 [2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28net: let flow have same hash in two directionszhang kai
using same source and destination ip/port for flow hash calculation within the two directions. Signed-off-by: zhang kai <zhangkaiheb@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28devlink: Remove duplicated registration checkLeon Romanovsky
Both registered flag and devlink pointer are set at the same time and indicate the same thing - devlink/devlink_port are ready. Instead of checking ->registered use devlink pointer as an indication. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28sctp: fix return value check in __sctp_rcv_asconf_lookupMarcelo Ricardo Leitner
As Ben Hutchings noticed, this check should have been inverted: the call returns true in case of success. Reported-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 0c5dc070ff3d ("sctp: validate from_addr_param return") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27bpf, sockmap: Fix memleak on ingress msg enqueueJohn Fastabend
If backlog handler is running during a tear down operation we may enqueue data on the ingress msg queue while tear down is trying to free it. sk_psock_backlog() sk_psock_handle_skb() skb_psock_skb_ingress() sk_psock_skb_ingress_enqueue() sk_psock_queue_msg(psock,msg) spin_lock(ingress_lock) sk_psock_zap_ingress() _sk_psock_purge_ingerss_msg() _sk_psock_purge_ingress_msg() -- free ingress_msg list -- spin_unlock(ingress_lock) spin_lock(ingress_lock) list_add_tail(msg,ingress_msg) <- entry on list with no one left to free it. spin_unlock(ingress_lock) To fix we only enqueue from backlog if the ENABLED bit is set. The tear down logic clears the bit with ingress_lock set so we wont enqueue the msg in the last step. Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210727160500.1713554-4-john.fastabend@gmail.com
2021-07-27bpf, sockmap: On cleanup we additionally need to remove cached skbJohn Fastabend
Its possible if a socket is closed and the receive thread is under memory pressure it may have cached a skb. We need to ensure these skbs are free'd along with the normal ingress_skb queue. Before 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") tear down and backlog processing both had sock_lock for the common case of socket close or unhash. So it was not possible to have both running in parrallel so all we would need is the kfree in those kernels. But, latest kernels include the commit 799aa7f98d5e and this requires a bit more work. Without the ingress_lock guarding reading/writing the state->skb case its possible the tear down could run before the state update causing it to leak memory or worse when the backlog reads the state it could potentially run interleaved with the tear down and we might end up free'ing the state->skb from tear down side but already have the reference from backlog side. To resolve such races we wrap accesses in ingress_lock on both sides serializing tear down and backlog case. In both cases this only happens after an EAGAIN error case so having an extra lock in place is likely fine. The normal path will skip the locks. Note, we check state->skb before grabbing lock. This works because we can only enqueue with the mutex we hold already. Avoiding a race on adding state->skb after the check. And if tear down path is running that is also fine if the tear down path then removes state->skb we will simply set skb=NULL and the subsequent goto is skipped. This slight complication avoids locking in normal case. With this fix we no longer see this warning splat from tcp side on socket close when we hit the above case with redirect to ingress self. [224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220 [224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi [224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G I 5.14.0-rc1alu+ #181 [224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019 [224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220 [224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41 [224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206 [224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000 [224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460 [224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543 [224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390 [224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500 [224913.935974] FS: 00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000 [224913.935981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0 [224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [224913.936007] Call Trace: [224913.936016] inet_csk_destroy_sock+0xba/0x1f0 [224913.936033] __tcp_close+0x620/0x790 [224913.936047] tcp_close+0x20/0x80 [224913.936056] inet_release+0x8f/0xf0 [224913.936070] __sock_release+0x72/0x120 [224913.936083] sock_close+0x14/0x20 Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.com
2021-07-27bpf, sockmap: Zap ingress queues after stopping strparserJohn Fastabend
We don't want strparser to run and pass skbs into skmsg handlers when the psock is null. We just sk_drop them in this case. When removing a live socket from map it means extra drops that we do not need to incur. Move the zap below strparser close to avoid this condition. This way we stop the stream parser first stopping it from processing packets and then delete the psock. Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210727160500.1713554-2-john.fastabend@gmail.com
2021-07-27net: cipso: fix warnings in netlbl_cipsov4_add_stdPavel Skripkin
Syzbot reported warning in netlbl_cipsov4_add(). The problem was in too big doi_def->map.std->lvl.local_size passed to kcalloc(). Since this value comes from userpace there is no need to warn if value is not correct. The same problem may occur with other kcalloc() calls in this function, so, I've added __GFP_NOWARN flag to all kcalloc() calls there. Reported-and-tested-by: syzbot+cdd51ee2e6b0b2e18c0d@syzkaller.appspotmail.com Fixes: 96cb8e3313c7 ("[NetLabel]: CIPSOv4 and Unlabeled packet integration") Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: bonding: move ioctl handling to private ndo operationArnd Bergmann
All other user triggered operations are gone from ndo_ioctl, so move the SIOCBOND family into a custom operation as well. The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now, but there are still a few definitions in obsolete wireless drivers as well as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR helpers from inside the kernel. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: bridge: move bridge ioctls out of .ndo_do_ioctlArnd Bergmann
Working towards obsoleting the .ndo_do_ioctl operation entirely, stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands into this callback. My first attempt was to add another ndo_siocbr() callback, but as there is only a single driver that takes these commands and there is already a hook mechanism to call directly into this driver, extend this hook instead, and use it for both the deviceless and the device specific ioctl commands. Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: socket: return changed ifreq from SIOCDEVPRIVATEArnd Bergmann
Some drivers that use SIOCDEVPRIVATE ioctl commands modify the ifreq structure and expect it to be passed back to user space, which has never really happened for compat mode because the calling these drivers through ndo_do_ioctl requires overwriting the ifr_data pointer. Now that all drivers are converted to ndo_siocdevprivate, change it to handle this correctly in both compat and native mode. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: split out ndo_siowandev ioctlArnd Bergmann
In order to further reduce the scope of ndo_do_ioctl(), move out the SIOCWANDEV handling into a new network device operation function. Adjust the prototype to only pass the if_settings sub-structure in place of the ifreq, and remove the redundant 'cmd' argument in the process. Cc: Krzysztof Halasa <khc@pm.waw.pl> Cc: "Jan \"Yenya\" Kasprzak" <kas@fi.muni.cz> Cc: Kevin Curtis <kevin.curtis@farsite.co.uk> Cc: Zhao Qiang <qiang.zhao@nxp.com> Cc: Martin Schiller <ms@dev.tdt.de> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: linux-x25@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: split out ndo_eth_ioctlArnd Bergmann
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: pass SIOCDEVPRIVATE data separatelyArnd Bergmann
The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that passes data as part of struct ifreq rather than as an ifr_data pointer, or that passes data back this way, since the compat_ifr_data_ioctl() helper overwrites the ifr_data pointer and does not copy anything back out. Since all drivers using devprivate commands are now converted to the new .ndo_siocdevprivate callback, fix this by adding the missing piece and passing the pointer separately the whole way. This further unifies the native and compat logic for socket ioctls, as the new code now passes the correct pointer as well as the correct data for both native and compat ioctls. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27ip_tunnel: use ndo_siocdevprivateArnd Bergmann
The various ipv4 and ipv6 tunnel drivers each implement a set of 12 SIOCDEVPRIVATE commands for managing tunnels. These all work correctly in compat mode. Move them over to the new .ndo_siocdevprivate operation. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27phonet: use siocdevprivateArnd Bergmann
phonet has a single private ioctl that is broken in compat mode on big-endian machines today because the data returned from it is never copied back to user space. Move it over to the ndo_siocdevprivate callback, which also fixes the compat issue. Cc: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rémi Denis-Courmont <courmisch@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27bridge: use ndo_siocdevprivateArnd Bergmann
The bridge driver has an old set of ioctls using the SIOCDEVPRIVATE namespace that have never worked in compat mode and are explicitly forbidden already. Move them over to ndo_siocdevprivate and fix compat mode for these, because we can. Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: split out SIOCDEVPRIVATE handling from dev_ioctlArnd Bergmann
SIOCDEVPRIVATE ioctl commands are mainly used in really old drivers, and they have a number of problems: - They hide behind the normal .ndo_do_ioctl function that is also used for other things in modern drivers, so it's hard to spot a driver that actually uses one of these - Since drivers use a number different calling conventions, it is impossible to support compat mode for them in a generic way. - With all drivers using the same 16 commands codes, there is no way to introspect the data being passed through things like strace. Add a new net_device_ops callback pointer, to address the first two of these. Separating them from .ndo_do_ioctl makes it easy to grep for drivers with a .ndo_siocdevprivate callback, and the unwieldy name hopefully makes it easier to spot in code review. By passing the ifreq structure and the ifr_data pointer separately, it is no longer necessary to overload these, and the driver can use either one for a given command. Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27tcp: more accurately check DSACKs to grow RACK reordering windowNeal Cardwell
Previously, a DSACK could expand the RACK reordering window when no reordering has been seen, and/or when the DSACK was due to an unnecessary TLP retransmit (rather than a spurious fast recovery due to reordering). This could result in unnecessarily growing the RACK reordering window and thus unnecessarily delaying RACK-based fast recovery episodes. To avoid these issues, this commit tightens the conditions under which a DSACK triggers the RACK reordering window to grow, so that a connection only expands its RACK reordering window if: (a) reordering has been seen in the connection (b) a DSACKed range does not match the most recent TLP retransmit Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27tcp: more accurately detect spurious TLP probesYuchung Cheng
Previously TLP is considered spurious if the sender receives any DSACK during a TLP episode. This patch further checks the DSACK sequences match the TLP's to improve accuracy. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-279p/xen: Fix end of loop tests for list_for_each_entryHarshvardhan Jha
This patch addresses the following problems: - priv can never be NULL, so this part of the check is useless - if the loop ran through the whole list, priv->client is invalid and it is more appropriate and sufficient to check for the end of list_for_each_entry loop condition. Link: http://lkml.kernel.org/r/20210727000709.225032-1-harshvardhan.jha@oracle.com Signed-off-by: Harshvardhan Jha <harshvardhan.jha@oracle.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Tested-by: Stefano Stabellini <sstabellini@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-07-279p/trans_virtio: Remove sysfs file on probe failureXie Yongji
This ensures we don't leak the sysfs file if we failed to allocate chan->vc_wq during probe. Link: http://lkml.kernel.org/r/20210517083557.172-1-xieyongji@bytedance.com Fixes: 86c8437383ac ("net/9p: Add sysfs mount_tag file for virtio 9P device") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-07-27net: llc: fix skb_over_panicPavel Skripkin
Syzbot reported skb_over_panic() in llc_pdu_init_as_xid_cmd(). The problem was in wrong LCC header manipulations. Syzbot's reproducer tries to send XID packet. llc_ui_sendmsg() is doing following steps: 1. skb allocation with size = len + header size len is passed from userpace and header size is 3 since addr->sllc_xid is set. 2. skb_reserve() for header_len = 3 3. filling all other space with memcpy_from_msg() Ok, at this moment we have fully loaded skb, only headers needs to be filled. Then code comes to llc_sap_action_send_xid_c(). This function pushes 3 bytes for LLC PDU header and initializes it. Then comes llc_pdu_init_as_xid_cmd(). It initalizes next 3 bytes *AFTER* LLC PDU header and call skb_push(skb, 3). This looks wrong for 2 reasons: 1. Bytes rigth after LLC header are user data, so this function was overwriting payload. 2. skb_push(skb, 3) call can cause skb_over_panic() since all free space was filled in llc_ui_sendmsg(). (This can happen is user passed 686 len: 686 + 14 (eth header) + 3 (LLC header) = 703. SKB_DATA_ALIGN(703) = 704) So, in this patch I added 2 new private constansts: LLC_PDU_TYPE_U_XID and LLC_PDU_LEN_U_XID. LLC_PDU_LEN_U_XID is used to correctly reserve header size to handle LLC + XID case. LLC_PDU_TYPE_U_XID is used by llc_pdu_header_init() function to push 6 bytes instead of 3. And finally I removed skb_push() call from llc_pdu_init_as_xid_cmd(). This changes should not affect other parts of LLC, since after all steps we just transmit buffer. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+5e5a981ad7cc54c4b2b4@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27openvswitch: fix sparse warning incorrect typeMark Gray
fix incorrect type in argument 1 (different address spaces) ../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces) ../net/openvswitch/datapath.c:169:17: expected void const * ../net/openvswitch/datapath.c:169:17: got struct dp_nlsk_pids [noderef] __rcu *upcall_portids Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27openvswitch: fix alignment issuesMark Gray
Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27net: netlink: add the case when nlh is NULLYajun Deng
Add the case when nlh is NULL in nlmsg_report(), so that the caller doesn't need to deal with this case. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27tty: drop put_tty_driverJiri Slaby
put_tty_driver() is an alias for tty_driver_kref_put(). There is no need for two exported identical functions, therefore switch all users of old put_tty_driver() to new tty_driver_kref_put() and remove the former for good. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Cc: Jens Taprogge <jens.taprogge@taprogge.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Scott Branden <scott.branden@broadcom.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Lin <dtwlin@gmail.com> Cc: Johan Hovold <johan@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: Felipe Balbi <balbi@kernel.org> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Acked-by: Alex Elder <elder@linaro.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723074317.32690-8-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27tty: stop using alloc_tty_driverJiri Slaby
alloc_tty_driver was deprecated by tty_alloc_driver in commit 7f0bc6a68ed9 (TTY: pass flags to alloc_tty_driver) in 2012. I never got into eliminating alloc_tty_driver until now. So we still have two functions for allocating drivers which might be confusing. So get rid of alloc_tty_driver uses to eliminate it for good in the next patch. Note we need to switch return value checking as tty_alloc_driver uses ERR_PTR. And flags are now a parameter of tty_alloc_driver. Cc: Richard Henderson <rth@twiddle.net>(odd fixer:ALPHA PORT) Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Cc: Jens Taprogge <jens.taprogge@taprogge.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Oliver Neukum <oneukum@suse.com> Cc: Felipe Balbi <balbi@kernel.org> Cc: Johan Hovold <johan@kernel.org> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20210723074317.32690-5-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27ethtool: Fix rxnfc copy to user buffer overflowSaeed Mahameed
In the cited commit, copy_to_user() got called with the wrong pointer, instead of passing the actual buffer ptr to copy from, a pointer to the pointer got passed, which causes a buffer overflow calltrace to pop up when executing "ethtool -x ethX". Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed to the function, instead of a pointer to it. This fixes below call trace: [ 15.533533] ------------[ cut here ]------------ [ 15.539007] Buffer overflow detected (8 < 192)! [ 15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20 [ 15.549308] Modules linked in: [ 15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058 [ 15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 15.558378] RIP: 0010:copy_overflow+0x15/0x20 [ 15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 [ 15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286 [ 15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000 [ 15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff [ 15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff [ 15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000 [ 15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000 [ 15.573584] FS: 00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000 [ 15.575639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0 [ 15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 15.582441] Call Trace: [ 15.582970] ethtool_rxnfc_copy_to_user+0x30/0x46 [ 15.583815] ethtool_get_rxnfc.cold+0x23/0x2b [ 15.584584] dev_ethtool+0x29c/0x25f0 [ 15.585286] ? security_netlbl_sid_to_secattr+0x77/0xd0 [ 15.586728] ? do_set_pte+0xc4/0x110 [ 15.587349] ? _raw_spin_unlock+0x18/0x30 [ 15.588118] ? __might_sleep+0x49/0x80 [ 15.588956] dev_ioctl+0x2c1/0x490 [ 15.589616] sock_ioctl+0x18e/0x330 [ 15.591143] __x64_sys_ioctl+0x41c/0x990 [ 15.591823] ? irqentry_exit_to_user_mode+0x9/0x20 [ 15.592657] ? irqentry_exit+0x33/0x40 [ 15.593308] ? exc_page_fault+0x32f/0x770 [ 15.593877] ? exit_to_user_mode_prepare+0x3c/0x130 [ 15.594775] do_syscall_64+0x35/0x80 [ 15.595397] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.596037] RIP: 0033:0x7f5381226d4b [ 15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48 [ 15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b [ 15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003 [ 15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001 [ 15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000 [ 15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000 [ 15.605042] ---[ end trace 325cf185e2795048 ]--- Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling") Reported-by: Shannon Nelson <snelson@pensando.io> CC: Arnd Bergmann <arnd@arndb.de> CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Tested-by: Shannon Nelson <snelson@pensando.io> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26flow_dissector: Fix out-of-bounds warningsGustavo A. R. Silva
Fix the following out-of-bounds warnings: net/core/flow_dissector.c: In function '__skb_flow_dissect': >> net/core/flow_dissector.c:1104:4: warning: 'memcpy' offset [24, 39] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'struct in6_addr' at offset 8 [-Warray-bounds] 1104 | memcpy(&key_addrs->v6addrs, &iph->saddr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1105 | sizeof(key_addrs->v6addrs)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/linux/ipv6.h:5, from net/core/flow_dissector.c:6: include/uapi/linux/ipv6.h:133:18: note: subobject 'saddr' declared here 133 | struct in6_addr saddr; | ^~~~~ >> net/core/flow_dissector.c:1059:4: warning: 'memcpy' offset [16, 19] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 12 [-Warray-bounds] 1059 | memcpy(&key_addrs->v4addrs, &iph->saddr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1060 | sizeof(key_addrs->v4addrs)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/linux/ip.h:17, from net/core/flow_dissector.c:5: include/uapi/linux/ip.h:103:9: note: subobject 'saddr' declared here 103 | __be32 saddr; | ^~~~~ The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). So, the compiler legitimately complains about it. As these are just a couple of members, fix this by copying each one of them in separate calls to memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()Gustavo A. R. Silva
Fix the following out-of-bounds warning: In function 'ip_copy_addrs', inlined from '__ip_queue_xmit' at net/ipv4/ip_output.c:517:2: net/ipv4/ip_output.c:449:2: warning: 'memcpy' offset [40, 43] from the object at 'fl' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 36 [-Warray-bounds] 449 | memcpy(&iph->saddr, &fl4->saddr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 450 | sizeof(fl4->saddr) + sizeof(fl4->daddr)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &iph->saddr and &fl4->saddr. As these are just a couple of struct members, fix this by using direct assignments, instead of memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26Revert "net: dsa: Allow drivers to filter packets they can decode source ↵Vladimir Oltean
port from" This reverts commit cc1939e4b3aaf534fb2f3706820012036825731c. Currently 2 classes of DSA drivers are able to send/receive packets directly through the DSA master: - drivers with DSA_TAG_PROTO_NONE - sja1105 Now that sja1105 has gained the ability to perform traffic termination even under the tricky case (VLAN-aware bridge), and that is much more functional (we can perform VLAN-aware bridging with foreign interfaces), there is no reason to keep this code in the receive path of the network core. So delete it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26net: dsa: sja1105: add bridge TX data plane offload based on tag_8021qVladimir Oltean
The main desire for having this feature in sja1105 is to support network stack termination for traffic coming from a VLAN-aware bridge. For sja1105, offloading the bridge data plane means sending packets as-is, with the proper VLAN tag, to the chip. The chip will look up its FDB and forward them to the correct destination port. But we support bridge data plane offload even for VLAN-unaware bridges, and the implementation there is different. In fact, VLAN-unaware bridging is governed by tag_8021q, so it makes sense to have the .bridge_fwd_offload_add() implementation fully within tag_8021q. The key difference is that we only support 1 VLAN-aware bridge, but we support multiple VLAN-unaware bridges. So we need to make sure that the forwarding domain is not crossed by packets injected from the stack. For this, we introduce the concept of a tag_8021q TX VLAN for bridge forwarding offload. As opposed to the regular TX VLANs which contain only 2 ports (the user port and the CPU port), a bridge data plane TX VLAN is "multicast" (or "imprecise"): it contains all the ports that are part of a certain bridge, and the hardware will select where the packet goes within this "imprecise" forwarding domain. Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use of the unique "bridge_num" provided by DSA for the data plane offload. We use the same 3 bits from the tag_8021q VLAN ID format to encode this bridge number. Note that these 3 bit positions have been used before for sub-VLANs in best-effort VLAN filtering mode. The difference is that for best-effort, the sub-VLANs were only valid on RX (and it was documented that the sub-VLAN field needed to be transmitted as zero). Whereas for the bridge data plane offload, these 3 bits are only valid on TX. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>