summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-11-26net: vlan: fix underflow for the real_dev refcntZiyang Xuan
Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute the following testcase: ip link add dev dummy1 type dummy ip link add name dummy1.100 link dummy1 type vlan id 100 ip link del dev dummy1 When the dummy netdevice is removed, we will get a WARNING as following: ======================================================================= refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 and an endless loop of: ======================================================================= unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824 That is because dev_put(real_dev) in vlan_dev_free() be called without dev_hold(real_dev) in register_vlan_dev(). It makes the refcnt of real_dev underflow. Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev symmetrical. Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Petr Machata <petrm@nvidia.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()Julian Wiedmann
ethtool_set_coalesce() now uses both the .get_coalesce() and .set_coalesce() callbacks. But the check for their availability is buggy, so changing the coalesce settings on a device where the driver provides only _one_ of the callbacks results in a NULL pointer dereference instead of an -EOPNOTSUPP. Fix the condition so that the availability of both callbacks is ensured. This also matches the netlink code. Note that reproducing this requires some effort - it only affects the legacy ioctl path, and needs a specific combination of driver options: - have .get_coalesce() and .coalesce_supported but no .set_coalesce(), or - have .set_coalesce() but no .get_coalesce(). Here eg. ethtool doesn't cause the crash as it first attempts to call ethtool_get_coalesce() and bails out on error. Fixes: f3ccfda19319 ("ethtool: extend coalesce setting uAPI with CQE mode") Cc: Yufeng Mo <moyufeng@huawei.com> Cc: Huazhong Tan <tanhuazhong@huawei.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26net/sched: sch_ets: don't peek at classes beyond 'nbands'Davide Caratti
when the number of DRR classes decreases, the round-robin active list can contain elements that have already been freed in ets_qdisc_change(). As a consequence, it's possible to see a NULL dereference crash, caused by the attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets] Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287 RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000 RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0 R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100 FS: 00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0 Call Trace: <TASK> qdisc_peek_dequeued+0x29/0x70 [sch_ets] tbf_dequeue+0x22/0x260 [sch_tbf] __qdisc_run+0x7f/0x630 net_tx_action+0x290/0x4c0 __do_softirq+0xee/0x4f8 irq_exit_rcu+0xf4/0x130 sysvec_apic_timer_interrupt+0x52/0xc0 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0033:0x7f2aa7fc9ad4 Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00 RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202 RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720 RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720 RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380 R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460 </TASK> Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000018 Ensuring that 'alist' was never zeroed [1] was not sufficient, we need to remove from the active list those elements that are no more SP nor DRR. [1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/ v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock acquired, thanks to Cong Wang. v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb from the next list item. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26mac80211: notify non-transmitting BSS of color changesJohn Crispin
When color change is triggered in multiple bssid case, allow only for transmitting BSS, and when it changes its bss color, notify the non transmitting BSSs also of the new bss color. Signed-off-by: John Crispin <john@phrozen.org> Co-developed-by: Lavanya Suresh <lavaks@codeaurora.org> Signed-off-by: Lavanya Suresh <lavaks@codeaurora.org> Co-developed-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Link: https://lore.kernel.org/r/1637146647-16282-1-git-send-email-quic_ramess@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: minstrel_ht: remove unused SAMPLE_SWITCH_THR definePeter Seiderer
Remove unused SAMPLE_SWITCH_THR define. Signed-off-by: Peter Seiderer <ps.report@gmx.net> Link: https://lore.kernel.org/r/20211116221244.30844-1-ps.report@gmx.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26cfg80211: allow continuous radar monitoring on offchannel chainLorenzo Bianconi
Allow continuous radar detection on the offchannel chain in order to switch to the monitored channel whenever the underlying driver reports a radar pattern on the main channel. Tested-by: Owen Peng <owen.peng@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/d46217310a49b14ff0e9c002f0a6e0547d70fd2c.1637071350.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26cfg80211: schedule offchan_cac_abort_wk in cfg80211_radar_eventLorenzo Bianconi
If necessary schedule offchan_cac_abort_wk work in cfg80211_radar_event routine adding offchan parameter to cfg80211_radar_event signature. Rename cfg80211_radar_event in __cfg80211_radar_event and introduce the two following inline helpers: - cfg80211_radar_event - cfg80211_offchan_radar_event Doing so the drv will not need to run cfg80211_offchan_cac_abort() after radar detection on the offchannel chain. Tested-by: Owen Peng <owen.peng@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/3ff583e021e3343a3ced54a7b09b5e184d1880dc.1637062727.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26cfg80211: delete redundant free codeliuguoqiang
When kzalloc failed and rdev->sacn_req or rdev->scan_msg is null, pass a null pointer to kfree is redundant, delete it and return directly. Signed-off-by: liuguoqiang <liuguoqiang@uniontech.com> Link: https://lore.kernel.org/r/20211115092139.24407-1-liuguoqiang@uniontech.com [remove now unused creq = NULL assigment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: add support for .ndo_fill_forward_pathFelix Fietkau
This allows drivers to provide a destination device + info for flow offload Only supported in combination with 802.3 encap offload Signed-off-by: Felix Fietkau <nbd@nbd.name> Tested-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/20211112112223.1209-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: Remove unused assignment statementsluo penghao
The assignment of these three local variables in the file will not be used in the corresponding functions, so they should be deleted. The clang_analyzer complains as follows: net/mac80211/wpa.c:689:2 warning: net/mac80211/wpa.c:883:2 warning: net/mac80211/wpa.c:452:2 warning: Value stored to 'hdr' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211104061411.1744-1-luo.penghao@zte.com.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26cfg80211: fix possible NULL pointer dereference in ↵Lorenzo Bianconi
cfg80211_stop_offchan_radar_detection Fix the following NULL pointer dereference in cfg80211_stop_offchan_radar_detection routine that occurs when hostapd is stopped during the CAC on offchannel chain: Sat Jan 1 0[ 779.567851] ESR = 0x96000005 0:12:50 2000 dae[ 779.572346] EC = 0x25: DABT (current EL), IL = 32 bits mon.debug hostap[ 779.578984] SET = 0, FnV = 0 d: hostapd_inter[ 779.583445] EA = 0, S1PTW = 0 face_deinit_free[ 779.587936] Data abort info: : num_bss=1 conf[ 779.592224] ISV = 0, ISS = 0x00000005 ->num_bss=1 Sat[ 779.597403] CM = 0, WnR = 0 Jan 1 00:12:50[ 779.601749] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000418b2000 2000 daemon.deb[ 779.609601] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 ug hostapd: host[ 779.619657] Internal error: Oops: 96000005 [#1] SMP [ 779.770810] CPU: 0 PID: 2202 Comm: hostapd Not tainted 5.10.75 #0 [ 779.776892] Hardware name: MediaTek MT7622 RFB1 board (DT) [ 779.782370] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) [ 779.788384] pc : cfg80211_chandef_valid+0x10/0x490 [cfg80211] [ 779.794128] lr : cfg80211_check_station_change+0x3190/0x3950 [cfg80211] [ 779.800731] sp : ffffffc01204b7e0 [ 779.804036] x29: ffffffc01204b7e0 x28: ffffff80039bdc00 [ 779.809340] x27: 0000000000000000 x26: ffffffc008cb3050 [ 779.814644] x25: 0000000000000000 x24: 0000000000000002 [ 779.819948] x23: ffffff8002630000 x22: ffffff8003e748d0 [ 779.825252] x21: 0000000000000cc0 x20: ffffff8003da4a00 [ 779.830556] x19: 0000000000000000 x18: ffffff8001bf7ce0 [ 779.835860] x17: 00000000ffffffff x16: 0000000000000000 [ 779.841164] x15: 0000000040d59200 x14: 00000000000019c0 [ 779.846467] x13: 00000000000001c8 x12: 000636b9e9dab1c6 [ 779.851771] x11: 0000000000000141 x10: 0000000000000820 [ 779.857076] x9 : 0000000000000000 x8 : ffffff8003d7d038 [ 779.862380] x7 : 0000000000000000 x6 : ffffff8003d7d038 [ 779.867683] x5 : 0000000000000e90 x4 : 0000000000000038 [ 779.872987] x3 : 0000000000000002 x2 : 0000000000000004 [ 779.878291] x1 : 0000000000000000 x0 : 0000000000000000 [ 779.883594] Call trace: [ 779.886039] cfg80211_chandef_valid+0x10/0x490 [cfg80211] [ 779.891434] cfg80211_check_station_change+0x3190/0x3950 [cfg80211] [ 779.897697] nl80211_radar_notify+0x138/0x19c [cfg80211] [ 779.903005] cfg80211_stop_offchan_radar_detection+0x7c/0x8c [cfg80211] [ 779.909616] __cfg80211_leave+0x2c/0x190 [cfg80211] [ 779.914490] cfg80211_register_netdevice+0x1c0/0x6d0 [cfg80211] [ 779.920404] raw_notifier_call_chain+0x50/0x70 [ 779.924841] call_netdevice_notifiers_info+0x54/0xa0 [ 779.929796] __dev_close_many+0x40/0x100 [ 779.933712] __dev_change_flags+0x98/0x190 [ 779.937800] dev_change_flags+0x20/0x60 [ 779.941628] devinet_ioctl+0x534/0x6d0 [ 779.945370] inet_ioctl+0x1bc/0x230 [ 779.948849] sock_do_ioctl+0x44/0x200 [ 779.952502] sock_ioctl+0x268/0x4c0 [ 779.955985] __arm64_sys_ioctl+0xac/0xd0 [ 779.959900] el0_svc_common.constprop.0+0x60/0x110 [ 779.964682] do_el0_svc+0x1c/0x24 [ 779.967990] el0_svc+0x10/0x1c [ 779.971036] el0_sync_handler+0x9c/0x120 [ 779.974950] el0_sync+0x148/0x180 [ 779.978259] Code: a9bc7bfd 910003fd a90153f3 aa0003f3 (f9400000) [ 779.984344] ---[ end trace 0e67b4f5d6cdeec7 ]--- [ 779.996400] Kernel panic - not syncing: Oops: Fatal exception [ 780.002139] SMP: stopping secondary CPUs [ 780.006057] Kernel Offset: disabled [ 780.009537] CPU features: 0x0000002,04002004 [ 780.013796] Memory Limit: none Fixes: b8f5facf286b ("cfg80211: implement APIs for dedicated radar detection HW") Reported-by: Evelyn Tsai <evelyn.tsai@mediatek.com> Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c2e34c065bf8839c5ffa45498ae154021a72a520.1635958796.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: fix a memory leak where sta_info is not freedAhmed Zaki
The following is from a system that went OOM due to a memory leak: wlan0: Allocated STA 74:83:c2:64:0b:87 wlan0: Allocated STA 74:83:c2:64:0b:87 wlan0: IBSS finish 74:83:c2:64:0b:87 (---from ieee80211_ibss_add_sta) wlan0: Adding new IBSS station 74:83:c2:64:0b:87 wlan0: moving STA 74:83:c2:64:0b:87 to state 2 wlan0: moving STA 74:83:c2:64:0b:87 to state 3 wlan0: Inserted STA 74:83:c2:64:0b:87 wlan0: IBSS finish 74:83:c2:64:0b:87 (---from ieee80211_ibss_work) wlan0: Adding new IBSS station 74:83:c2:64:0b:87 wlan0: moving STA 74:83:c2:64:0b:87 to state 2 wlan0: moving STA 74:83:c2:64:0b:87 to state 3 . . wlan0: expiring inactive not authorized STA 74:83:c2:64:0b:87 wlan0: moving STA 74:83:c2:64:0b:87 to state 2 wlan0: moving STA 74:83:c2:64:0b:87 to state 1 wlan0: Removed STA 74:83:c2:64:0b:87 wlan0: Destroyed STA 74:83:c2:64:0b:87 The ieee80211_ibss_finish_sta() is called twice on the same STA from 2 different locations. On the second attempt, the allocated STA is not destroyed creating a kernel memory leak. This is happening because sta_info_insert_finish() does not call sta_info_free() the second time when the STA already exists (returns -EEXIST). Note that the caller sta_info_insert_rcu() assumes STA is destroyed upon errors. Same fix is applied to -ENOMEM. Signed-off-by: Ahmed Zaki <anzaki@gmail.com> Link: https://lore.kernel.org/r/20211002145329.3125293-1-anzaki@gmail.com [change the error path label to use the existing code] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: set up the fwd_skb->dev for mesh forwardingXing Song
Mesh forwarding requires that the fwd_skb->dev is set up for TX handling, otherwise the following warning will be generated, so set it up for the pending frames. [ 72.835674 ] WARNING: CPU: 0 PID: 1193 at __skb_flow_dissect+0x284/0x1298 [ 72.842379 ] Modules linked in: ksmbd pppoe ppp_async l2tp_ppp ... [ 72.962020 ] CPU: 0 PID: 1193 Comm: kworker/u5:1 Tainted: P S 5.4.137 #0 [ 72.969938 ] Hardware name: MT7622_MT7531 RFB (DT) [ 72.974659 ] Workqueue: napi_workq napi_workfn [ 72.979025 ] pstate: 60000005 (nZCv daif -PAN -UAO) [ 72.983822 ] pc : __skb_flow_dissect+0x284/0x1298 [ 72.988444 ] lr : __skb_flow_dissect+0x54/0x1298 [ 72.992977 ] sp : ffffffc010c738c0 [ 72.996293 ] x29: ffffffc010c738c0 x28: 0000000000000000 [ 73.001615 ] x27: 000000000000ffc2 x26: ffffff800c2eb818 [ 73.006937 ] x25: ffffffc010a987c8 x24: 00000000000000ce [ 73.012259 ] x23: ffffffc010c73a28 x22: ffffffc010a99c60 [ 73.017581 ] x21: 000000000000ffc2 x20: ffffff80094da800 [ 73.022903 ] x19: 0000000000000000 x18: 0000000000000014 [ 73.028226 ] x17: 00000000084d16af x16: 00000000d1fc0bab [ 73.033548 ] x15: 00000000715f6034 x14: 000000009dbdd301 [ 73.038870 ] x13: 00000000ea4dcbc3 x12: 0000000000000040 [ 73.044192 ] x11: 000000000eb00ff0 x10: 0000000000000000 [ 73.049513 ] x9 : 000000000eb00073 x8 : 0000000000000088 [ 73.054834 ] x7 : 0000000000000000 x6 : 0000000000000001 [ 73.060155 ] x5 : 0000000000000000 x4 : 0000000000000000 [ 73.065476 ] x3 : ffffffc010a98000 x2 : 0000000000000000 [ 73.070797 ] x1 : 0000000000000000 x0 : 0000000000000000 [ 73.076120 ] Call trace: [ 73.078572 ] __skb_flow_dissect+0x284/0x1298 [ 73.082846 ] __skb_get_hash+0x7c/0x228 [ 73.086629 ] ieee80211_txq_may_transmit+0x7fc/0x17b8 [mac80211] [ 73.092564 ] ieee80211_tx_prepare_skb+0x20c/0x268 [mac80211] [ 73.098238 ] ieee80211_tx_pending+0x144/0x330 [mac80211] [ 73.103560 ] tasklet_action_common.isra.16+0xb4/0x158 [ 73.108618 ] tasklet_action+0x2c/0x38 [ 73.112286 ] __do_softirq+0x168/0x3b0 [ 73.115954 ] do_softirq.part.15+0x88/0x98 [ 73.119969 ] __local_bh_enable_ip+0xb0/0xb8 [ 73.124156 ] napi_workfn+0x58/0x90 [ 73.127565 ] process_one_work+0x20c/0x478 [ 73.131579 ] worker_thread+0x50/0x4f0 [ 73.135249 ] kthread+0x124/0x128 [ 73.138484 ] ret_from_fork+0x10/0x1c Signed-off-by: Xing Song <xing.song@mediatek.com> Tested-By: Frank Wunderlich <frank-w@public-files.de> Link: https://lore.kernel.org/r/20211123033123.2684-1-xing.song@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: fix regression in SSN handling of addba txFelix Fietkau
Some drivers that do their own sequence number allocation (e.g. ath9k) rely on being able to modify params->ssn on starting tx ampdu sessions. This was broken by a change that modified it to use sta->tid_seq[tid] instead. Cc: stable@vger.kernel.org Fixes: 31d8bb4e07f8 ("mac80211: agg-tx: refactor sending addba") Reported-by: Eneas U de Queiroz <cotequeiroz@gmail.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20211124094024.43222-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: fix rate control for retransmitted framesFelix Fietkau
Since retransmission clears info->control, rate control needs to be called again, otherwise the driver might crash due to invalid rates. Cc: stable@vger.kernel.org # 5.14+ Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reported-by: Robert W <rwbugreport@lost-in-the-void.net> Fixes: 03c3911d2d67 ("mac80211: call ieee80211_tx_h_rate_ctrl() when dequeue") Signed-off-by: Felix Fietkau <nbd@nbd.name> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Link: https://lore.kernel.org/r/20211122204323.9787-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: track only QoS data frames for admission controlJohannes Berg
For admission control, obviously all of that only works for QoS data frames, otherwise we cannot even access the QoS field in the header. Syzbot reported (see below) an uninitialized value here due to a status of a non-QoS nullfunc packet, which isn't even long enough to contain the QoS header. Fix this to only do anything for QoS data packets. Reported-by: syzbot+614e82b88a1a4973e534@syzkaller.appspotmail.com Fixes: 02219b3abca5 ("mac80211: add WMM admission control support") Link: https://lore.kernel.org/r/20211122124737.dad29e65902a.Ieb04587afacb27c14e0de93ec1bfbefb238cc2a0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-26mac80211: fix TCP performance on mesh interfaceMaxime Bizon
sta is NULL for mesh point (resolved later), so sk pacing parameters were not applied. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Link: https://lore.kernel.org/r/66f51659416ac35d6b11a313bd3ffe8b8a43dd55.camel@freebox.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-11-25sctp: make the raise timer more simple and accurateXin Long
Currently, the probe timer is reused as the raise timer when PLPMTUD is in the Search Complete state. raise_count was introduced to count how many times the probe timer has timed out. When raise_count reaches to 30, the raise timer handler will be triggered. During the whole processing above, the timer keeps timing out every probe_ interval. It is a waste for the Search Complete state, as the raise timer only needs to time out after 30 * probe_interval. Since the raise timer and probe timer are never used at the same time, it is no need to keep probe timer 'alive' in the Search Complete state. This patch to introduce sctp_transport_reset_raise_timer() to start the timer as the raise timer when entering the Search Complete state. When entering the other states, sctp_transport_reset_probe_timer() will still be called to reset the timer to the probe timer. raise_count can be removed from sctp_transport as no need to count probe timer timeout for raise timer timeout. last_rtx_chunks can be removed as sctp_transport_reset_probe_timer() can be called in the place where asoc rtx_data_chunks is changed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/edb0e48988ea85997488478b705b11ddc1ba724a.1637781974.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25tipc: delete the unlikely branch in tipc_aead_encryptXin Long
When a skb comes to tipc_aead_encrypt(), it's always linear. The unlikely check 'skb_cloned(skb) && tailen <= skb_tailroom(skb)' can completely be taken care of in skb_cow_data() by the code in branch "if (!skb_has_frag_list())". Also, remove the 'TODO:' annotation, as the pages in skbs are not writable, see more on commit 3cf4375a0904 ("tipc: do not write skb_shinfo frags when doing decrytion"). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/47a478da0b6095b76e3cbe7a75cbd25d9da1df9a.1637773872.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25tls: fix replacing proto_opsJakub Kicinski
We replace proto_ops whenever TLS is configured for RX. But our replacement also overrides sendpage_locked, which will crash unless TX is also configured. Similarly we plug both of those in for TLS_HW (NIC crypto offload) even tho TLS_HW has a completely different implementation for TX. Last but not least we always plug in something based on inet_stream_ops even though a few of the callbacks differ for IPv6 (getname, release, bind). Use a callback building method similar to what we do for struct proto. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Fixes: d4ffb02dee2f ("net/tls: enable sk_msg redirect to tls socket egress") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25tls: splice_read: fix accessing pre-processed recordsJakub Kicinski
recvmsg() will put peek()ed and partially read records onto the rx_list. splice_read() needs to consult that list otherwise it may miss data. Align with recvmsg() and also put partially-read records onto rx_list. tls_sw_advance_skb() is pretty pointless now and will be removed in net-next. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25tls: splice_read: fix record type checkJakub Kicinski
We don't support splicing control records. TLS 1.3 changes moved the record type check into the decrypt if(). The skb may already be decrypted and still be an alert. Note that decrypt_skb_update() is idempotent and updates ctx->decrypted so the if() is pointless. Reorder the check for decryption errors with the content type check while touching them. This part is not really a bug, because if decryption failed in TLS 1.3 content type will be DATA, and for TLS 1.2 it will be correct. Nevertheless its strange to touch output before checking if the function has failed. Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25Bluetooth: Limit duration of Remote Name ResolveArchie Pusaka
When doing remote name request, we cannot scan. In the normal case it's OK since we can expect it to finish within a short amount of time. However, there is a possibility to scan lots of devices that (1) requires Remote Name Resolve (2) is unresponsive to Remote Name Resolve When this happens, we are stuck to do Remote Name Resolve until all is done before continue scanning. This patch adds a time limit to stop us spending too long on remote name request. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-25Bluetooth: Send device found event on name resolve failureArchie Pusaka
Introducing NAME_REQUEST_FAILED flag that will be sent together with device found event on name resolve failure. This will provide the userspace with an information so it can decide not to resolve the name for these devices in the future. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-25Bluetooth: HCI: Fix definition of hci_rp_delete_stored_link_keyLuiz Augusto von Dentz
num_keys is actually 2 octects not 1: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 1989: Num_Keys_Deleted: Size: 2 octets 0xXXXX Number of Link Keys Deleted Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-25Bluetooth: HCI: Fix definition of hci_rp_read_stored_link_keyLuiz Augusto von Dentz
Both max_num_keys and num_key are 2 octects: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 1985: Max_Num_Keys: Size: 2 octets Range: 0x0000 to 0xFFFF Num_Keys_Read: Size: 2 octets Range: 0x0000 to 0xFFFF Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-24net/smc: Fix loop in smc_listenGuo DaXing
The kernel_listen function in smc_listen will fail when all the available ports are occupied. At this point smc->clcsock->sk->sk_data_ready has been changed to smc_clcsock_data_ready. When we call smc_listen again, now both smc->clcsock->sk->sk_data_ready and smc->clcsk_data_ready point to the smc_clcsock_data_ready function. The smc_clcsock_data_ready() function calls lsmc->clcsk_data_ready which now points to itself resulting in an infinite loop. This patch restores smc->clcsock->sk->sk_data_ready with the old value. Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers") Signed-off-by: Guo DaXing <guodaxing@huawei.com> Acked-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()Karsten Graul
Coverity reports a possible NULL dereferencing problem: in smc_vlan_by_tcpsk(): 6. returned_null: netdev_lower_get_next returns NULL (checked 29 out of 30 times). 7. var_assigned: Assigning: ndev = NULL return value from netdev_lower_get_next. 1623 ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower); CID 1468509 (#1 of 1): Dereference null return value (NULL_RETURNS) 8. dereference: Dereferencing a pointer that might be NULL ndev when calling is_vlan_dev. 1624 if (is_vlan_dev(ndev)) { Remove the manual implementation and use netdev_walk_all_lower_dev() to iterate over the lower devices. While on it remove an obsolete function parameter comment. Fixes: cb9d43f67754 ("net/smc: determine vlan_id of stacked net_device") Suggested-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24net-ipv6: changes to ->tclass (via IPV6_TCLASS) should sk_dst_reset()Maciej Żenczykowski
This is to match ipv4 behaviour, see __ip_sock_set_tos() implementation. Technically for ipv6 this might not be required because normally we do not allow tclass to influence routing, yet the cli tooling does support it: lpk11:~# ip -6 rule add pref 5 tos 45 lookup 5 lpk11:~# ip -6 rule 5: from all tos 0x45 lookup 5 and in general dscp/tclass based routing does make sense. We already have cases where dscp can affect vlan priority and/or transmit queue (especially on wifi). So let's just make things match. Easier to reason about and no harm. Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20211123223208.1117871-1-zenczykowski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24net-ipv6: do not allow IPV6_TCLASS to muck with tcp's ECNMaciej Żenczykowski
This is to match ipv4 behaviour, see __ip_sock_set_tos() implementation at ipv4/ip_sockglue.c:579 void __ip_sock_set_tos(struct sock *sk, int val) { if (sk->sk_type == SOCK_STREAM) { val &= ~INET_ECN_MASK; val |= inet_sk(sk)->tos & INET_ECN_MASK; } if (inet_sk(sk)->tos != val) { inet_sk(sk)->tos = val; sk->sk_priority = rt_tos2priority(val); sk_dst_reset(sk); } } Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211123223154.1117794-1-zenczykowski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24net: allow SO_MARK with CAP_NET_RAWMaciej Żenczykowski
A CAP_NET_RAW capable process can already spoof (on transmit) anything it desires via raw packet sockets... There is no good reason to not allow it to also be able to play routing tricks on packets from its own normal sockets. There is a desire to be able to use SO_MARK for routing table selection (via ip rule fwmark) from within a user process without having to run it as root. Granting it CAP_NET_RAW is much less dangerous than CAP_NET_ADMIN (CAP_NET_RAW doesn't permit persistent state change, while CAP_NET_ADMIN does - by for example allowing the reconfiguration of the routing tables and/or bringing up/down devices). Let's keep CAP_NET_ADMIN for persistent state changes, while using CAP_NET_RAW for non-configuration related stuff. Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20211123203715.193413-1-zenczykowski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24net: allow CAP_NET_RAW to setsockopt SO_PRIORITYMaciej Żenczykowski
CAP_NET_ADMIN is and should continue to be about configuring the system as a whole, not about configuring per-socket or per-packet parameters. Sending and receiving raw packets is what CAP_NET_RAW is all about. It can already send packets with any VLAN tag, and any IPv4 TOS mark, and any IPv6 TCLASS mark, simply by virtue of building such a raw packet. Not to mention using any protocol and source/ /destination ip address/port tuple. These are the fields that networking gear uses to prioritize packets. Hence, a CAP_NET_RAW process is already capable of affecting traffic prioritization after it hits the wire. This change makes it capable of affecting traffic prioritization even in the host at the nic and before that in the queueing disciplines (provided skb->priority is actually being used for prioritization, and not the TOS/TCLASS field) Hence it makes sense to allow a CAP_NET_RAW process to set the priority of sockets and thus packets it sends. Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20211123203702.193221-1-zenczykowski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flowsEric Dumazet
While testing BIG TCP patch series, I was expecting that TCP_RR workloads with 80KB requests/answers would send one 80KB TSO packet, then being received as a single GRO packet. It turns out this was not happening, and the root cause was that cubic Hystart ACK train was triggering after a few (2 or 3) rounds of RPC. Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC needed a budget of ~20 segments. Ideally these TCP_RR flows should not exit slow start. Cubic Hystart should reset itself at each round, instead of assuming every TCP flow is a bulk one. Note that even after this patch, Hystart can still trigger, depending on scheduling artifacts, but at a higher CWND/SSTHRESH threshold, keeping optimal TSO packet sizes. Tested: ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072 nstat -n; netperf -H ... -t TCP_RR -l 5 -- -r 80000,80000 -K cubic; nstat|egrep "Ip6InReceives|Hystart|Ip6OutRequests" Before: 8605 Ip6InReceives 87541 0.0 Ip6OutRequests 129496 0.0 TcpExtTCPHystartTrainDetect 1 0.0 TcpExtTCPHystartTrainCwnd 30 0.0 After: 8760 Ip6InReceives 88514 0.0 Ip6OutRequests 87975 0.0 Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3") Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20211123202535.1843771-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24net: bridge: Allow base 16 inputs in sysfsIdo Schimmel
Cited commit converted simple_strtoul() to kstrtoul() as suggested by the former's documentation. However, it also forced all the inputs to be decimal resulting in user space breakage. Fix by setting the base to '0' so that the base is automatically detected. Before: # ip link add name br0 type bridge vlan_filtering 1 # echo "0x88a8" > /sys/class/net/br0/bridge/vlan_protocol bash: echo: write error: Invalid argument After: # ip link add name br0 type bridge vlan_filtering 1 # echo "0x88a8" > /sys/class/net/br0/bridge/vlan_protocol # echo $? 0 Fixes: 520fbdf7fb19 ("net/bridge: replace simple_strtoul to kstrtol") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20211124101122.3321496-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlersEric Dumazet
All gro_complete() handlers are called from napi_gro_complete() while rcu_read_lock() has been called. There is no point stacking more rcu_read_lock() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24gro: remove rcu_read_lock/rcu_read_unlock from gro_receive handlersEric Dumazet
All gro_receive() handlers are called from dev_gro_receive() while rcu_read_lock() has been called. There is no point stacking more rcu_read_lock() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24Bluetooth: refactor malicious adv data checkBrian Gix
Check for out-of-bound read was being performed at the end of while num_reports loop, and would fill journal with false positives. Added check to beginning of loop processing so that it doesn't get checked after ptr has been advanced. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-11-24net/ncsi : Add payload to be 32-bit aligned to fix dropped packetsKumar Thangavel
Update NC-SI command handler (both standard and OEM) to take into account of payload paddings in allocating skb (in case of payload size is not 32-bit aligned). The checksum field follows payload field, without taking payload padding into account can cause checksum being truncated, leading to dropped packets. Fixes: fb4ee67529ff ("net/ncsi: Add NCSI OEM command support") Signed-off-by: Kumar Thangavel <thangavel.k@hcl.com> Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23dccp: Inline dccp_listen_start().Kuniyuki Iwashima
This patch inlines dccp_listen_start() and removes a stale comment in inet_dccp_listen() so that it looks like inet_listen(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reviewed-by: Richard Sailer <richard_siegfried@systemli.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-23dccp/tcp: Remove an unused argument in inet_csk_listen_start().Kuniyuki Iwashima
The commit 1295e2cf3065 ("inet: minor optimization for backlog setting in listen(2)") added change so that sk_max_ack_backlog is initialised earlier in inet_dccp_listen() and inet_listen(). Since then, we no longer use backlog in inet_csk_listen_start(), so let's remove it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Richard Sailer <richard_siegfried@systemli.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-23net: remove .ndo_change_proto_downJakub Kicinski
.ndo_change_proto_down was added seemingly to enable out-of-tree implementations. Over 2.5yrs later we still have no real users upstream. Hardwire the generic implementation for now, we can revert once real users materialize. (rocker is a test vehicle, not a user.) We need to drop the optimization on the sysfs side, because unlike ndos priv_flags will be changed at runtime, so we'd need READ_ONCE/WRITE_ONCE everywhere.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23net/smc: Ensure the active closing peer first closes clcsockTony Lu
The side that actively closed socket, it's clcsock doesn't enter TIME_WAIT state, but the passive side does it. It should show the same behavior as TCP sockets. Consider this, when client actively closes the socket, the clcsock in server enters TIME_WAIT state, which means the address is occupied and won't be reused before TIME_WAIT dismissing. If we restarted server, the service would be unavailable for a long time. To solve this issue, shutdown the clcsock in [A], perform the TCP active close progress first, before the passive closed side closing it. So that the actively closed side enters TIME_WAIT, not the passive one. Client | Server close() // client actively close | smc_release() | smc_close_active() // PEERCLOSEWAIT1 | smc_close_final() // abort or closed = 1| smc_cdc_get_slot_and_msg_send() | [A] | |smc_cdc_msg_recv_action() // ACTIVE | queue_work(smc_close_wq, &conn->close_work) | smc_close_passive_work() // PROCESSABORT or APPCLOSEWAIT1 | smc_close_passive_abort_received() // only in abort | |close() // server recv zero, close | smc_release() // PROCESSABORT or APPCLOSEWAIT1 | smc_close_active() | smc_close_abort() or smc_close_final() // CLOSED | smc_cdc_get_slot_and_msg_send() // abort or closed = 1 smc_cdc_msg_recv_action() | smc_clcsock_release() queue_work(smc_close_wq, &conn->close_work) | sock_release(tcp) // actively close clc, enter TIME_WAIT smc_close_passive_work() // PEERCLOSEWAIT1 | smc_conn_free() smc_close_passive_abort_received() // CLOSED| smc_conn_free() | smc_clcsock_release() | sock_release(tcp) // passive close clc | Link: https://www.spinics.net/lists/netdev/msg780407.html Fixes: b38d732477e4 ("smc: socket closing and linkgroup cleanup") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23net/smc: Clean up local struct sock variablesTony Lu
There remains some variables to replace with local struct sock. So clean them up all. Fixes: 3163c5071f25 ("net/smc: use local struct sock variables consistently") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-23net: nexthop: fix null pointer dereference when IPv6 is not enabledNikolay Aleksandrov
When we try to add an IPv6 nexthop and IPv6 is not enabled (!CONFIG_IPV6) we'll hit a NULL pointer dereference[1] in the error path of nh_create_ipv6() due to calling ipv6_stub->fib6_nh_release. The bug has been present since the beginning of IPv6 nexthop gateway support. Commit 1aefd3de7bc6 ("ipv6: Add fib6_nh_init and release to stubs") tells us that only fib6_nh_init has a dummy stub because fib6_nh_release should not be called if fib6_nh_init returns an error, but the commit below added a call to ipv6_stub->fib6_nh_release in its error path. To fix it return the dummy stub's -EAFNOSUPPORT error directly without calling ipv6_stub->fib6_nh_release in nh_create_ipv6()'s error path. [1] Output is a bit truncated, but it clearly shows the error. BUG: kernel NULL pointer dereference, address: 000000000000000000 #PF: supervisor instruction fetch in kernel modede #PF: error_code(0x0010) - not-present pagege PGD 0 P4D 0 Oops: 0010 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 638 Comm: ip Kdump: loaded Not tainted 5.16.0-rc1+ #446 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffff888109f5b8f0 EFLAGS: 00010286^Ac RAX: 0000000000000000 RBX: ffff888109f5ba28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881008a2860 RBP: ffff888109f5b9d8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff888109f5b978 R11: ffff888109f5b948 R12: 00000000ffffff9f R13: ffff8881008a2a80 R14: ffff8881008a2860 R15: ffff8881008a2840 FS: 00007f98de70f100(0000) GS:ffff88822bf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000100efc000 CR4: 00000000000006e0 Call Trace: <TASK> nh_create_ipv6+0xed/0x10c rtm_new_nexthop+0x6d7/0x13f3 ? check_preemption_disabled+0x3d/0xf2 ? lock_is_held_type+0xbe/0xfd rtnetlink_rcv_msg+0x23f/0x26a ? check_preemption_disabled+0x3d/0xf2 ? rtnl_calcit.isra.0+0x147/0x147 netlink_rcv_skb+0x61/0xb2 netlink_unicast+0x100/0x187 netlink_sendmsg+0x37f/0x3a0 ? netlink_unicast+0x187/0x187 sock_sendmsg_nosec+0x67/0x9b ____sys_sendmsg+0x19d/0x1f9 ? copy_msghdr_from_user+0x4c/0x5e ? rcu_read_lock_any_held+0x2a/0x78 ___sys_sendmsg+0x6c/0x8c ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? lockdep_hardirqs_on+0xd9/0x102 ? sockfd_lookup_light+0x69/0x99 __sys_sendmsg+0x50/0x6e do_syscall_64+0xcb/0xf2 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f98dea28914 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 RSP: 002b:00007fff859f5e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e2e RAX: ffffffffffffffda RBX: 00000000619cb810 RCX: 00007f98dea28914 RDX: 0000000000000000 RSI: 00007fff859f5ed0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008 R10: fffffffffffffce6 R11: 0000000000000246 R12: 0000000000000001 R13: 000055c0097ae520 R14: 000055c0097957fd R15: 00007fff859f63a0 </TASK> Modules linked in: bridge stp llc bonding virtio_net Cc: stable@vger.kernel.org Fixes: 53010f991a9f ("nexthop: Add support for IPv6 gateways") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22SUNRPC: use different lock keys for INET6 and LOCALNeilBrown
xprtsock.c reclassifies sock locks based on the protocol. However there are 3 protocols and only 2 classification keys. The same key is used for both INET6 and LOCAL. This causes lockdep complaints. The complaints started since Commit ea9afca88bbe ("SUNRPC: Replace use of socket sk_callback_lock with sock_lock") which resulted in the sock locks beings used more. So add another key, and renumber them slightly. Fixes: ea9afca88bbe ("SUNRPC: Replace use of socket sk_callback_lock with sock_lock") Fixes: 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL transports") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-11-22devlink: Add 'enable_iwarp' generic device paramShiraz Saleem
Add a new device generic parameter to enable and disable iWARP functionality on a multi-protocol RDMA device. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-22net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop groupNikolay Aleksandrov
When replacing a nexthop group, we must release the IPv6 per-cpu dsts of the removed nexthop entries after an RCU grace period because they contain references to the nexthop's net device and to the fib6 info. With specific series of events[1] we can reach net device refcount imbalance which is unrecoverable. IPv4 is not affected because dsts don't take a refcount on the route. [1] $ ip nexthop list id 200 via 2002:db8::2 dev bridge.10 scope link onlink id 201 via 2002:db8::3 dev bridge scope link onlink id 203 group 201/200 $ ip -6 route 2001:db8::10 nhid 203 metric 1024 pref medium nexthop via 2002:db8::3 dev bridge weight 1 onlink nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink Create rt6_info through one of the multipath legs, e.g.: $ taskset -a -c 1 ./pkt_inj 24 bridge.10 2001:db8::10 (pkt_inj is just a custom packet generator, nothing special) Then remove that leg from the group by replace (let's assume it is id 200 in this case): $ ip nexthop replace id 203 group 201 Now remove the IPv6 route: $ ip -6 route del 2001:db8::10/128 The route won't be really deleted due to the stale rt6_info holding 1 refcnt in nexthop id 200. At this point we have the following reference count dependency: (deleted) IPv6 route holds 1 reference over nhid 203 nh 203 holds 1 ref over id 201 nh 200 holds 1 ref over the net device and the route due to the stale rt6_info Now to create circular dependency between nh 200 and the IPv6 route, and also to get a reference over nh 200, restore nhid 200 in the group: $ ip nexthop replace id 203 group 201/200 And now we have a permanent circular dependncy because nhid 203 holds a reference over nh 200 and 201, but the route holds a ref over nh 203 and is deleted. To trigger the bug just delete the group (nhid 203): $ ip nexthop del id 203 It won't really be deleted due to the IPv6 route dependency, and now we have 2 unlinked and deleted objects that reference each other: the group and the IPv6 route. Since the group drops the reference it holds over its entries at free time (i.e. its own refcount needs to drop to 0) that will never happen and we get a permanent ref on them, since one of the entries holds a reference over the IPv6 route it will also never be released. At this point the dependencies are: (deleted, only unlinked) IPv6 route holds reference over group nh 203 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200 nh 200 holds 1 ref over the net device and the route due to the stale rt6_info This is the last point where it can be fixed by running traffic through nh 200, and specifically through the same CPU so the rt6_info (dst) will get released due to the IPv6 genid, that in turn will free the IPv6 route, which in turn will free the ref count over the group nh 203. If nh 200 is deleted at this point, it will never be released due to the ref from the unlinked group 203, it will only be unlinked: $ ip nexthop del id 200 $ ip nexthop $ Now we can never release that stale rt6_info, we have IPv6 route with ref over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with rt6_info (dst) with ref over the net device and the IPv6 route. All of these objects are only unlinked, and cannot be released, thus they can't release their ref counts. Message from syslogd@dev at Nov 19 14:04:10 ... kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3 Message from syslogd@dev at Nov 19 14:04:20 ... kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3 Fixes: 7bf4796dd099 ("nexthops: add support for replace") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: ipv6: add fib6_nh_release_dsts stubNikolay Aleksandrov
We need a way to release a fib6_nh's per-cpu dsts when replacing nexthops otherwise we can end up with stale per-cpu dsts which hold net device references, so add a new IPv6 stub called fib6_nh_release_dsts. It must be used after an RCU grace period, so no new dsts can be created through a group's nexthop entry. Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed so it doesn't need a dummy stub when IPv6 is not enabled. Fixes: 7bf4796dd099 ("nexthops: add support for replace") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22skbuff: Switch structure bounds to struct_group()Kees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Replace the existing empty member position markers "headers_start" and "headers_end" with a struct_group(). This will allow memcpy() and sizeof() to more easily reason about sizes, and improve readability. "pahole" shows no size nor member offset changes to struct sk_buff. "objdump -d" shows no object code changes (outside of WARNs affected by source line number changes). Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> # drivers/net/wireguard/* Link: https://lore.kernel.org/lkml/20210728035006.GD35706@embeddedor Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22skbuff: Move conditional preprocessor directives out of struct sk_buffKees Cook
In preparation for using the struct_group() macro in struct sk_buff, move the conditional preprocessor directives out of the region of struct sk_buff that will be enclosed by struct_group(). While GCC and Clang are happy with conditional preprocessor directives here, sparse is not, even under -Wno-directive-within-macro[1], as would be seen under a C=1 build: net/core/filter.c: note: in included file (through include/linux/netlink.h, include/linux/sock_diag.h): ./include/linux/skbuff.h:820:1: warning: directive in macro's argument list ./include/linux/skbuff.h:822:1: warning: directive in macro's argument list ./include/linux/skbuff.h:846:1: warning: directive in macro's argument list ./include/linux/skbuff.h:848:1: warning: directive in macro's argument list Additionally remove empty macro argument definitions and usage. "objdump -d" shows no object code differences. [1] https://www.spinics.net/lists/linux-sparse/msg10857.html Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>