Age | Commit message | Author
2021-05-14 | net: dsa: qca8k: add priority tweak to qca8337 switch | Ansuel Smith
Port 5 of the qca8337 has problems under flood conditions. The original legacy driver had specific buffer and priority settings for the different ports, suggested by the QCA switch team. Add these missing settings to improve switch stability under load. The packet priority tweak is only needed for the qca8337 switch; other qca8k switches are not affected. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | devicetree: net: dsa: qca8k: Document new compatible qca8327 | Ansuel Smith
Add support for qca8327 in the compatible list. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: add support for qca8327 switch | Ansuel Smith
The qca8327 switch is a lower-tier version of the more recent qca8337. It shares the same registers used by the qca8k driver and can be supported with minimal changes. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: handle error from qca8k_busy_wait | Ansuel Smith
Propagate errors from qca8k_busy_wait instead of hardcoding return value. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: handle error with qca8k_rmw operation | Ansuel Smith
qca8k_rmw can fail. Rework any user to handle error values and correctly return. Change qca8k_rmw to return the error code or 0 instead of the reg value. The reg returned by qca8k_rmw wasn't used anywhere, so this doesn't cause any functional change. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
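The convention this commit describes (a read-modify-write helper that returns 0 or a negative error code rather than the register value) can be sketched in user-space C. This is only an illustration under stated assumptions: `fake_regs`, `fake_reg_read`, `fake_reg_write` and `fake_reg_rmw` are hypothetical names, not the qca8k driver's functions, and the register file is a plain array.

```c
#include <errno.h>
#include <stdint.h>

#define FAKE_NUM_REGS 4

/* Hypothetical register file standing in for the switch registers. */
static uint32_t fake_regs[FAKE_NUM_REGS];

/* Read a register; returns 0 on success or a negative errno value. */
static int fake_reg_read(unsigned int reg, uint32_t *val)
{
	if (reg >= FAKE_NUM_REGS)
		return -EINVAL;
	*val = fake_regs[reg];
	return 0;
}

/* Write a register; returns 0 on success or a negative errno value. */
static int fake_reg_write(unsigned int reg, uint32_t val)
{
	if (reg >= FAKE_NUM_REGS)
		return -EINVAL;
	fake_regs[reg] = val;
	return 0;
}

/*
 * Read-modify-write: clear the bits in mask, then set the bits in val.
 * The result is an error code only; the register value is not returned.
 */
static int fake_reg_rmw(unsigned int reg, uint32_t mask, uint32_t val)
{
	uint32_t tmp;
	int ret;

	ret = fake_reg_read(reg, &tmp);
	if (ret < 0)
		return ret;

	tmp &= ~mask;
	tmp |= val;

	return fake_reg_write(reg, tmp);
}
```

Callers then only need to test the return value for a negative number, which keeps the error path uniform across read, write and rmw.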
2021-05-14 | net: dsa: qca8k: handle error with qca8k_write operation | Ansuel Smith
qca8k_write can fail. Rework any user to handle error values and correctly return. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: handle error with qca8k_read operation | Ansuel Smith
qca8k_read can fail. Rework any user to handle error values and correctly return. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: handle qca8k_set_page errors | Ansuel Smith
Although unlikely, the set_page function can fail. Since it is a critical part of reading and writing qca8k registers, propagate the error and terminate any read/write operation. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: improve qca8k read/write/rmw bus access | Ansuel Smith
Put the bus in a local variable for faster access to the mdio bus. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: dsa: qca8k: use iopoll macro for qca8k_busy_wait | Ansuel Smith
Use iopoll macro instead of while loop. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
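The general shape that a polling helper replaces, a bounded loop that returns success once a condition holds and a timeout error otherwise, can be sketched in user-space C. This is not the kernel's iopoll implementation: `fake_dev`, `fake_read_status` and `fake_busy_wait` are hypothetical, and the bound is a poll count rather than a time budget to keep the example deterministic.

```c
#include <errno.h>
#include <stdint.h>

#define FAKE_BUSY_BIT 0x1u

/* Hypothetical device that reports busy for its first few status reads. */
struct fake_dev {
	unsigned int reads_until_idle;
};

static uint32_t fake_read_status(struct fake_dev *dev)
{
	if (dev->reads_until_idle > 0) {
		dev->reads_until_idle--;
		return FAKE_BUSY_BIT;
	}
	return 0;
}

/*
 * Poll the status until the busy bit clears or max_polls reads were made.
 * Returns 0 once idle, or -ETIMEDOUT if the device stayed busy.
 */
static int fake_busy_wait(struct fake_dev *dev, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (!(fake_read_status(dev) & FAKE_BUSY_BIT))
			return 0;
	}

	return -ETIMEDOUT;
}
```

Returning a distinct error on timeout is what lets callers propagate the failure instead of hardcoding a return value.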
2021-05-14 | net: dsa: qca8k: change simple print to dev variant | Ansuel Smith
Change pr_err and pr_warn to dev variant. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | tcp: add tracepoint for checksum errors | Jakub Kicinski
Add a tracepoint for capturing TCP segments with a bad checksum. This makes it easy to identify sources of bad frames in the fleet (e.g. machines with faulty NICs). It should also help tools like IOvisor's tcpdrop.py which are used today to get detailed information about such packets. We don't have a socket in many cases so we must open code the address extraction based just on the skb. v2: add missing export for ipv6=m Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | Merge branch 'use-xdp-helpers' | David S. Miller
Matteo Croce says:
====================
net: use XDP helpers

The commit 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine") and commit be9df4aff65f ("net, xdp: Introduce xdp_prepare_buff utility routine") introduce two useful helpers to populate xdp_buff. Use them in drivers which still open-code those routines.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | vhost_net: use XDP helpers | Matteo Croce
Make use of the xdp_{init,prepare}_buff() helpers instead of an open-coded version. Also, the field xdp->rxq was never set, so pass NULL to xdp_init_buff() to clear it. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
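To illustrate what the two helpers replace, here is a simplified user-space model of the fields they fill in. It is a sketch only: `struct xdp_buff_model`, `model_init_buff` and `model_prepare_buff` are stand-ins written for this example, the `rxq` field is reduced to a `void *`, and the field assignments reflect my reading of the kernel helpers rather than a copy of them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct xdp_buff. */
struct xdp_buff_model {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_meta;
	unsigned char *data_hard_start;
	void *rxq;          /* the real field is a struct xdp_rxq_info pointer */
	uint32_t frame_sz;
};

/* Set the per-buffer constants: frame size and receive queue info. */
static void model_init_buff(struct xdp_buff_model *xdp, uint32_t frame_sz,
			    void *rxq)
{
	xdp->frame_sz = frame_sz;
	xdp->rxq = rxq;     /* passing NULL clears a previously unset field */
}

/* Derive all data pointers from the buffer start, headroom and length. */
static void model_prepare_buff(struct xdp_buff_model *xdp,
			       unsigned char *hard_start, int headroom,
			       int data_len, bool meta_valid)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	/* data_meta == data means empty metadata; data + 1 marks it unusable. */
	xdp->data_meta = meta_valid ? data : data + 1;
}
```

Centralizing the pointer arithmetic in one place is the point of the conversion: a driver that open-codes it can forget a field, as happened with `rxq` in vhost_net.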
2021-05-14 | igc: use XDP helpers | Matteo Croce
Make use of the xdp_{init,prepare}_buff() helpers instead of an open-coded version. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | stmmac: use XDP helpers | Matteo Croce
Make use of the xdp_{init,prepare}_buff() helpers instead of an open-coded version. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: cdc_eem: fix URL to CDC EEM 1.0 spec | Jonathan Davies
The old URL is no longer accessible. Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | Merge branch 'rk3308-gmac' | David S. Miller
Tobias Schramm says:
====================
Add support for RK3308 gmac

The Rockchip RK3308 SoC features an internal gmac. Only the signals required for RMII are exposed so it is limited to 10/100 Mbit/s operation. This patchset adds support for it. I've tested the patchset on a Rock Pi S, works fine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | arm64: dts: rockchip: add gmac to rk3308 dts | Tobias Schramm
The RK3308 SoC has a gmac with only the RMII interface exposed. This commit adds it to the RK3308 dtsi. Signed-off-by: Tobias Schramm <t.schramm@manjaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: stmmac: dwmac-rk: add support for rk3308 gmac | Tobias Schramm
The Rockchip RK3308 SoC has a gmac with only the RMII interface signals exposed. This patch adds support for it. Signed-off-by: Tobias Schramm <t.schramm@manjaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | dt-bindings: net: rockchip-dwmac: add rk3308 gmac compatible | Tobias Schramm
The Rockchip RK3308 has a gmac that is not fully compatible with any of the other Rockchip gmacs. This patch adds a compatible string for it. Signed-off-by: Tobias Schramm <t.schramm@manjaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | alx: fix missing unlock on error in alx_set_pauseparam() | Pu Lehui
Add the missing unlock before return from function alx_set_pauseparam() in the error handling case. Fixes: 4a5fe57e7751 ("alx: use fine-grained locking instead of RTNL") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
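The class of bug fixed here, an early return on an error path that skips the unlock, is commonly avoided with a single exit label. The sketch below shows that pattern in user-space C; `fake_lock`, `fake_nic`, `fake_apply_pause` and `fake_set_pause` are hypothetical and do not reproduce the alx driver code, and the "lock" is just a depth counter so the balance can be checked.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock that only counts its nesting depth. */
struct fake_lock {
	int depth;
};

static void fake_lock_acquire(struct fake_lock *l) { l->depth++; }
static void fake_lock_release(struct fake_lock *l) { l->depth--; }

struct fake_nic {
	struct fake_lock mtx;
	bool pause_enabled;
};

/* Stand-in for a hardware update that may fail. */
static int fake_apply_pause(bool enable, bool hw_ok)
{
	(void)enable;
	return hw_ok ? 0 : -EIO;
}

static int fake_set_pause(struct fake_nic *nic, bool enable, bool hw_ok)
{
	int err;

	fake_lock_acquire(&nic->mtx);

	err = fake_apply_pause(enable, hw_ok);
	if (err)
		goto out_unlock;        /* do not return with the lock held */

	nic->pause_enabled = enable;

out_unlock:
	fake_lock_release(&nic->mtx);
	return err;
}
```

With one exit label, every path through the function releases the lock exactly once, whether or not the hardware update succeeded.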
2021-05-14 | Merge branch 'hns-coding-style' | David S. Miller
Guangbin Huang says:
====================
net: hns: clean up some code style issues

This patchset cleans up some code style issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns: remove redundant return in void function | Peng Li
Void function return statements are not generally useful, so remove the redundant return. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns: space required before the open brace '{' | Peng Li
Add the space required before the open brace '{'. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns: fix some code style issue about space | Peng Li
Spaces at the start of a line cause a checkpatch warning. This patch replaces the spaces at the start of a line with a tab. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns: fix the comments style issue | Peng Li
Networking block comments don't use an empty /* line, use /* Comment... This patch fixes the comment style issue. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | Merge branch 'hns3-next' | David S. Miller
Huazhong Tan says:
====================
net: hns3: updates for -next

This series adds some updates for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump ncl config of debugfs | Jiaran Zhang
Currently, the debugfs command for ncl config is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "ncl_config" that is queried with "cat ncl_config" and returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat ncl_config
    offset | data
    0x0000 | 0x00000028
    0x0004 | 0x00000400
    0x0008 | 0x08040201
    0x000c | 0x00000000
    0x0010 | 0x00040004
    0x0014 | 0x00040004
    0x0018 | 0x00000000
    0x001c | 0x00000000
    0x0020 | 0x00040004

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
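The "offset | data" layout shown in this commit can be produced by formatting into a bounded buffer, which is the general approach for a file read from userspace. The following is a user-space sketch under that assumption; `format_reg_dump` is a hypothetical helper and is not taken from the hns3 driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format 32-bit words as "offset | data" lines into buf.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int format_reg_dump(char *buf, size_t len, const uint32_t *data,
			   size_t count)
{
	size_t pos;
	size_t i;
	int n;

	n = snprintf(buf, len, "offset | data\n");
	if (n < 0 || (size_t)n >= len)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < count; i++) {
		/* The offset is the byte offset of word i. */
		n = snprintf(buf + pos, len - pos, "0x%04zx | 0x%08x\n",
			     i * sizeof(uint32_t), (unsigned int)data[i]);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}

	return (int)pos;
}
```

Checking every `snprintf` result against the remaining space keeps the output from being silently truncated when the dump grows.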
2021-05-14 | net: hns3: refactor dump m7 info of debugfs | Jiaran Zhang
Currently, the debugfs command for m7 info is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "imp_info" that is queried with "cat imp_info" and returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat imp_info
    offset | data
    0x0000 | 0x00000000 0x00000000
    0x0008 | 0x00000000 0x00000000
    0x0010 | 0x00000000 0x00000001
    0x0018 | 0x00000000 0x00000000
    0x0020 | 0x00000000 0x00000000
    0x0028 | 0x00000000 0x00000000
    0x0030 | 0x00000000 0x00000000

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump reset info of debugfs | Jiaran Zhang
Currently, the debugfs command for reset info is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "reset_info" that is queried with "cat reset_info" and returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat reset_info
    PF reset count: 0
    FLR reset count: 0
    GLOBAL reset count: 0
    IMP reset count: 0
    reset done count: 0
    HW reset done count: 0
    reset count: 0
    reset fail count: 0
    vector0 interrupt enable status: 0x1
    reset interrupt source: 0x0
    reset interrupt status: 0x0
    RAS interrupt status:0x0
    hardware reset status: 0x0
    handshake status: 0x80
    function reset status: 0x0

In hclge_reset_err_handle(), change to hclge_show_rst_info() so that reset info is displayed immediately when the reset fails.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump intr of debugfs | Jiaran Zhang
Currently, the debugfs command for intr is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "interrupt_info" that is queried with "cat interrupt_info" and returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat interrupt_info
    num_nic_msi: 65
    num_roce_msi: 65
    num_msi_used: 2
    num_msi_left: 128

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump loopback of debugfs | Yufeng Mo
Currently, the debugfs command for loopback is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "loopback" that is queried with "cat loopback" and returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat loopback
    mac id: 0
    app loopback: off
    serdes serial loopback: off
    serdes parallel loopback: off

Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump mng tbl of debugfs | Yufeng Mo
Currently, the debugfs command for mng tbl is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "mng_tbl" that is queried with "cat mng_tbl" and returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat mng_tbl
    entry  mac_addr           mask  ether  mask  vlan  mask  i_map ...
    00     00:00:00:00:00:00  0     88cc   0     0000  1     0f    ...

Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump mac list of debugfs | Huazhong Tan
Currently, the debugfs command for mac list info is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create two files "uc" and "mc" under the directory "mac_list", queried with "cat mac_list/uc" and "cat mac_list/mc", which return the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat mac_list/uc
    UC MAC_LIST:
    FUNC_ID  MAC_ADDR           STATE
    pf       00:18:2d:00:00:71  ACTIVE

    $ cat mac_list/mc
    MC MAC_LIST:
    FUNC_ID  MAC_ADDR           STATE
    pf       01:80:c2:00:00:21  ACTIVE

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dump bd info of debugfs | Huazhong Tan
Currently, the debugfs command for bd info is implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, add two debugfs directories "tx_bd_info" and "rx_bd_info", and create a file for each queue under these two directories. The bd info of a specific queue is queried with "cat tx_bd_info/tx_bd_queue*" or "cat rx_bd_info/rx_bd_queue*", which returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat rx_bd_info/rx_bd_queue0
    Queue 0 rx bd info:
    BD_IDX  L234_INFO  PKT_LEN  SIZE...
    0       0x0        60       60...
    1       0x0        1512     1512...

    $ cat tx_bd_info/tx_bd_queue0
    Queue 0 tx bd info:
    BD_IDX  ADDRESS  VLAN_TAG  SIZE...
    0       0x0      0         0...
    1       0x0      0         0...

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor dev capability and dev spec of debugfs | Jiaran Zhang
Currently, the debugfs commands for dev capability and dev spec are implemented by "echo xxxx > cmd", which records the information in dmesg. This is unnecessary and heavy. To improve it, create a single file "dev_info" for both, queried with "cat dev_info", which returns the result to userspace rather than recording it in dmesg. The display style is below:

    $ cat dev_info
    dev capability:
    support FD: yes
    support GRO: yes
    support FEC: yes
    support UDP GSO: no
    support PTP: no
    support INT QL: no
    support HW TX csum: no
    support UDP tunnel csum: no
    support TX push: no
    support imp-controlled PHY: no
    support rxd advanced layout: no

    dev spec:
    MAC entry num: 0
    MNG entry num: 0
    MAX non tso bd num: 8
    RSS ind tbl size: 512
    RSS key size: 40
    RSS size: 1
    Allocated RSS size: 0
    Task queue pairs numbers: 1
    RX buffer length: 2048
    Desc num per TX queue: 1024
    Desc num per RX queue: 1024
    Total number of enabled TCs: 1
    MAX INT QL: 0
    MAX INT GL: 8160
    MAX TM RATE: 100000
    MAX QSET number: 1024

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor the debugfs process | Yufeng Mo
Currently, each debugfs command needs to create a file to get the information. To better support more debugfs commands, the debugfs process is reconstructed, including the process of creating dentries and files, and obtaining information. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: refactor out RX completion checksum | Huazhong Tan
Only when RXD advanced layout is enabled, in some cases (e.g. ip fragments), the checksum of entire packet will be calculated and filled in the least significant 16 bits of the unused addr field. So refactor out the handling of RX completion checksum: adjust the location of the checksum in RX descriptor, and use ptype table to identify whether this kind of checksum is calculated. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: hns3: support RXD advanced layout | Huazhong Tan
Currently, the driver gets the packet type by parsing L3_ID/L4_ID/OL3_ID/OL4_ID from the RX descriptor, which is time-consuming. Some new devices support the RXD advanced layout, which combines the previous OL3_ID/OL4_ID into an 8-bit ptype field, so the driver gets the packet type by looking up only one table, and L3_ID/L4_ID become reserved fields. For compatibility, the firmware reports the RXD advanced layout capability, and the driver identifies and enables it by default. This patch provides the basic function: identify and enable the RXD advanced layout, and refactor out hns3_rx_checksum() by using the ptype table to handle RX checksum if supported. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
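The benefit of a single 8-bit ptype field is that packet classification becomes one array index instead of several field decodes. The sketch below illustrates that idea only: the table contents, the ptype values 1 to 3, and the names `ptype_entry`, `example_ptype_table` and `lookup_ptype` are all invented for this example and are not the hns3 driver's table.

```c
#include <stdbool.h>
#include <stdint.h>

enum l3_type { L3_NONE, L3_IPV4, L3_IPV6 };
enum l4_type { L4_NONE, L4_TCP, L4_UDP };

struct ptype_entry {
	bool known;           /* false for ptype values without an entry */
	enum l3_type l3;
	enum l4_type l4;
};

/* Hypothetical table: an 8-bit ptype indexes 256 slots directly. */
static const struct ptype_entry example_ptype_table[256] = {
	[1] = { true, L3_IPV4, L4_TCP },
	[2] = { true, L3_IPV4, L4_UDP },
	[3] = { true, L3_IPV6, L4_TCP },
};

/* One table lookup replaces decoding several descriptor fields. */
static const struct ptype_entry *lookup_ptype(uint8_t ptype)
{
	return &example_ptype_table[ptype];
}
```

Because the index is a `uint8_t` and the table has 256 slots, the lookup needs no bounds check; unlisted values fall back to a zeroed entry with `known` set to false.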
2021-05-14 | Merge branch 'lockless-qdisc-packet-stuck' | David S. Miller
Yunsheng Lin says:
====================
fix packet stuck problem for lockless qdisc

This patchset fixes the packet stuck problem mentioned in [1].

Patch 1: Add STATE_MISSED flag to fix packet stuck problem.
Patch 2: Fix a tx_action rescheduling problem after STATE_MISSED flag is added in patch 1.
Patch 3: Fix the significantly higher CPU consumption problem when multiple threads are competing on a saturated outgoing device.

V8: Change function name as suggested by Jakub and fix some typos in patch 3, adjust commit log in patch 2, and add Acked-by from Jakub.
V7: Fix netif_tx_wake_queue() data race noted by Jakub.
V6: Some performance optimization in patch 1 suggested by Jakub and drop NET_XMIT_DROP checking in patch 3.
V5: Add patch 3 to fix the problem reported by Michal Kubecek.
V4: Change STATE_NEED_RESCHEDULE to STATE_MISSED and add patch 2.

[1]. https://lkml.org/lkml/2019/10/9/42
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: sched: fix tx action reschedule issue with stopped queue | Yunsheng Lin
The netdev queue might be stopped when the byte queue limit has been reached or the tx hw ring is full. net_tx_action() may still be rescheduled if STATE_MISSED is set, which consumes CPU unnecessarily without dequeuing and transmitting any skb because the netdev queue is stopped, see qdisc_run_end(). This patch fixes it by checking the netdev queue state before calling qdisc_run() and clearing STATE_MISSED if the netdev queue is stopped during qdisc_run(); net_tx_action() is rescheduled again when the netdev queue is restarted, see netif_tx_wake_queue(). As there is a time window between the netif_xmit_frozen_or_stopped() check and the STATE_MISSED clearing, during which STATE_MISSED may be set by net_tx_action() scheduled by netif_tx_wake_queue(), set STATE_MISSED again if the netdev queue is restarted. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Reported-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: sched: fix tx action rescheduling issue during deactivation | Yunsheng Lin
Currently qdisc_run() checks the STATE_DEACTIVATED of lockless qdisc before calling __qdisc_run(), which ultimately clears the STATE_MISSED when all the skbs are dequeued. If STATE_DEACTIVATED is set before clearing STATE_MISSED, there may be rescheduling of net_tx_action() at the end of qdisc_run_end(), see below:

    CPU0(net_tx_action)        CPU1(__dev_xmit_skb)   CPU2(dev_deactivate)
              .                        .                      .
              .                set STATE_MISSED               .
              .               __netif_schedule()              .
              .                        .            set STATE_DEACTIVATED
              .                        .                qdisc_reset()
              .                        .                      .
              .<---------------        .              synchronize_net()
    clear __QDISC_STATE_SCHED  |       .                      .
              .                |       .                      .
              .                |       .            some_qdisc_is_busy()
              .                |       .               return *false*
              .                |       .                      .
      test STATE_DEACTIVATED   |       .                      .
    __qdisc_run() *not* called |       .                      .
              .                |       .                      .
       test STATE_MISSED       |       .                      .
     __netif_schedule()--------|       .                      .
              .                        .                      .
              .                        .                      .

__qdisc_run() is not called by net_tx_action() in CPU0 because CPU2 has set the STATE_DEACTIVATED flag during dev_deactivate(), and STATE_MISSED is only cleared in __qdisc_run(); __netif_schedule() is called at the end of qdisc_run_end(), causing the tx action rescheduling problem.

qdisc_run() called by net_tx_action() runs in the softirq context, which should have the same semantics as the qdisc_run() called by __dev_xmit_skb() protected by rcu_read_lock_bh(). Since there is a synchronize_net() between the STATE_DEACTIVATED flag being set and qdisc_reset()/some_qdisc_is_busy() in dev_deactivate(), we can safely bail out for the deactivated lockless qdisc in net_tx_action(), and qdisc_reset() will reset all skbs not dequeued yet.

So add the rcu_read_lock() explicitly to protect the qdisc_run() and do the STATE_DEACTIVATED checking in net_tx_action() before calling qdisc_run_begin(). Another option is to do the checking in qdisc_run_end(), but it would add unnecessary overhead for the non-tx_action case, because __dev_queue_xmit() will not see a qdisc with STATE_DEACTIVATED after synchronize_net(); the qdisc with STATE_DEACTIVATED can only be seen by net_tx_action() because of __netif_schedule().

The STATE_DEACTIVATED checking in qdisc_run() is there to avoid a race between net_tx_action() and qdisc_reset(), see: commit d518d2ed8640 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc"). As the bailout added above for the deactivated lockless qdisc in net_tx_action() provides better protection against the race without calling qdisc_run() at all, remove the STATE_DEACTIVATED checking in qdisc_run().

After qdisc_reset(), there is no skb in the qdisc to be dequeued, so clear the STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has avoided the endless rescheduling problem, but there may still be an unnecessary rescheduling, so adjust the commit log.
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: sched: fix packet stuck problem for lockless qdisc | Yunsheng Lin
Lockless qdisc has the following concurrency problem:

        cpu0                 cpu1
         .                     .
    q->enqueue                 .
         .                     .
    qdisc_run_begin()          .
         .                     .
    dequeue_skb()              .
         .                     .
    sch_direct_xmit()          .
         .                     .
         .                q->enqueue
         .             qdisc_run_begin()
         .            return and do nothing
         .                     .
    qdisc_run_end()            .

cpu1 enqueues a skb without calling __qdisc_run() because cpu0 has not released the lock yet and spin_trylock() returns false for cpu1 in qdisc_run_begin(), and cpu0 does not see the skb enqueued by cpu1 when calling dequeue_skb() because cpu1 may enqueue the skb after cpu0 calls dequeue_skb() and before cpu0 calls qdisc_run_end().

Lockless qdisc has another concurrency problem when tx_action is involved:

    cpu0(serving tx_action)     cpu1                 cpu2
              .                   .                    .
              .              q->enqueue                .
              .            qdisc_run_begin()           .
              .              dequeue_skb()             .
              .                   .                q->enqueue
              .                   .                    .
              .             sch_direct_xmit()          .
              .                   .             qdisc_run_begin()
              .                   .           return and do nothing
              .                   .                    .
    clear __QDISC_STATE_SCHED     .                    .
    qdisc_run_begin()             .                    .
    return and do nothing         .                    .
              .                   .                    .
              .            qdisc_run_end()             .

This patch fixes the above data race by:
1. If the first spin_trylock() returns false and STATE_MISSED is not set, set STATE_MISSED and retry another spin_trylock() in case the other CPU may not see STATE_MISSED after it releases the lock.
2. Reschedule if STATE_MISSED is set after the lock is released at the end of qdisc_run_end().

For the tx_action case, STATE_MISSED is also set when cpu1 is at the end of qdisc_run_end(), so tx_action will be rescheduled again to dequeue the skb enqueued by cpu2.

Clear STATE_MISSED before retrying a dequeue when dequeuing returns NULL, in order to reduce the overhead of the second spin_trylock() and __netif_schedule() calls. Also clear STATE_MISSED before calling __netif_schedule() at the end of qdisc_run_end() to avoid doing another round of dequeuing in pfifo_fast_dequeue().

The performance impact of this patch, tested using pktgen and a dummy netdev with pfifo_fast qdisc attached:

    threads  without+this_patch  with+this_patch  delta
       1         2.61Mpps           2.60Mpps      -0.3%
       2         3.97Mpps           3.82Mpps      -3.7%
       4         5.62Mpps           5.59Mpps      -0.5%
       8         2.78Mpps           2.77Mpps      -0.3%
      16         2.22Mpps           2.22Mpps      -0.0%

Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
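The two-step scheme in this commit (set a "missed" flag and retry the trylock on contention, then reschedule at unlock time if the flag is set) can be modelled in user-space C11. This is a simplified sketch and not the kernel code: `model_qdisc`, `model_run_begin` and `model_run_end` are hypothetical, an `atomic_flag` stands in for the qdisc lock, and an `atomic_bool` stands in for STATE_MISSED.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct model_qdisc {
	atomic_flag running;    /* stands in for the qdisc lock */
	atomic_bool missed;     /* stands in for STATE_MISSED */
};

static void model_qdisc_init(struct model_qdisc *q)
{
	atomic_flag_clear(&q->running);
	atomic_init(&q->missed, false);
}

/* Returns true if the caller took the lock and may dequeue. */
static bool model_run_begin(struct model_qdisc *q)
{
	if (!atomic_flag_test_and_set(&q->running))
		return true;

	/* Contended. If the flag is already set, the holder will reschedule. */
	if (atomic_load(&q->missed))
		return false;

	atomic_store(&q->missed, true);

	/* Retry once: the holder may have unlocked before seeing the flag. */
	return !atomic_flag_test_and_set(&q->running);
}

/* Releases the lock; returns true if a dequeue pass must be rescheduled. */
static bool model_run_end(struct model_qdisc *q)
{
	atomic_flag_clear(&q->running);
	/* Clear the flag while reading it, so only one reschedule is issued. */
	return atomic_exchange(&q->missed, false);
}
```

In a single-threaded walk-through, a second `model_run_begin()` while the lock is held fails and leaves the flag set, and the holder's `model_run_end()` then reports that a reschedule is needed, which is how the enqueued skb avoids being stranded.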
2021-05-14 | tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT | Jim Ma
In tls_sw_splice_read, checking MSG_* flags is inappropriate; SPLICE_* flags should be used. Update tls_wait_data to accept a nonblock argument instead of flags, for both recvmsg and splice. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Jim Ma <majinjing3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv" | Hoang Le
This reverts commit 6bf24dc0cc0cc43b29ba344b66d78590e687e046. The above fix is not correct and caused a memory leak. Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv") Acked-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 | net: thunderx: Drop unnecessary NULL check after container_of | Guenter Roeck
The result of container_of() operations is never NULL unless the embedded element is the first element of the structure. This is not the case here. The NULL check is therefore unnecessary and misleading. Remove it. This change was made automatically with the following Coccinelle script.

    @@
    type t;
    identifier v;
    statement s;
    @@

    <+...
    (
      t v = container_of(...);
    |
      v = container_of(...);
    )
      ...
      when != v
    - if (\( !v \| v == NULL \) ) s
    ...+>

Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
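Why the NULL check is pointless follows from how container_of() works: it subtracts a fixed member offset from the member pointer. The user-space sketch below uses a simplified macro built on `offsetof`; `container_of_model`, `fake_work`, `fake_port` and `port_from_work` are hypothetical names, and the macro omits the type checking the kernel version performs.

```c
#include <stddef.h>

/* Simplified user-space analogue of the kernel's container_of(). */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_work {
	int pending;
};

struct fake_port {
	int id;
	struct fake_work work;   /* embedded, and not the first member */
};

/* Recover the enclosing structure from a pointer to its member. */
static struct fake_port *port_from_work(struct fake_work *w)
{
	return container_of_model(w, struct fake_port, work);
}
```

Since `work` sits at a non-zero offset, the computed pointer is the member address minus that offset; it could only be NULL if the member pointer happened to equal the offset, which a valid member pointer never does.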
2021-05-14 | sfc: don't use netif_info et al before net_device is registered | Heiner Kallweit
Using netif_info() before the net_device is registered results in ugly messages like the following:

    sfc 0000:01:00.1 (unnamed net_device) (uninitialized): Solarflare NIC detected

Therefore use pci_info() et al until net_device is registered. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-15 | Merge tag 'drm-msm-fixes-2021-05-09' of https://gitlab.freedesktop.org/drm/msm into drm-fixes | Dave Airlie
- dsi regression fix
- dma-buf pinning fix
- displayport fixes
- llc fix
Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuqLZDAEJwUFKb6m+h3kyxgjDEKa3DPA1fHA69vxbXH=g@mail.gmail.com
2021-05-14 | Merge tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds
Pull tracing fix from Steven Rostedt:
 "Fix trace_check_vprintf() for %.*s

  The sanity check of all strings being read from the ring buffer to make sure they are in safe memory space did not account for the %.*s notation having another parameter to process (the length).

  Add that to the check"

* tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Handle %.*s in trace_check_vprintf()
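The detail behind this fix is that "%.*s" consumes two arguments, an int precision and then the string pointer, so any code that walks a format string must skip both. A minimal user-space illustration with standard `snprintf` follows; `format_prefix` is a hypothetical helper and has nothing to do with the tracing code itself.

```c
#include <stdio.h>

/*
 * Copy at most max_chars characters of src into out.
 * "%.*s" takes the precision (an int) first, then the string pointer.
 */
static int format_prefix(char *out, size_t out_len, const char *src,
			 int max_chars)
{
	return snprintf(out, out_len, "%.*s", max_chars, src);
}
```

For example, `format_prefix(out, sizeof(out), "hello", 3)` writes "hel" and returns 3, showing that the length argument precedes the string in the argument list.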