summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-10can: isotp: set default value for N_As to 50 micro secondsOliver Hartkopp
The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s). Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative. With this N_As/frame_txtime value the 'burst mode' is disabled by default. As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-10can: isotp: add local echo tx processing for consecutive framesOliver Hartkopp
Instead of dumping the CAN frames into the netdevice queue the process to transmit consecutive frames (CF) now waits for the frame to be transmitted and therefore echo'ed from the CAN interface. Link: https://lore.kernel.org/all/20220309120416.83514-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-09net: dsa: tag_rtl8_4: fix typo in modalias nameLuiz Angelo Daros de Luca
DSA_TAG_PROTO_RTL8_4L is not defined. It should be DSA_TAG_PROTO_RTL8_4T. Fixes: cd87fecdedd7 ("net: dsa: tag_rtl8_4: add rtl8_4t trailing variant") Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220309175641.12943-1-luizluca@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net: axienet: Use napi_alloc_skb when refilling RX ringRobert Hancock
Use napi_alloc_skb to allocate memory when refilling the RX ring in axienet_poll for more efficiency. napi_alloc_skb() can reuse softirq-local cache of freed skbs which may still be cache-warm and skipping allocator calls. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Link: https://lore.kernel.org/r/20220308211013.1530955-1-robert.hancock@calian.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09tcp: adjust TSO packet sizes based on min_rttEric Dumazet
Back when tcp_tso_autosize() and TCP pacing were introduced, our focus was really to reduce burst sizes for long distance flows. The simple heuristic of using sk_pacing_rate/1024 has worked well, but can lead to too small packets for hosts in the same rack/cluster, when thousands of flows compete for the bottleneck. Neal Cardwell had the idea of making the TSO burst size a function of both sk_pacing_rate and tcp_min_rtt() Indeed, for local flows, sending bigger bursts is better to reduce cpu costs, as occasional losses can be repaired quite fast. This patch is based on Neal Cardwell implementation done more than two years ago. bbr is adjusting max_pacing_rate based on measured bandwidth, while cubic would over estimate max_pacing_rate. /proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable this new feature, in logarithmic steps. Tested: 100Gbit NIC, two hosts in the same rack, 4K MTU. 600 flows rate-limited to 20000000 bytes per second. Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO) ~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered" 96005 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000': 65,945.29 msec task-clock # 2.845 CPUs utilized 1,314,632 context-switches # 19935.279 M/sec 5,292 cpu-migrations # 80.249 M/sec 940,641 page-faults # 14264.023 M/sec 201,117,030,926 cycles # 3049769.216 GHz (83.45%) 17,699,435,405 stalled-cycles-frontend # 8.80% frontend cycles idle (83.48%) 136,584,015,071 stalled-cycles-backend # 67.91% backend cycles idle (83.44%) 53,809,530,436 instructions # 0.27 insn per cycle # 2.54 stalled cycles per insn (83.36%) 9,062,315,523 branches # 137422329.563 M/sec (83.22%) 153,008,621 branch-misses # 1.69% of all branches (83.32%) 23.182970846 seconds time elapsed TcpInSegs 15648792 0.0 TcpOutSegs 58659110 0.0 # Average of 3.7 4K segments per TSO packet TcpExtTCPDelivered 58654791 0.0 TcpExtTCPDeliveredCE 19 0.0 After patch: ~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered" 96046 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000': 48,982.58 msec task-clock # 2.104 CPUs utilized 186,014 context-switches # 3797.599 M/sec 3,109 cpu-migrations # 63.472 M/sec 941,180 page-faults # 19214.814 M/sec 153,459,763,868 cycles # 3132982.807 GHz (83.56%) 12,069,861,356 stalled-cycles-frontend # 7.87% frontend cycles idle (83.32%) 120,485,917,953 stalled-cycles-backend # 78.51% backend cycles idle (83.24%) 36,803,672,106 instructions # 0.24 insn per cycle # 3.27 stalled cycles per insn (83.18%) 5,947,266,275 branches # 121417383.427 M/sec (83.64%) 87,984,616 branch-misses # 1.48% of all branches (83.43%) 23.281200256 seconds time elapsed TcpInSegs 1434706 0.0 TcpOutSegs 58883378 0.0 # Average of 41 4K segments per TSO packet TcpExtTCPDelivered 58878971 0.0 TcpExtTCPDeliveredCE 9664 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09tcp: autocork: take MSG_EOR hint into considerationEric Dumazet
tcp_should_autocork() is evaluating if it makes senses to not immediately send current skb, hoping that user space will add more payload on it by the time TCP stack reacts to upcoming TX completions. If current skb got MSG_EOR mark, then we know that no further data will be added, it is therefore futile to wait. SOF_TIMESTAMPING_TX_ACK will become a bit more accurate, if prior packets are still in qdisc/device queues. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20220309054706.2857266-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09stmmac: intel: Add ADL-N PCI IDMichael Sit Wei Hong
Add PCI ID for Ethernet TSN Controller on ADL-N. Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Link: https://lore.kernel.org/r/20220309033415.3370250-1-michael.wei.hong.sit@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net/smc: fix -Wmissing-prototypes warning when CONFIG_SYSCTL not setDust Li
when CONFIG_SYSCTL not set, smc_sysctl_net_init/exit need to be static inline to avoid missing-prototypes if compile with W=1. Since __net_exit has noinline annotation when CONFIG_NET_NS not set, it should not be used with static inline. So remove the __net_init/exit when CONFIG_SYSCTL not set. Fixes: 7de8eb0d9039 ("net/smc: fix compile warning for smc_sysctl") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Link: https://lore.kernel.org/r/20220309033051.41893-1-dust.li@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09Merge branch 'net-fungible-fix-errors-when-config_tls_device-n'Jakub Kicinski
Dimitris Michailidis says: ==================== net/fungible: fix errors when CONFIG_TLS_DEVICE=n This pair of patches fix compile errors in funeth when CONFIG_TLS_DEVICE=n. The errors are due to symbols that are not defined in this config but are used in code guarded by "if (IS_ENABLED(CONFIG_TLS_DEVICE) ..." One option is to place this code under preprocessor guards that will keep the compiler from looking at the code. The option adopted here is to define the offending symbols also when CONFIG_TLS_DEVICE=n. The first patch does this for two functions in tls.h. The second does the same for driver symbols and makes tls.h inclusion unconditional. ==================== Link: https://lore.kernel.org/r/20220309034032.405212-1-dmichail@fungible.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net/fungible: fix errors when CONFIG_TLS_DEVICE=nDimitris Michailidis
Include the TLS headers unconditionally and define driver TLS symbols used in code compiled also when CONFIG_TLS_DEVICE=n to fix the following errors: ../drivers/net/ethernet/fungible/funeth/funeth_tx.c: In function ‘write_pkt_desc’: ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:13: error: implicit declaration of function ‘tls_driver_ctx’ [-Werror=implicit-function-declaration] 244 | tls_ctx = tls_driver_ctx(skb->sk, TLS_OFFLOAD_CTX_DIR_TX); | ^~~~~~~~~~~~~~ ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:37: error: ‘TLS_OFFLOAD_CTX_DIR_TX’ undeclared (first use in this function) 244 | tls_ctx = tls_driver_ctx(skb->sk, TLS_OFFLOAD_CTX_DIR_TX); | ^~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:244:37: note: each undeclared identifier is reported only once for each function it appears in ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:245:23: error: dereferencing pointer to incomplete type ‘struct fun_ktls_tx_ctx’ 245 | tls->tlsid = tls_ctx->tlsid; | ^~ ../drivers/net/ethernet/fungible/funeth/funeth_tx.c: In function ‘fun_start_xmit’: ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:310:6: error: implicit declaration of function ‘tls_is_sk_tx_device_offloaded’ [-Werror=implicit-function-declaration] 310 | tls_is_sk_tx_device_offloaded(skb->sk)) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:311:9: error: implicit declaration of function ‘fun_tls_tx’; did you mean ‘fun_xdp_tx’? [-Werror=implicit-function-declaration] 311 | skb = fun_tls_tx(skb, q, &tls_len); | ^~~~~~~~~~ | fun_xdp_tx ../drivers/net/ethernet/fungible/funeth/funeth_tx.c:311:7: warning: assignment to ‘struct sk_buff *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion] 311 | skb = fun_tls_tx(skb, q, &tls_len); | ^ Fixes: db37bc177dae ("net/funeth: add the data path") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net/tls: Provide {__,}tls_driver_ctx() unconditionallyDimitris Michailidis
Having the definitions of {__,}tls_driver_ctx() under an #if guard means code referencing them also needs to rely on the preprocessor. The protection doesn't appear needed so make the definitions unconditional. Fixes: db37bc177dae ("net/funeth: add the data path") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09bnxt: revert hastily merged uAPI aberrationsJakub Kicinski
This reverts: commit 02acd399533e ("bnxt_en: parse result field when NVRAM package install fails") commit 22f5dba5065d ("bnxt_en: add an nvm test for hw diagnose") commit bafed3f231f7 ("bnxt_en: implement hw health reporter") These patches are still under discussion / I don't think they are right, and since the authors don't reply promptly let me lessen my load of "things I need to resolve before next release" and revert them. Acked-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20220308173659.304915-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net: stmmac: switch no PTP HW support message to info levelHeiner Kallweit
If HW doesn't support PTP, then it doesn't support it. This is neither a problem nor can the user do something about it. Therefore change the message level to info. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/ee685745-f1ab-e9bf-f20e-077d55dff441@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09e1000e: Print PHY register address when MDI read/write failsChen Yu
There is occasional suspend error from e1000e which blocks the system from further suspending. And the issue was found on a WhiskeyLake-U platform with I219-V: [ 20.078957] PM: pci_pm_suspend(): e1000e_pm_suspend+0x0/0x780 [e1000e] returns -2 [ 20.078970] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -2 [ 20.078974] e1000e 0000:00:1f.6: PM: pci_pm_suspend+0x0/0x170 returned -2 after 371012 usecs [ 20.078978] e1000e 0000:00:1f.6: PM: failed to suspend async: error -2 According to the code flow, this might be caused by broken MDI read/write to PHY registers. However currently the code does not tell us which register is broken. Thus enhance the debug information to print the offender PHY register. So the next the issue is reproduced, this information could be used for narrow down. Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Todd Brandt <todd.e.brandt@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220308172030.451566-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09ptp: idt82p33: use rsmu driver to access i2c/spi busMin Li
rsmu (Renesas Synchronization Management Unit ) driver is located in drivers/mfd and responsible for creating multiple devices including idt82p33 phc, which will then use the exposed regmap and mutex handle to access i2c/spi bus. Signed-off-by: Min Li <min.li.xe@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/1646748651-16811-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net: dsa: microchip: ksz9477: implement MTU configurationOleksij Rempel
This chips supports two ways to configure max MTU size: - by setting SW_LEGAL_PACKET_DISABLE bit: if this bit is 0 allowed packed size will be between 64 and bytes 1518. If this bit is 1, it will accept packets up to 2000 bytes. - by setting SW_JUMBO_PACKET bit. If this bit is set, the chip will ignore SW_LEGAL_PACKET_DISABLE value and use REG_SW_MTU__2 register to configure MTU size. Current driver has disabled SW_JUMBO_PACKET bit and activates SW_LEGAL_PACKET_DISABLE. So the switch will pass all packets up to 2000 without any way to configure it. By providing port_change_mtu we are switch to SW_JUMBO_PACKET way and will be able to configure MTU up to ~9000. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220308135857.1119028-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09drivers: vxlan: fix returnvar.cocci warningGuo Zhengkui
Fix the following coccicheck warning: drivers/net/vxlan/vxlan_core.c:2995:5-8: Unneeded variable: "ret". Return "0" on line 3004. Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Link: https://lore.kernel.org/r/20220308134321.29862-1-guozhengkui@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net: tcp: fix shim definition of tcp_inbound_md5_hashVladimir Oltean
When CONFIG_TCP_MD5SIG isn't enabled, there is a compilation bug due to the fact that the static inline definition of tcp_inbound_md5_hash() has an unexpected semicolon. Remove it. Fixes: 1330b6ef3313 ("skb: make drop reason booleanable") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220309122012.668986-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09MAINTAINERS: rectify entry for REALTEK RTL83xx SMI DSA ROUTER CHIPSLukas Bulwahn
Commit 429c83c78ab2 ("dt-bindings: net: dsa: realtek: convert to YAML schema, add MDIO") converts realtek-smi.txt to realtek.yaml, but missed to adjust its reference in MAINTAINERS. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Repair this file reference in REALTEK RTL83xx SMI DSA ROUTER CHIPS. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: lan966x: Add spinlock for frame transmission from CPU.Horatiu Vultur
The registers used to inject a frame to one of the ports is shared between all the net devices. Therefore, there can be race conditions for accessing the registers when two processes send frames at the same time on different ports. To fix this, add a spinlock around the function 'lan966x_port_ifh_xmit()'. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: ethernet: sun: use min_t() to make code cleanerChangcheng Deng
Use min_t() in order to make code cleaner. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net/fungible: CONFIG_FUN_CORE needs SBITMAPDimitris Michailidis
fun_core.ko uses sbitmaps and needs to select SBITMAP. Fixes below errors: ERROR: modpost: "__sbitmap_queue_get" [drivers/net/ethernet/fungible/funcore/funcore.ko] undefined! ERROR: modpost: "sbitmap_finish_wait" [drivers/net/ethernet/fungible/funcore/funcore.ko] undefined! ERROR: modpost: "sbitmap_queue_clear" [drivers/net/ethernet/fungible/funcore/funcore.ko] undefined! ERROR: modpost: "sbitmap_prepare_to_wait" [drivers/net/ethernet/fungible/funcore/funcore.ko] undefined! ERROR: modpost: "sbitmap_queue_init_node" [drivers/net/ethernet/fungible/funcore/funcore.ko] undefined! ERROR: modpost: "sbitmap_queue_wake_all" [drivers/net/ethernet/fungible/funcore/funcore.ko] undefined! v2: correct "Fixes" SHA Fixes: 749efb1e6d73 ("net/fungible: Kconfig, Makefiles, and MAINTAINERS") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net/fungible: Fix local_memory_node errorDimitris Michailidis
Stephen Rothwell reported the following failure on powerpc: ERROR: modpost: ".local_memory_node" [drivers/net/ethernet/fungible/funeth/funeth.ko] undefined! AFAICS this is because local_memory_node() is a non-inline non-exported function when CONFIG_HAVE_MEMORYLESS_NODES=y. It is also the wrong API to get a CPU's memory node. Use cpu_to_mem() in the two spots it's used. Fixes: ee6373ddf3a9 ("net/funeth: probing and netdev ops") Fixes: db37bc177dae ("net/funeth: add the data path") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09skb: make drop reason booleanableJakub Kicinski
We have a number of cases where function returns drop/no drop decision as a boolean. Now that we want to report the reason code as well we have to pass extra output arguments. We can make the reason code evaluate correctly as bool. I believe we're good to reorder the reasons as they are reported to user space as strings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09Merge branch 'dsa-next-fixups'David S. Miller
Vladimir Oltean says: ==================== Incremental fixups for DSA unicast filtering There are some bugs I've discovered in the recently merged "DSA unicast filtering" series: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/ First bug is the dereference of an uninitialized list (dp->fdbs) when the "initial" tag protocol is placed in the device tree for the Felix switch driver. This is a scenario I hadn't tested. It is handled by patches 1-3. Second bug is actually a sum of bugs that canceled each other out during my previous testing. The MAC address change of a DSA slave interface breaks termination for the other slave interfaces. But this actually does not happen if the slave interface whose address is changing is down. And even when up, traffic termination is still not broken because we fail to properly disable host flooding. Patches 4-6 handle this for the Felix driver (the only one benefiting from unicast filtering so far). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: felix: avoid early deletion of host FDB entriesVladimir Oltean
The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: felix: actually disable flooding towards NPI portVladimir Oltean
The two blamed commits were written/tested individually but not together. When put together, commit 90897569beb1 ("net: dsa: felix: start off with flooding disabled on the CPU port"), which deletes a reinitialization of PGID_UC/PGID_MC/PGID_BC, is no longer sufficient to ensure that these port masks don't contain the CPU port module. This is because commit b903a6bd2e19 ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port") overwrites the hardware default settings towards the CPU port module with the settings that used to be present on the NPI port treated as a regular port. There, flooding is enabled, so flooding would get enabled on the CPU port module too. Adding conditional logic somewhere within felix_setup_tag_npi() to configure either the default no-flood policy or the flood policy inherited from the tag_8021q CPU port from a previous call to dsa_port_manage_cpu_flood() is getting complicated. So just let the migration logic do its thing during initial setup (which will temporarily turn on flooding), then turn flooding off for the NPI port after felix_set_tag_protocol() finishes. Here we are in felix_setup(), so the DSA slave interfaces are not yet created, and this doesn't affect traffic in any way. Fixes: 90897569beb1 ("net: dsa: felix: start off with flooding disabled on the CPU port") Fixes: b903a6bd2e19 ("net: dsa: felix: migrate flood settings from NPI to tag_8021q CPU port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: be mostly no-op in dsa_slave_set_mac_address when downVladimir Oltean
Since the slave unicast address is synced to hardware and to the DSA master during dsa_slave_open(), this means that a call to dsa_slave_set_mac_address() while the slave interface is down will result to a call to dsa_port_standalone_host_fdb_del() and to dev_uc_del() for the MAC address while there was no previous dsa_port_standalone_host_fdb_add() or dev_uc_add(). This is a partial revert of the blamed commit below, which was too aggressive. Fixes: 35aae5ab9121 ("net: dsa: remove workarounds for changing master promisc/allmulti only while up") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: felix: drop "bool change" from felix_set_tag_protocolVladimir Oltean
We no longer need the workaround in the felix driver to avoid calling dsa_port_walk_fdbs() when &dp->fdbs is an uninitialized list, because that list is now initialized from all call paths of felix_set_tag_protocol(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: move port lists initialization to dsa_port_touchVladimir Oltean
&cpu_db->fdbs and &cpu_db->mdbs may be uninitialized lists during some call paths of felix_set_tag_protocol(). There was an attempt to avoid calling dsa_port_walk_fdbs() during setup by using a "bool change" in the felix driver, but this doesn't work when the tagging protocol is defined in the device tree, and a change is triggered by DSA at pseudo-runtime: dsa_tree_setup_switches -> dsa_switch_setup -> dsa_switch_setup_tag_protocol -> ds->ops->change_tag_protocol dsa_tree_setup_ports -> dsa_port_setup -> &dp->fdbs and &db->mdbs only get initialized here So it seems like the only way to fix this is to move the initialization of these lists earlier. dsa_port_touch() is called from dsa_switch_touch_ports() which is called from dsa_switch_parse_of(), and this runs completely before dsa_tree_setup(). Similarly, dsa_switch_release_ports() runs after dsa_tree_teardown(). Fixes: f9cef64fa23f ("net: dsa: felix: migrate host FDB and MDB entries when changing tag proto") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: warn if port lists aren't empty in dsa_port_teardownVladimir Oltean
There has been recent work towards matching each switchdev object addition with a corresponding deletion. Therefore, having elements in the fdbs, mdbs, vlans lists at the time of a shared (DSA, CPU) port's teardown is indicative of a bug somewhere else, and not something that is to be expected. We shouldn't try to silently paper over that. Instead, print a warning and a stack trace. This change is a prerequisite for moving the initialization/teardown of these lists. Make it clear that clearing the lists isn't needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09Merge branch 'ptrp-ocp-next'David S. Miller
Jonathan Lemon says: ==================== ptp: ocp: update devlink information Both of these patches update the information displayed via devlink. v1 -> v2: remove board.manufacture information ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09ptp: ocp: Update devlink firmware display path.Jonathan Lemon
Cache the firmware version when the card is initialized, and use this field to populate the devlink firmware information. The cached firmware version will be used for feature gating in upcoming patches. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09ptp: ocp: add nvmem interface for accessing eepromJonathan Lemon
Add the at24 drivers for the eeprom, and use the accessors via the nvmem API instead of direct i2c accesses. This makes things cleaner. Add an eeprom map table which specifies where the pre-defined information is located. Retrieve the information and and export it via the devlink interface. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nextDavid S. Miller
-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-03-08 This series contains updates to ixgbe and ixgbevf drivers. Slawomir adds an implementation for ndo_set_vf_link_state() to allow for disabling of VF link state as well a mailbox implementation so the VF can query the state. Additionally, for 82599, the option to disable a VF after receiving several malicious driver detection (MDD) events are encountered is added. For ixgbevf, the corresponding implementation to query and report a disabled state is added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08net: prestera: acl: make read-only array client_map static constColin Ian King
Don't populate the read-only array client_map on the stack but instead make it static const. Also makes the object code a little smaller. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220307221349.164585-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08ptp: ocp: correct label for error pathJonathan Lemon
When devlink_register() was removed from the error path, the corresponding label was not updated. Rename the label for readability puposes, no functional change. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220308000458.2166-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08SO_ZEROCOPY should return -EOPNOTSUPP rather than -ENOTSUPPSamuel Thibault
ENOTSUPP is documented as "should never be seen by user programs", and thus not exposed in <errno.h>, and thus applications cannot safely check against it (they get "Unknown error 524" as strerror). We should rather return the well-known -EOPNOTSUPP. This is similar to 2230a7ef5198 ("drop_monitor: Use correct error code") and 4a5cdc604b9c ("net/tls: Fix return values to avoid ENOTSUPP"), which did not seem to cause problems. Signed-off-by: Samuel Thibault <samuel.thibault@labri.fr> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20220307223126.djzvg44v2o2jkjsx@begin Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08Merge branch 'mptcp-advertisement-reliability-improvement-and-misc-updates'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Advertisement reliability improvement and misc. updates Patch 1 adds a helpful debug tracepoint for outgoing MPTCP packets. Patch 2 is a small "magic number" refactor. Patches 3 & 4 refactor parts of the mptcp_join.sh selftest. No change in test coverage. Patch 5 ensures only advertised address IDs are un-advertised. Patches 6-8 improve handling of an edge case where endpoint IDs need to be created on-the-fly when adding subflows. Includes selftest coverage. Patch 9 adds validation of the fullmesh flag in a MPTCP netlink command, which was overlooked when this flag was introduced for 5.18. ==================== Link: https://lore.kernel.org/r/20220307204439.65164-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: add fullmesh flag check for adding addressGeliang Tang
The fullmesh flag mustn't be used with the signal flag when adding an address. This patch added the necessary flags check for this case. Fixes: 73c762c1f07d ("mptcp: set fullmesh flag in pm_netlink") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08selftests: mptcp: add implicit endpoint test casePaolo Abeni
Ensure implicit endpoint are created when expected and that the user-space can update them Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: strict local address ID selectionPaolo Abeni
The address ID selection for MPJ subflows created in response to incoming ADD_ADDR option is currently unreliable: it happens at MPJ socket creation time, when the local address could be unknown. Additionally, if the no local endpoint is available for the local address, a new dummy endpoint is created, confusing the user-land. This change refactor the code to move the address ID selection inside the rebuild_header() helper, when the local address eventually selected by the route lookup is finally known. If the address used is not mapped by any endpoint - and thus can't be advertised/removed pick the id 0 instead of allocate a new endpoint. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: introduce implicit endpointsPaolo Abeni
In some edge scenarios, an MPTCP subflows can use a local address mapped by a "implicit" endpoint created by the in-kernel path manager. Such endpoints presence can be confusing, as it's creation is hard to track and will prevent the later endpoint creation from the user-space using the same address. Define a new endpoint flag to mark implicit endpoints and allow the user-space to replace implicit them with user-provided data at endpoint creation time. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: more careful RM_ADDR generationPaolo Abeni
The in-kernel MPTCP path manager, when processing the MPTCP_PM_CMD_FLUSH_ADDR command, generates RM_ADDR events for each known local address. While that is allowed by the RFC, it makes unpredictable the exact number of RM_ADDR generated when both ends flush the PM addresses. This change restricts the RM_ADDR generation to previously explicitly announced addresses, and adjust the expected results in a bunch of related self-tests. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08selftests: mptcp: Rename wait functionMat Martineau
The "selftests: mptcp: improve 'fair usage on close' stability" commit changed that self test to check the TcpAttemptFails MIB instead of looking for TW sockets. The associated bash function wasn't renamed in that commit because of the merge conflicts it would cause, so this commit updates the function name as Paolo originally intended. Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08selftests: mptcp: join: allow running -cCiMatthieu Baerts
Without this patch, no tests would be ran when launching: mptcp_join.sh -cCi In any order or a combination with 2 of these letters. The recommended way with getopt is first parse all options and then act. This allows to do some actions in priority, e.g. display the help menu and stop. But also some global variables changing the behaviour of this selftests -- like the ones behind -cCi options -- can be set before running the different tests. By doing that, we can also avoid long and unreadable regex. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: use MPTCP_SUBFLOW_NODATAGeliang Tang
Set subflow->data_avail with the enum value MPTCP_SUBFLOW_NODATA, instead of using 0 directly. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: add tracepoint in mptcp_sendmsg_fragGeliang Tang
The tracepoint in get_mapping_status() only dumped the incoming mpext fields. This patch added a new tracepoint in mptcp_sendmsg_frag() to dump the outgoing mpext too. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08ixgbevf: add disable link stateSlawomir Mrozowicz
Add possibility to disable link state if it is administratively disabled in PF. It is part of the general functionality that allows the PF driver to control the state of the virtual link VF devices. Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08ixgbe: add improvement for MDD response functionalitySlawomir Mrozowicz
The 82599 PF driver disable VF driver after a special MDD event occurs. Adds the option for administrators to control whether VFs are automatically disabled after several MDD events. The automatically disabling is now the default mode for 82599 PF driver, as it is more reliable. This addresses CVE-2021-33061. Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>