linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-02-08	ptp_pch: Use ioread64_lo_hi() / iowrite64_lo_hi()	Andy Shevchenko
	There is already helper functions to do 64-bit I/O on 32-bit machines or buses, thus we don't need to reinvent the wheel. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220207210730.75252-2-andriy.shevchenko@linux.intel.com Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ptp_pch: use mac_pton()	Andy Shevchenko
	Use mac_pton() instead of custom approach. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20220207210730.75252-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path	Eric Dumazet
	ip[6]mr_free_table() can only be called under RTNL lock. RTNL: assertion failed at net/core/dev.c (10367) WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367 Modules linked in: CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367 Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 <0f> 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4 R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000 FS: 00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509 ip6mr_free_table net/ipv6/ip6mr.c:389 [inline] ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline] ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline] ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298 ops_init+0xaf/0x470 net/core/net_namespace.c:140 setup_net+0x54f/0xbb0 net/core/net_namespace.c:331 copy_net_ns+0x318/0x760 net/core/net_namespace.c:475 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 copy_namespaces+0x391/0x450 kernel/nsproxy.c:178 copy_process+0x2e0c/0x7300 kernel/fork.c:2167 kernel_clone+0xe7/0xab0 kernel/fork.c:2555 __do_sys_clone+0xc8/0x110 kernel/fork.c:2672 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4ab89f9059 Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f. RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038 RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059 RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000 RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300 R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000 </TASK> Fixes: f243e5a7859a ("ipmr,ip6mr: call ip6mr_free_table() on failure path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: ethernet: litex: Add the dependency on HAS_IOMEM	Cai Huoqing
	The LiteX driver uses devm io function API which needs HAS_IOMEM enabled, so add the dependency on HAS_IOMEM. Fixes: ee7da21ac4c3 ("net: Add driver for LiteX's LiteETH network interface") Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev> Link: https://lore.kernel.org/r/20220208013308.6563-1-cai.huoqing@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	Merge branch 'net-speedup-netns-dismantles'	Jakub Kicinski
	Eric Dumazet says: ==================== net: speedup netns dismantles From: Eric Dumazet <edumazet@google.com> In this series, I made network namespace deletions more scalable, by 4x on the little benchmark described in this cover letter. - Remove bottleneck on ipv6 addrconf, by replacing a global hash table to a per netns one. - Rework many (struct pernet_operations)->exit() handlers to exit_batch() ones. This removes many rtnl acquisitions, and gives to cleanup_net() kind of a priority over rtnl ownership. Tested on a host with 24 cpus (48 HT) Test script: for nr in {1..10} do (for i in {1..10000}; do unshare -n /bin/bash -c "ifconfig lo up"; done) & done wait for i in {1..10} do sleep 1 echo 3 >/proc/sys/vm/drop_caches grep net_namespace /proc/slabinfo done Before: We can see host struggles to clean the netns, even after there are no new creations. Memory cost is high, because each netns consumes a good amount of memory. time ./unshare10.sh net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 82634 82634 3968 1 1 : tunables 24 12 8 : slabdata 82634 82634 0 net_namespace 37214 37792 3968 1 1 : tunables 24 12 8 : slabdata 37214 37792 192 real 6m57.766s user 3m37.277s sys 40m4.826s After: We can see the script completes much faster, the kernel thread doing the cleanup_net() keeps up just fine. Memory cost is not too big. time ./unshare10.sh net_namespace 9945 9945 4096 1 1 : tunables 24 12 8 : slabdata 9945 9945 0 net_namespace 4087 4665 4096 1 1 : tunables 24 12 8 : slabdata 4087 4665 192 net_namespace 4082 4607 4096 1 1 : tunables 24 12 8 : slabdata 4082 4607 192 net_namespace 234 761 4096 1 1 : tunables 24 12 8 : slabdata 234 761 192 net_namespace 224 751 4096 1 1 : tunables 24 12 8 : slabdata 224 751 192 net_namespace 218 745 4096 1 1 : tunables 24 12 8 : slabdata 218 745 192 net_namespace 193 667 4096 1 1 : tunables 24 12 8 : slabdata 193 667 172 net_namespace 167 609 4096 1 1 : tunables 24 12 8 : slabdata 167 609 152 net_namespace 167 609 4096 1 1 : tunables 24 12 8 : slabdata 167 609 152 net_namespace 157 609 4096 1 1 : tunables 24 12 8 : slabdata 157 609 152 real 1m43.876s user 3m39.728s sys 7m36.342s ==================== Link: https://lore.kernel.org/r/20220208045038.2635826-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: remove default_device_exit()	Eric Dumazet
	For some reason default_device_ops kept two exit method: 1) default_device_exit() is called for each netns being dismantled in a cleanup_net() round. This acquires rtnl for each invocation. 2) default_device_exit_batch() is called once with the list of all netns int the batch, allowing for a single rtnl invocation. Get rid of the .exit() method to handle the logic from default_device_exit_batch(), to decrease the number of rtnl acquisition to one. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	bonding: switch bond_net_exit() to batch mode	Eric Dumazet
	cleanup_net() is competing with other rtnl users. Batching bond_net_exit() factorizes all rtnl acquistions to a single one, giving chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	can: gw: switch cangw_pernet_exit() to batch mode	Eric Dumazet
	cleanup_net() is competing with other rtnl users. Avoiding to acquire rtnl for each netns before calling cgw_remove_all_jobs() gives chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipmr: introduce ipmr_net_exit_batch()	Eric Dumazet
	cleanup_net() is competing with other rtnl users. Avoiding to acquire rtnl for each netns before calling ipmr_rules_exit() gives chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ip6mr: introduce ip6mr_net_exit_batch()	Eric Dumazet
	cleanup_net() is competing with other rtnl users. Avoiding to acquire rtnl for each netns before calling ip6mr_rules_exit() gives chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipv6: change fib6_rules_net_exit() to batch mode	Eric Dumazet
	cleanup_net() is competing with other rtnl users. fib6_rules_net_exit() seems a good candidate for exit_batch(), as this gives chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipv4: add fib_net_exit_batch()	Eric Dumazet
	cleanup_net() is competing with other rtnl users. Instead of acquiring rtnl at each fib_net_exit() invocation, add fib_net_exit_batch() so that rtnl is acquired once. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	nexthop: change nexthop_net_exit() to nexthop_net_exit_batch()	Eric Dumazet
	cleanup_net() is competing with other rtnl users. nexthop_net_exit() seems a good candidate for exit_batch(), as this gives chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipv6/addrconf: switch to per netns inet6_addr_lst hash table	Eric Dumazet
	IPv6 does not scale very well with the number of IPv6 addresses. It uses a global (shared by all netns) hash table with 256 buckets. Some functions like addrconf_verify_rtnl() and addrconf_ifdown() have to iterate all addresses in the hash table. I have seen addrconf_verify_rtnl() holding the cpu for 10ms or more. Switch to the per netns hashtable (and spinlock) added in prior patches. This considerably speeds up netns dismantle times on hosts with thousands of netns. This also has an impact on regular (fast path) IPv6 processing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipv6/addrconf: use one delayed work per netns	Eric Dumazet
	Next step for using per netns inet6_addr_lst is to have per netns work item to ultimately call addrconf_verify_rtnl() and addrconf_verify() with a new 'struct net*' argument. Everything is still using the global inet6_addr_lst[] table. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ipv6/addrconf: allocate a per netns hash table	Eric Dumazet
	Add a per netns hash table and a dedicated spinlock, first step to get rid of the global inet6_addr_lst[] one. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	ibmvnic: don't release napi in __ibmvnic_open()	Sukadev Bhattiprolu
	If __ibmvnic_open() encounters an error such as when setting link state, it calls release_resources() which frees the napi structures needlessly. Instead, have __ibmvnic_open() only clean up the work it did so far (i.e. disable napi and irqs) and leave the rest to the callers. If caller of __ibmvnic_open() is ibmvnic_open(), it should release the resources immediately. If the caller is do_reset() or do_hard_reset(), they will release the resources on the next reset. This fixes following crash that occurred when running the drmgr command several times to add/remove a vnic interface: [102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq [102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq [102056] ibmvnic 30000003 env3: Replenished 8 pools Kernel attempted to read user page (10) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000010 Faulting instruction address: 0xc000000000a3c840 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e2e37 #1 Workqueue: events_long __ibmvnic_reset [ibmvnic] NIP: c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820 REGS: c0000000548e37e0 TRAP: 0300 Not tainted (5.16.0-rc5-autotest-g6441998e2e37) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28248484 XER: 00000004 CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000 ... NIP [c000000000a3c840] napi_enable+0x20/0xc0 LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic] Call Trace: [c0000000548e3a80] [0000000000000006] 0x6 (unreliable) [c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic] [c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic] [c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570 [c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660 [c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0 [c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010 38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020 ---[ end trace 5f8033b08fd27706 ]--- Fixes: ed651a10875f ("ibmvnic: Updated reset handling") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Link: https://lore.kernel.org/r/20220208001918.900602-1-sukadev@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	Merge branch 'more-dsa-fixes-for-devres-mdiobus_-alloc-register'	Jakub Kicinski
	Vladimir Oltean says: ==================== More DSA fixes for devres + mdiobus_{alloc,register} The initial patch series "[net,0/2] Fix mdiobus users with devres" https://patchwork.kernel.org/project/netdevbpf/cover/20210920214209.1733768-1-vladimir.oltean@nxp.com/ fixed some instances where DSA drivers on slow buses (SPI, I2C) trigger a panic (changed since then to a warn) in mdiobus_free. That was due to devres calling mdiobus_free() with no prior mdiobus_unregister(), which again was due to commit ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") by Bartosz Golaszewski. Rafael Richter and Daniel Klauer report yet another variation on that theme, but this time it applies to any DSA switch driver, not just those on buses which have a "->shutdown() calls ->remove() which unregisters children" sequence. Their setup is that of an LX2160A DPAA2 SoC driving a Marvell DSA switch (MDIO). DPAA2 Ethernet drivers probe on the "fsl-mc" bus (drivers/bus/fsl-mc/fsl-mc-bus.c). This bus is meant to be the kernel-side representation of the networking objects kept by the Management Complex (MC) firmware. The fsl-mc bus driver has this pattern: static void fsl_mc_bus_shutdown(struct platform_device *pdev) { fsl_mc_bus_remove(pdev); } which proceeds to remove the children on the bus. Among those children, the dpaa2-eth network driver. When dpaa2-eth is a DSA master, this removal of the master on shutdown trips up the device link created by dsa_master_setup(), and as such, the Marvell switch is also removed. From this point on, readers can revisit the description of commits 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") since the prerequisites for the BUG_ON in mdiobus_free() have been accomplished if there is a devres mismatch between mdiobus_alloc() and mdiobus_register(). Most DSA drivers have this kind of mismatch, and upon my initial assessment I had not realized the possibility described above, so I didn't fix it. This patch series walks through all drivers and makes them use either fully devres, or no devres. I am aware that there are DSA drivers that are only known to be tested with a single DSA master, so some patches are probably overkill for them. But code is copy-pasted from so many sources without fully understanding the differences, that I think it's better to not leave an in-tree source of inspiration that may lead to subtle breakage if not adapted properly. ==================== Link: https://lore.kernel.org/r/20220207161553.579933-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: lantiq_gswip: don't use devres for mdiobus	Vladimir Oltean
	As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The GSWIP switch is a platform device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the GSWIP switch driver on shutdown. So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The gswip driver has the code structure in place for orderly mdiobus removal, so just replace devm_mdiobus_alloc() with the non-devres variant, and add manual free where necessary, to ensure that we don't let devres free a still-registered bus. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: mt7530: fix kernel bug in mdiobus_free() when unbinding	Vladimir Oltean
	Nobody in this driver calls mdiobus_unregister(), which is necessary if mdiobus_register() completes successfully. So if the devres callbacks that free the mdiobus get invoked (this is the case when unbinding the driver), mdiobus_free() will BUG if the mdiobus is still registered, which it is. My speculation is that this is due to the fact that prior to commit ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") from June 2020, _devm_mdiobus_free() used to call mdiobus_unregister(). But at the time that the mt7530 support was introduced in May 2021, the API was already changed. It's therefore likely that the blamed patch was developed on an older tree, and incorrectly adapted to net-next. This makes the Fixes: tag correct. Fix the problem by using the devres variant of mdiobus_register. Fixes: ba751e28d442 ("net: dsa: mt7530: add interrupt support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: seville: register the mdiobus under devres	Vladimir Oltean
	As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The Seville VSC9959 switch is a platform device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the seville switch driver on shutdown. So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The seville driver has a code structure that could accommodate both the mdiobus_unregister and mdiobus_free calls, but it has an external dependency upon mscc_miim_setup() from mdio-mscc-miim.c, which calls devm_mdiobus_alloc_size() on its behalf. So rather than restructuring that, and exporting yet one more symbol mscc_miim_teardown(), let's work with devres and replace of_mdiobus_register with the devres variant. When we use all-devres, we can ensure that devres doesn't free a still-registered bus (it either runs both callbacks, or none). Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: felix: don't use devres for mdiobus	Vladimir Oltean
	As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The Felix VSC9959 switch is a PCI device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the felix switch driver on shutdown. So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The felix driver has the code structure in place for orderly mdiobus removal, so just replace devm_mdiobus_alloc_size() with the non-devres variant, and add manual free where necessary, to ensure that we don't let devres free a still-registered bus. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: bcm_sf2: don't use devres for mdiobus	Vladimir Oltean
	As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The Starfighter 2 is a platform device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the bcm_sf2 switch driver on shutdown. So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The bcm_sf2 driver has the code structure in place for orderly mdiobus removal, so just replace devm_mdiobus_alloc() with the non-devres variant, and add manual free where necessary, to ensure that we don't let devres free a still-registered bus. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: ar9331: register the mdiobus under devres	Vladimir Oltean
	As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The ar9331 is an MDIO device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the ar9331 switch driver on shutdown. So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The ar9331 driver doesn't have a complex code structure for mdiobus removal, so just replace of_mdiobus_register with the devres variant in order to be all-devres and ensure that we don't free a still-registered bus. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: dsa: mv88e6xxx: don't use devres for mdiobus	Vladimir Oltean
	As explained in commits: 74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres") 5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres") mdiobus_free() will panic when called from devm_mdiobus_free() <- devres_release_all() <- __device_release_driver(), and that mdiobus was not previously unregistered. The mv88e6xxx is an MDIO device, so the initial set of constraints that I thought would cause this (I2C or SPI buses which call ->remove on ->shutdown) do not apply. But there is one more which applies here. If the DSA master itself is on a bus that calls ->remove from ->shutdown (like dpaa2-eth, which is on the fsl-mc bus), there is a device link between the switch and the DSA master, and device_links_unbind_consumers() will unbind the Marvell switch driver on shutdown. systemd-shutdown[1]: Powering off. mv88e6085 0x0000000008b96000:00 sw_gl0: Link is Down fsl-mc dpbp.9: Removing from iommu group 7 fsl-mc dpbp.8: Removing from iommu group 7 ------------[ cut here ]------------ kernel BUG at drivers/net/phy/mdio_bus.c:677! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00040-gdc05f73788e5 #15 pc : mdiobus_free+0x44/0x50 lr : devm_mdiobus_free+0x10/0x20 Call trace: mdiobus_free+0x44/0x50 devm_mdiobus_free+0x10/0x20 devres_release_all+0xa0/0x100 __device_release_driver+0x190/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x4c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 kernel_power_off+0x34/0x6c __do_sys_reboot+0x15c/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c So the same treatment must be applied to all DSA switch drivers, which is: either use devres for both the mdiobus allocation and registration, or don't use devres at all. The Marvell driver already has a good structure for mdiobus removal, so just plug in mdiobus_free and get rid of devres. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Reported-by: Rafael Richter <Rafael.Richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Daniel Klauer <daniel.klauer@gin.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	net: add dev->dev_registered_tracker	Eric Dumazet
	Convert one dev_hold()/dev_put() pair in register_netdevice() and unregister_netdevice_many() to dev_hold_track() and dev_put_track(). This would allow to detect a rogue dev_put() a bit earlier. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220207184107.1401096-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	Merge branch 'fix bpf_prog_pack build errors'	Alexei Starovoitov
	Song Liu says: ==================== Fix build errors reported by kernel test robot. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-02-08	bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE	Song Liu
	Fix build with CONFIG_TRANSPARENT_HUGEPAGE=n with BPF_PROG_PACK_SIZE as PAGE_SIZE. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220208220509.4180389-3-song@kernel.org
2022-02-08	bonding: pair enable_port with slave_arr_updates	Mahesh Bandewar
	When 803.2ad mode enables a participating port, it should update the slave-array. I have observed that the member links are participating and are part of the active aggregator while the traffic is egressing via only one member link (in a case where two links are participating). Via kprobes I discovered that slave-arr has only one link added while the other participating link wasn't part of the slave-arr. I couldn't see what caused that situation but the simple code-walk through provided me hints that the enable_port wasn't always associated with the slave-array update. Fixes: ee6377147409 ("bonding: Simplify the xmit function for modes that use xmit_hash") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20220207222901.1795287-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	gve: Recording rx queue before sending to napi	Tao Liu
	This caused a significant performance degredation when using generic XDP with multiple queues. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Tao Liu <xliutaox@google.com> Link: https://lore.kernel.org/r/20220207175901.2486596-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	et131x: support arbitrary MAX_SKB_FRAGS	Eric Dumazet
	This NIC does not support TSO, it is very unlikely it would have to send packets with many fragments. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220208004855.1887345-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	Merge branch 'iwl-next' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Nguyen, Anthony L says: ==================== iwl-next Intel Wired LAN Driver Updates 2022-02-07 Dave adds support for ice driver to provide DSCP QoS mappings to irdma driver. [1] https://lore.kernel.org/netdev/20220202191921.1638-1-shiraz.saleem@intel.com/ * 'iwl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: ice: add support for DSCP QoS for IDC ==================== Link: https://lore.kernel.org/r/20220207235921.1303522-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08	bpf: Fix leftover header->pages in sparc and powerpc code.	Song Liu
	Replace header->pages * PAGE_SIZE with new header->size. Fixes: ed2d9e1a26cc ("bpf: Use size instead of pages in bpf_binary_header") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220208220509.4180389-2-song@kernel.org
2022-02-08	libbpf: Fix signedness bug in btf_dump_array_data()	Dan Carpenter
	The btf__resolve_size() function returns negative error codes so "elem_size" must be signed for the error handling to work. Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220208071552.GB10495@kili
2022-02-08	selftests/bpf: Do not export subtest as standalone test	Hou Tao
	Two subtests in ksyms_module.c are not qualified as static, so these subtests are exported as standalone tests in tests.h and lead to confusion for the output of "./test_progs -t ksyms_module". By using the following command ... grep "^void $serial_$\?test_[a-zA-Z0-9_]\+($void$\?)" \ tools/testing/selftests/bpf/prog_tests/*.c \| \ awk -F : '{print $1}' \| sort \| uniq -c \| awk '$1 != 1' ... one finds out that other tests also have a similar problem, so fix these tests by marking subtests in these tests as static. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220208065444.648778-1-houtao1@huawei.com
2022-02-08	Documentation: KUnit: Fix usage bug	Akira Kawata
	Fix a bug of kunit documentation. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205773 : Quoting Steve Pfetsch: : : kunit documentation is incorrect: : https://kunit.dev/third_party/stable_kernel/docs/usage.html : struct rectangle self = container_of(this, struct shape, parent); : : : Shouldn't it be: : struct rectangle self = container_of(this, struct rectangle, parent); : ? Signed-off-by: Akira Kawata <akirakawata1@gmail.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-02-08	Merge tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs	Linus Torvalds
	Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - Fix initialization of nfs_client cl_flags Other Fixes: - Fix performance issues with uncached readdir calls - Fix potential pointer dereferences in rpcrdma_ep_create - Fix nfs4_proc_get_locations() kernel-doc comment - Fix locking during sunrpc sysfs reads - Update my email address in the MAINTAINERS file to my new kernel.org email" * tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: lock against ->sock changing during sysfs read MAINTAINERS: Update my email address NFS: Fix nfs4_proc_get_locations() kernel-doc comment xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create NFS: Fix initialisation of nfs_client cl_flags field NFS: Avoid duplicate uncached readdir calls on eof NFS: Don't skip directory entries when doing uncached readdir NFS: Don't overfill uncached readdir pages
2022-02-08	bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures	Song Liu
	Instead of BUG_ON(), fail gracefully and return orig_prog. Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220208062533.3802081-1-song@kernel.org
2022-02-08	i40e: Add a stat for tracking busy rx pages	Joe Damato
	In some cases, pages cannot be reused by i40e because the page is busy. Add a counter for this event. Busy page count is accessible via ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-08	i40e: Add a stat for tracking pages waived	Joe Damato
	In some cases, pages can not be reused because they are not associated with the correct NUMA zone. Knowing how often pages are waived helps users to understand the interaction between the driver's memory usage and their system. Pass rx_stats through to i40e_can_reuse_rx_page to allow tracking when pages are waived. The page waive count is accessible via ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-08	i40e: Add a stat tracking new RX page allocations	Joe Damato
	Add a counter for new page allocations in the i40e RX path. This stat is accessible with ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-08	i40e: Aggregate and export RX page reuse stat	Joe Damato
	rx page reuse was already being tracked by the i40e driver per RX ring. Aggregate the counts and make them accessible via ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-08	i40e: Remove rx page reuse double count	Joe Damato
	Page reuse was being tracked from two locations: - i40e_reuse_rx_page (via 40e_clean_rx_irq), and - i40e_alloc_mapped_page Remove the double count and only count reuse from i40e_alloc_mapped_page when the page is about to be reused. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-08	SUNRPC: lock against ->sock changing during sysfs read	NeilBrown
	->sock can be set to NULL asynchronously unless ->recv_mutex is held. So it is important to hold that mutex. Otherwise a sysfs read can trigger an oops. Commit 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") appears to attempt to fix this problem, but it only narrows the race window. Fixes: 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") Fixes: a8482488a7d6 ("SUNRPC query transport's source port") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08	MAINTAINERS: Update my email address	Anna Schumaker
	Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08	NFS: Fix nfs4_proc_get_locations() kernel-doc comment	Yang Li
	Add the description of @server and @fhandle, and remove the excess @inode in nfs4_proc_get_locations() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'server' not described in 'nfs4_proc_get_locations' fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'fhandle' not described in 'nfs4_proc_get_locations' fs/nfs/nfs4proc.c:8219: warning: Excess function parameter 'inode' description in 'nfs4_proc_get_locations' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08	xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create	Dan Aloni
	If there are failures then we must not leave the non-NULL pointers with the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries free them, resulting in an Oops. Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08	NFS: Fix initialisation of nfs_client cl_flags field	Trond Myklebust
	For some long forgotten reason, the nfs_client cl_flags field is initialised in nfs_get_client() instead of being initialised at allocation time. This quirk was harmless until we moved the call to nfs_create_rpc_client(). Fixes: dd99e9f98fbf ("NFSv4: Initialise connection to the server in nfs4_alloc_client()") Cc: stable@vger.kernel.org # 4.8.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-07	Merge branch 'inet-separate-dscp-from-ecn-bits-using-new-dscp_t-type'	Jakub Kicinski
	Guillaume Nault says: ==================== inet: Separate DSCP from ECN bits using new dscp_t type The networking stack currently doesn't clearly distinguish between DSCP and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or structure fields), and each part of the stack handles them in their own way, using different macros. This has created several bugs in the past and some uncommon code paths are still unfixed. Such bugs generally manifest by selecting invalid routes because of ECN bits interfering with FIB routes and rules lookups (more details in the LPC 2021 talk[1] and in the RFC of this series[2]). This patch series aims at preventing the introduction of such bugs (and detecting existing ones), by introducing a dscp_t type, representing "sanitised" DSCP values (that is, with no ECN information), as opposed to plain u8 values that contain both DSCP and ECN information. dscp_t makes it clear for the reader what we're working on, and Sparse can flag invalid interactions between dscp_t and plain u8. This series converts only a few variables and structures: * Patch 1 converts the tclass field of struct fib6_rule. It effectively forbids the use of ECN bits in the tos/dsfield option of ip -6 rule. Rules now match packets solely based on their DSCP bits, so ECN doesn't influence the result any more. This contrasts with the previous behaviour where all 8 bits of the Traffic Class field were used. It is believed that this change is acceptable as matching ECN bits wasn't usable for IPv4, so only IPv6-only deployments could be depending on it. Also the previous behaviour made DSCP-based ip6-rules fail for packets with both a DSCP and an ECN mark, which is another reason why any such deploy is unlikely. * Patch 2 converts the tos field of struct fib4_rule. This one too effectively forbids defining ECN bits, this time in ip -4 rule. Before that, setting ECN bit 1 was accepted, while ECN bit 0 was rejected. But even when accepted, the rule would never match, as the packets would have their ECN bits cleared before doing the rule lookup. * Patch 3 converts the fc_tos field of struct fib_config. This is equivalent to patch 2, but for IPv4 routes. Routes using a tos/dsfield option with any ECN bit set is now rejected. Before this patch, they were accepted but, as with ip4 rules, these routes couldn't match any packet, since their ECN bits are cleared before the lookup. * Patch 4 converts the fa_tos field of struct fib_alias. This one is pure internal u8 to dscp_t conversion. While patches 1-3 had user facing consequences, this patch shouldn't have any side effect and is there to give an overview of what future conversion patches will look like. Conversions are quite mechanical, but imply some code churn, which is the price for the extra clarity a possibility of type checking. To summarise, all the behaviour changes required for the dscp_t type approach to work should be contained in patches 1-3. These changes are edge cases of ip-route and ip-rule that don't currently work properly. So they should be safe. Also, a kernel selftest is added for each of them. Finally, this work also paves the way for allowing the usage of the 3 high order DSCP bits in IPv4 (a few call paths already handle them, but in general the stack clears them before IPv4 rule and route lookups). References: [1] LPC 2021 talk: - https://linuxplumbersconf.org/event/11/contributions/943/ - Direct link to slide deck: https://linuxplumbersconf.org/event/11/contributions/943/attachments/901/1780/inet_tos_lpc2021.pdf [2] RFC version of this series: - https://lore.kernel.org/netdev/cover.1638814614.git.gnault@redhat.com/ ==================== Link: https://lore.kernel.org/r/cover.1643981839.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07	ipv4: Use dscp_t in struct fib_alias	Guillaume Nault
	Use the new dscp_t type to replace the fa_tos field of fib_alias. This ensures ECN bits are ignored and makes the field compatible with the fc_dscp field of struct fib_config. Converting old *tos variables and fields to dscp_t allows sparse to flag incorrect uses of DSCP and ECN bits. This patch is entirely about type annotation and shouldn't change any existing behaviour. Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>