linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-09-20	Merge branch 'net-ipa-a-mix-of-cleanups'	Jakub Kicinski
	Alex Elder says: ==================== net: ipa: a mix of cleanups This series contains a set of cleanups done in preparation for a more substantitive upcoming series that reworks how IPA registers and their fields are defined. The first eliminates about half of the possible GSI register constant symbols by removing offset definitions that are not currently required. The next two mainly rearrange code for some common enumerated types. The next one fixes two spots that reuse local variable names in inner scopes when defining offsets. The next adds some additional restrictions on the value held in a register. And the last one just fixes two field mask symbol names so they adhere to the common naming convention. ==================== Link: https://lore.kernel.org/r/20220910011131.1431934-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: ipa: fix two symbol names	Alex Elder
	All field mask symbols are defined with a "_FMASK" suffix, but EOT_COAL_GRANULARITY and DRBIP_ACL_ENABLE are defined without one. Fix that. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: ipa: update sequencer definition constraints	Alex Elder
	Starting with IPA v4.5, replication is done differently from before, and as a result the "replication" portion of the how the sequencer is specified must be zero. Add a check for the configuration data failing that requirement, and only update the sesquencer type value when it's supported. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: ipa: don't reuse variable names	Alex Elder
	In ipa_endpoint_init_hdr(), as well as ipa_endpoint_init_hdr_ext(), a top-level automatic variable named "offset" is used to represent the offset of a register. However, deeper within each of those functions is another definition of a local variable with the same name, representing something else. Scoping rules ensure the result is what was intended, but this variable name reuse is bad practice and makes the code confusing. Fix this by naming the inner variable "off". Use "off" instead of "checksum_offset" in ipa_endpoint_init_cfg() for consistency. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: ipa: move and redefine ipa_version_valid()	Alex Elder
	Move the definition of ipa_version_valid(), making it a static inline function defined together with the enumerated type in "ipa_version.h". Define a new count value in the type. Rename the function to be ipa_version_supported(), and have it return true only if the IPA version supplied is explicitly supported by the driver. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: ipa: move the definition of gsi_ee_id	Alex Elder
	Move the definition of the gsi_ee_id enumerated type out of "gsi.h" and into "ipa_version.h". That latter header file isolates the definition of the ipa_version enumerated type, allowing it to be included in both IPA and GSI code. We have the same requirement for gsi_ee_id, and moving it here makes it easier to get only that definition without everything else defined in "gsi.h". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	net: ipa: don't define unneeded GSI register offsets	Alex Elder
	Each GSI execution environment (EE) is able to access many of the GSI registers associated with the other EEs. A block of GSI registers is contained within a region of memory, and an EE's register offset can be determined by adding the register's base offset to the product of the EE ID and a fixed constant. Despite this possibility, the AP IPA code never accesses any GSI registers other than its own. So there's no need to define the macros that compute register offsets for other EEs. Redefine the AP access macros to compute the offset the way the more general "any EE" macro would, and get rid of the unneeded macros. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	of: mdio: Add of_node_put() when breaking out of for_each_xx	Liang He
	In of_mdiobus_register(), we should call of_node_put() for 'child' escaped out of for_each_available_child_of_node(). Fixes: 66bdede495c7 ("of_mdio: Fix broken PHY IRQ in case of probe deferral") Co-developed-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20220913125659.3331969-1-windhl@126.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20	drm/i915/gem: Really move i915_gem_context.link under ref protection	Chris Wilson
	i915_perf assumes that it can use the i915_gem_context reference to protect its i915->gem.contexts.list iteration. However, this requires that we do not remove the context from the list until after we drop the final reference and release the struct. If, as currently, we remove the context from the list during context_close(), the link.next pointer may be poisoned while we are holding the context reference and cause a GPF: [ 4070.573157] i915 0000:00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering on ctx_id=0x1fffff ctx_id_mask=0x1fffff [ 4070.574881] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G E 5.17.9 #180 [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915] [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff [ 4070.574990] RSP: 0018:ffffc90002077b78 EFLAGS: 00010202 [ 4070.574995] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000 [ 4070.575000] RDX: 0000000000000001 RSI: ffffc90002077b20 RDI: ffff88810ddc7c68 [ 4070.575004] RBP: 0000000000000001 R08: ffff888103242648 R09: fffffffffffffffc [ 4070.575008] R10: ffffffff82c50bc0 R11: 0000000000025c80 R12: ffff888101bf1860 [ 4070.575012] R13: dead0000000000b0 R14: ffffc90002077c04 R15: ffff88810be5cabc [ 4070.575016] FS: 00007f1ed50c0780(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000 [ 4070.575021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4070.575025] CR2: 00007f1ed5590280 CR3: 000000010ef6f005 CR4: 00000000003706e0 [ 4070.575029] Call Trace: [ 4070.575033] <TASK> [ 4070.575037] lrc_configure_all_contexts+0x13e/0x150 [i915] [ 4070.575103] gen8_enable_metric_set+0x4d/0x90 [i915] [ 4070.575164] i915_perf_open_ioctl+0xbc0/0x1500 [i915] [ 4070.575224] ? asm_common_interrupt+0x1e/0x40 [ 4070.575232] ? i915_oa_init_reg_state+0x110/0x110 [i915] [ 4070.575290] drm_ioctl_kernel+0x85/0x110 [ 4070.575296] ? update_load_avg+0x5f/0x5e0 [ 4070.575302] drm_ioctl+0x1d3/0x370 [ 4070.575307] ? i915_oa_init_reg_state+0x110/0x110 [i915] [ 4070.575382] ? gen8_gt_irq_handler+0x46/0x130 [i915] [ 4070.575445] __x64_sys_ioctl+0x3c4/0x8d0 [ 4070.575451] ? __do_softirq+0xaa/0x1d2 [ 4070.575456] do_syscall_64+0x35/0x80 [ 4070.575461] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4070.575467] RIP: 0033:0x7f1ed5c10397 [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48 [ 4070.575478] RSP: 002b:00007ffd65c8d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 4070.575484] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f1ed5c10397 [ 4070.575488] RDX: 00007ffd65c8d7c0 RSI: 0000000040106476 RDI: 0000000000000006 [ 4070.575492] RBP: 00005620972f9c60 R08: 000000000000000a R09: 0000000000000005 [ 4070.575496] R10: 000000000000000d R11: 0000000000000246 R12: 000000000000000a [ 4070.575500] R13: 000000000000000d R14: 0000000000000000 R15: 00007ffd65c8d7c0 [ 4070.575505] </TASK> [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E) [ 4070.575549] ---[ end trace 0000000000000000 ]--- v3: fix incorrect syntax of spin_lock() replacing spin_lock_irqsave() v2: irqsave not required in a worker, neither conversion to irq safe elsewhere (Tvrtko), - perf: it's safe to call gen8_configure_context() even if context has been closed, no need to check, - drop unrelated cleanup (Andi, Tvrtko) Reported-by: Mark Janes <mark.janes@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222 References: a4e7ccdac38e ("drm/i915: Move context management under GEM") Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-3-janusz.krzysztofik@linux.intel.com (cherry picked from commit ad3aa7c31efa5a09b0dba42e66cfdf77e0db7dc2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-09-20	drm/i915/gem: Flush contexts on driver release	Janusz Krzysztofik
	Due to i915_perf assuming that it can use the i915_gem_context reference to protect its i915->gem.contexts.list iteration, we need to defer removal of the context from the list until last reference to the context is put. However, there is a risk of triggering kernel warning on contexts list not empty at driver release time if we deleagate that task to a worker for i915_gem_context_release_work(), unless that work is flushed first. Unfortunately, it is not flushed on driver release. Fix it. Instead of additionally calling flush_workqueue(), either directly or via a new dedicated wrapper around it, replace last call to i915_gem_drain_freed_objects() with existing i915_gem_drain_workqueue() that performs both tasks. Fixes: 75eefd82581f ("drm/i915: Release i915_gem_context from a worker") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: stable@kernel.org # v5.16+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 1cec34442408a77ba5396b19725fed2c398005c3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-09-20	Revert "block: freeze the queue earlier in del_gendisk"	Christoph Hellwig
	This reverts commit a09b314005f3a0956ebf56e01b3b80339df577cc. Dusty Mabe reported consistent hang during CoreOS shutdown with a MD RAID1 setup. Although apparently similar hangs happened before, and this patch most likely is not the root cause it made it much more severe. Revert it until we can figure out what is going on with the md driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220919144049.978907-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-20	MAINTAINERS: Add maintainer for hwmon/max31760	Ibrahim Tilki
	Add maintainer for hwmon/max31760 driver Signed-off-by: Ibrahim Tilki <Ibrahim.Tilki@analog.com> Link: https://lore.kernel.org/r/20220910171945.48088-5-Ibrahim.Tilki@analog.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-09-20	dt-bindings: hwmon: Add bindings for max31760	Ibrahim Tilki
	Adding bindings for Analog Devices MAX31760 Fan-Speed Controller Signed-off-by: Ibrahim Tilki <Ibrahim.Tilki@analog.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220910171945.48088-4-Ibrahim.Tilki@analog.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-09-20	x86/dumpstack: Don't mention RIP in "Code: "	Jiri Slaby
	Commit 238c91115cd0 ("x86/dumpstack: Fix misleading instruction pointer error message") changed the "Code:" line in bug reports when RIP is an invalid pointer. In particular, the report currently says (for example): BUG: kernel NULL pointer dereference, address: 0000000000000000 ... RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. That Unable to access opcode bytes at RIP 0xffffffffffffffd6. is quite confusing as RIP value is 0, not -42. That -42 comes from "regs->ip - PROLOGUE_SIZE", because Code is dumped with some prologue (and epilogue). So do not mention "RIP" on this line in this context. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/b772c39f-c5ae-8f17-fe6e-6a2bc4d1f83b@kernel.org
2022-09-20	docs: hwmon: add max31760 documentation	Ibrahim Tilki
	Adding documentation for max31760 fan speed controller Signed-off-by: Ibrahim Tilki <Ibrahim.Tilki@analog.com> Link: https://lore.kernel.org/r/20220910171945.48088-3-Ibrahim.Tilki@analog.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-09-20	drivers: hwmon: Add max31760 fan speed controller driver	Ibrahim Tilki
	MAX31760 is a precision fan speed controller with nonvolatile lookup table. Device has one internal and one external temperature sensor support. Controls two fans and measures their speeds. Generates hardware alerts when programmable max and critical temperatures are exceeded. Signed-off-by: Ibrahim Tilki <Ibrahim.Tilki@analog.com> Reviewed-by: Nurettin Bolucu <Nurettin.Bolucu@analog.com> Link: https://lore.kernel.org/r/20220910171945.48088-2-Ibrahim.Tilki@analog.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-09-20	gpio: ftgpio010: Make irqchip immutable	Linus Walleij
	This turns the FTGPIO010 irqchip immutable. Tested on the D-Link DIR-685. Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-20	gpio: mockup: Fix potential resource leakage when register a chip	Andy Shevchenko
	If creation of software node fails, the locally allocated string array is left unfreed. Free it on error path. Fixes: 6fda593f3082 ("gpio: mockup: Convert to use software nodes") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-20	gpio: mockup: fix NULL pointer dereference when removing debugfs	Bartosz Golaszewski
	We now remove the device's debugfs entries when unbinding the driver. This now causes a NULL-pointer dereference on module exit because the platform devices are unregistered after the global debugfs directory has been recursively removed. Fix it by unregistering the devices first. Fixes: 303e6da99429 ("gpio: mockup: remove gpio debugfs when remove device") Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-20	x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions	Vincent Mailhol
	If x is not 0, __ffs(x) is equivalent to: (unsigned long)__builtin_ctzl(x) And if x is not ~0UL, ffz(x) is equivalent to: (unsigned long)__builtin_ctzl(~x) Because __builting_ctzl() returns an int, a cast to (unsigned long) is necessary to avoid potential warnings on implicit casts. Concerning the edge cases, __builtin_ctzl(0) is always undefined, whereas __ffs(0) and ffz(~0UL) may or may not be defined, depending on the processor. Regardless, for both functions, developers are asked to check against 0 or ~0UL so replacing __ffs() or ffz() by __builting_ctzl() is safe. For x86_64, the current __ffs() and ffz() implementations do not produce optimized code when called with a constant expression. On the contrary, the __builtin_ctzl() folds into a single instruction. However, for non constant expressions, the __ffs() and ffz() asm versions of the kernel remains slightly better than the code produced by GCC (it produces a useless instruction to clear eax). Use __builtin_constant_p() to select between the kernel's __ffs()/ffz() and the __builtin_ctzl() depending on whether the argument is constant or not. Statistics On a allyesconfig, before...: $ objdump -d vmlinux.o \| grep tzcnt \| wc -l 3607 ...and after: $ objdump -d vmlinux.o \| grep tzcnt \| wc -l 2600 So, roughly 27.9% of the calls to either __ffs() or ffz() were using constant expressions and could be optimized out. (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1) Note: on x86_64, the BSF instruction produces TZCNT when used with the REP prefix (which explain the use of `grep tzcnt' instead of `grep bsf' in above benchmark). c.f. [1] [1] e26a44a2d618 ("x86: Use REP BSF unconditionally") [ bp: Massage commit message. ] Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr
2022-09-20	x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions	Vincent Mailhol
	For x86_64, the current ffs() implementation does not produce optimized code when called with a constant expression. On the contrary, the __builtin_ffs() functions of both GCC and clang are able to fold the expression into a single instruction. Example Consider two dummy functions foo() and bar() as below: #include <linux/bitops.h> #define CONST 0x01000000 unsigned int foo(void) { return ffs(CONST); } unsigned int bar(void) { return __builtin_ffs(CONST); } GCC would produce below assembly code: 0000000000000000 <foo>: 0: ba 00 00 00 01 mov $0x1000000,%edx 5: b8 ff ff ff ff mov $0xffffffff,%eax a: 0f bc c2 bsf %edx,%eax d: 83 c0 01 add $0x1,%eax 10: c3 ret <Instructions after ret and before next function were redacted> 0000000000000020 <bar>: 20: b8 19 00 00 00 mov $0x19,%eax 25: c3 ret And clang would produce: 0000000000000000 <foo>: 0: b8 ff ff ff ff mov $0xffffffff,%eax 5: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # c <foo+0xc> c: 83 c0 01 add $0x1,%eax f: c3 ret 0000000000000010 <bar>: 10: b8 19 00 00 00 mov $0x19,%eax 15: c3 ret Both examples clearly demonstrate the benefit of using __builtin_ffs() instead of the kernel's asm implementation for constant expressions. However, for non constant expressions, the kernel's ffs() asm version remains better for x86_64 because, contrary to GCC, it doesn't emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is able optimize out the CMOV call). Use __builtin_constant_p() to select between the kernel's ffs() and the __builtin_ffs() depending on whether the argument is constant or not. As a side benefit, replacing the ffs() function declaration by a macro also removes below -Wshadow warning: ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow] 283 \| static __always_inline int ffs(int x) Statistics On a allyesconfig, before...: $ objdump -d vmlinux.o \| grep bsf \| wc -l 1081 ...and after: $ objdump -d vmlinux.o \| grep bsf \| wc -l 792 So, roughly 26.7% of the calls to ffs() were using constant expressions and could be optimized out. (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1) [1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()") [ bp: Massage commit message. ] Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr
2022-09-20	Merge branch 'net-ethernet-adi-add-adin1110-support'	Paolo Abeni
	Alexandru Tachici says: ==================== net: ethernet: adi: Add ADIN1110 support The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY designed for industrial Ethernet applications. It integrates an Ethernet PHY core with a MAC and all the associated analog circuitry, input and output clock buffering. ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers can be accessed through the MDIO MAC registers. We are registering an MDIO bus with custom read/write in order to let the PHY to be discovered by the PAL. This will let the ADIN1100 Linux driver to probe and take control of the PHY. The ADIN2111 is a low power, low complexity, two-Ethernet ports switch with integrated 10BASE-T1L PHYs and one serial peripheral interface (SPI) port. The device is designed for industrial Ethernet applications using low power constrained nodes and is compliant with the IEEE 802.3cg-2019 Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE). The switch supports various routing configurations between the two Ethernet ports and the SPI host port providing a flexible solution for line, daisy-chain, or ring network topologies. The ADIN2111 supports cable reach of up to 1700 meters with ultra low power consumption of 77 mW. The two PHY cores support the 1.0 V p-p operating mode and the 2.4 V p-p operating mode defined in the IEEE 802.3cg standard. The device integrates the switch, two Ethernet physical layer (PHY) cores with a media access control (MAC) interface and all the associated analog circuitry, and input and output clock buffering. The device also includes internal buffer queues, the SPI and subsystem registers, as well as the control logic to manage the reset and clock control and hardware pin configuration. Access to the PHYs is exposed via an internal MDIO bus. Writes/reads can be performed by reading/writing to the ADIN2111 MDIO registers via SPI. On probe, for each port, a struct net_device is allocated and registered. When both ports are added to the same bridge, the driver will enable offloading of frame forwarding at the hardware level. Driver offers STP support. Normal operation on forwarding state. Allows only frames with the 802.1d DA to be passed to the host when in any of the other states. When both ports of ADIN2111 belong to the same SW bridge a maximum of 12 FDB entries will offloaded by the hardware and are marked as such. ==================== Link: https://lore.kernel.org/r/20220913122629.124546-1-andrei.tachici@stud.acs.upb.ro Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	dt-bindings: net: adin1110: Add docs	Alexandru Tachici
	Add bindings for the ADIN1110/2111 MAC-PHY/SWITCH. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: ethernet: adi: Add ADIN1110 support	Alexandru Tachici
	The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY designed for industrial Ethernet applications. It integrates an Ethernet PHY core with a MAC and all the associated analog circuitry, input and output clock buffering. ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers can be accessed through the MDIO MAC registers. We are registering an MDIO bus with custom read/write in order to let the PHY to be discovered by the PAL. This will let the ADIN1100 Linux driver to probe and take control of the PHY. The ADIN2111 is a low power, low complexity, two-Ethernet ports switch with integrated 10BASE-T1L PHYs and one serial peripheral interface (SPI) port. The device is designed for industrial Ethernet applications using low power constrained nodes and is compliant with the IEEE 802.3cg-2019 Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE). The switch supports various routing configurations between the two Ethernet ports and the SPI host port providing a flexible solution for line, daisy-chain, or ring network topologies. The ADIN2111 supports cable reach of up to 1700 meters with ultra low power consumption of 77 mW. The two PHY cores support the 1.0 V p-p operating mode and the 2.4 V p-p operating mode defined in the IEEE 802.3cg standard. The device integrates the switch, two Ethernet physical layer (PHY) cores with a media access control (MAC) interface and all the associated analog circuitry, and input and output clock buffering. The device also includes internal buffer queues, the SPI and subsystem registers, as well as the control logic to manage the reset and clock control and hardware pin configuration. Access to the PHYs is exposed via an internal MDIO bus. Writes/reads can be performed by reading/writing to the ADIN2111 MDIO registers via SPI. On probe, for each port, a struct net_device is allocated and registered. When both ports are added to the same bridge, the driver will enable offloading of frame forwarding at the hardware level. Driver offers STP support. Normal operation on forwarding state. Allows only frames with the 802.1d DA to be passed to the host when in any of the other states. When both ports of ADIN2111 belong to the same SW bridge a maximum of 12 FDB entries will offloaded by the hardware and are marked as such. Co-developed-by: Lennart Franzen <lennart@lfdomain.com> Signed-off-by: Lennart Franzen <lennart@lfdomain.com> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: phy: adin1100: add PHY IDs of adin1110/adin2111	Alexandru Tachici
	Add additional PHY IDs for the internal PHYs of adin1110 and adin2111. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	tcp: read multiple skbs in tcp_read_skb()	Cong Wang
	Before we switched to ->read_skb(), ->read_sock() was passed with desc.count=1, which technically indicates we only read one skb per ->sk_data_ready() call. However, for TCP, this is not true. TCP at least has sk_rcvlowat which intentionally holds skb's in receive queue until this watermark is reached. This means when ->sk_data_ready() is invoked there could be multiple skb's in the queue, therefore we have to read multiple skbs in tcp_read_skb() instead of one. Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()") Reported-by: Peilin Ye <peilin.ye@bytedance.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220912173553.235838-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	Revert "ALSA: usb-audio: Split endpoint setups for hw_params and prepare"	Takashi Iwai
	This reverts commit ff878b408a03bef5d610b7e2302702e16a53636e. Unfortunately the recent fix seems bringing another regressions with PulseAudio / pipewire, at least for Steinberg and MOTU devices. As a temporary solution, do a straight revert. The issue for Android will be revisited again later by another different fix (if any). Fixes: ff878b408a03 ("ALSA: usb-audio: Split endpoint setups for hw_params and prepare") Cc: <stable@vger.kernel.org> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216500 Link: https://lore.kernel.org/r/20220920113929.25162-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-09-20	Merge branch 'seg6-add-next-c-sid-support-for-srv6-end-behavior'	Paolo Abeni
	Andrea Mayer says: ==================== seg6: add NEXT-C-SID support for SRv6 End behavior The Segment Routing (SR) architecture is based on loose source routing. A list of instructions, called segments, can be added to the packet headers to influence the forwarding and processing of the packets in an SR enabled network. In SRv6 (Segment Routing over IPv6 data plane) [1], the segment identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried in the Segment Routing Header (SRH). A segment may correspond to a "behavior" that is executed by a node when the packet is received. The Linux kernel currently supports a large subset of the behaviors described in [2] (e.g., End, End.X, End.T and so on). Some SRv6 scenarios (i.e.: traffic-engineering, fast-rerouting, VPN, mobile network backhaul, etc.) may require a large number of segments (i.e. up to 15). Therefore, reducing the size of the SID List is useful to minimize the impact on MTU (Maximum Transfer Unit) and to enable SRv6 on legacy hardware devices with limited processing power that can suffer from long IPv6 headers. Draft-ietf-spring-srv6-srh-compression [3] extends the SRv6 architecture by providing different mechanisms for the efficient representation (i.e. compression) of the SID List. The NEXT-C-SID mechanism described in [3] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID Container. In this way, the length of the SID List can be drastically reduced. In some cases, the SRH can be omitted, as the IPv6 Destination Address can carry the whole Segment List, using its compressed representation. The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2]. The flavors represent additional operations that can modify or extend a subset of the existing behaviors. In this patchset we extend the SRv6 Subsystem in order to support the NEXT-C-SID mechanism. In details the patchset is made of: - patch 1/3: add netlink_ext_ack support in parsing SRv6 behavior attributes; - patch 2/3: add NEXT-C-SID support for SRv6 End behavior; - patch 3/3: add selftest for NEXT-C-SID in SRv6 End behavior. The corresponding iproute2 patch for supporting the NEXT-C-SID in SRv6 End behavior is provided in a separated patchset. Comments, improvements and suggestions are always appreciated. [1] - https://datatracker.ietf.org/doc/html/rfc8754 [2] - https://datatracker.ietf.org/doc/html/rfc8986 [3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression ==================== Link: https://lore.kernel.org/r/20220912171619.16943-1-andrea.mayer@uniroma2.it Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End behavior	Andrea Mayer
	This selftest is designed for testing the support of NEXT-C-SID flavor for SRv6 End behavior. It instantiates a virtual network composed of several nodes: hosts and SRv6 routers. Each node is realized using a network namespace that is properly interconnected to others through veth pairs. The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged by hosts for communicating with each other. Such routers i) apply different SRv6 Policies to the traffic received from connected hosts, considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID compression mechanism for encoding several SRv6 segments within a single 128-bit SID address, referred to as a Compressed SID (C-SID) container. The NEXT-C-SID is provided as a "flavor" of the SRv6 End behavior, enabling it to properly process the C-SID containers. The correct execution of the enabled NEXT-C-SID SRv6 End behavior is verified through reachability tests carried out between hosts belonging to the same VPN. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	seg6: add NEXT-C-SID support for SRv6 End behavior	Andrea Mayer
	The NEXT-C-SID mechanism described in [1] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID (C-SID) container. In this way, the length of the SID List can be drastically reduced. A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address logically structured in three main blocks: i) Locator-Block; ii) Locator-Node Function; iii) Argument. C-SID container +------------------------------------------------------------------+ \| Locator-Block \|Loc-Node\| Argument \| \| \|Function\| \| +------------------------------------------------------------------+ <--------- B -----------> <- NF -> <------------- A ---------------> (i) The Locator-Block can be any IPv6 prefix available to the provider; (ii) The Locator-Node Function represents the node and the function to be triggered when a packet is received on the node; (iii) The Argument carries the remaining C-SIDs in the current C-SID container. The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2]. The flavors represent additional operations that can modify or extend a subset of the existing behaviors. This patch introduces the support for flavors in SRv6 End behavior implementing the NEXT-C-SID one. An SRv6 End behavior with NEXT-C-SID flavor works as an End behavior but it is capable of processing the compressed SID List encoded in C-SID containers. An SRv6 End behavior with NEXT-C-SID flavor can be configured to support user-provided Locator-Block and Locator-Node Function lengths. In this implementation, such lengths must be evenly divisible by 8 (i.e. must be byte-aligned), otherwise the kernel informs the user about invalid values with a meaningful error code and message through netlink_ext_ack. If Locator-Block and/or Locator-Node Function lengths are not provided by the user during configuration of an SRv6 End behavior instance with NEXT-C-SID flavor, the kernel will choose their default values i.e., 32-bit Locator-Block and 16-bit Locator-Node Function. [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression [2] - https://datatracker.ietf.org/doc/html/rfc8986 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	seg6: add netlink_ext_ack support in parsing SRv6 behavior attributes	Andrea Mayer
	An SRv6 behavior instance can be set up using mandatory and/or optional attributes. In the setup phase, each supplied attribute is parsed and processed. If the parsing operation fails, the creation of the behavior instance stops and an error number/code is reported to the user. In many cases, it is challenging for the user to figure out exactly what happened by relying only on the error code. For this reason, we add the support for netlink_ext_ack in parsing SRv6 behavior attributes. In this way, when an SRv6 behavior attribute is parsed and an error occurs, the kernel can send a message to the userspace describing the error through a meaningful text message in addition to the classic error code. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	Merge branch 'revert-fec-ptp-changes'	Paolo Abeni
	Francesco Dolcini says: ==================== Revert fec PTP changes Revert the last 2 FEC PTP changes from Csókás Bence, they are causing multiple issues and we are at 6.0-rc5. ==================== Link: https://lore.kernel.org/r/20220912070143.98153-1-francesco.dolcini@toradex.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	Revert "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"	Francesco Dolcini
	This reverts commit b353b241f1eb9b6265358ffbe2632fdcb563354f, this is creating multiple issues, just not ready to be merged yet. Link: https://lore.kernel.org/all/CAHk-=wj1obPoTu1AHj9Bd_BGYjdjDyPP+vT5WMj8eheb3A9WHw@mail.gmail.com/ Link: https://lore.kernel.org/all/20220907143915.5w65kainpykfobte@pengutronix.de/ Fixes: b353b241f1eb ("net: fec: Use a spinlock to guard `fep->ptp_clk_on`") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	Revert "fec: Restart PPS after link state change"	Francesco Dolcini
	This reverts commit f79959220fa5fbda939592bf91c7a9ea90419040, this is creating multiple issues, just not ready to be merged yet. Link: https://lore.kernel.org/all/20220905180542.GA3685102@roeck-us.net/ Link: https://lore.kernel.org/all/CAHk-=wj1obPoTu1AHj9Bd_BGYjdjDyPP+vT5WMj8eheb3A9WHw@mail.gmail.com/ Fixes: f79959220fa5 ("fec: Restart PPS after link state change") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: microchip: lan937x: fix maximum frame length check	Rakesh Sankaranarayanan
	Maximum frame length check is enabled in lan937x switch on POR, But it is found to be disabled on driver during port setup operation. Due to this, packets are not dropped when transmitted with greater than configured value. For testing, setup made for lan1->lan2 transmission and configured lan1 interface with a frame length (less than 1500 as mentioned in documentation) and transmitted packets with greater than configured value. Expected no packets at lan2 end, but packets observed at lan2. Based on the documentation, packets should get discarded if the actual packet length doesn't match the frame length configured. Frame length check should be disabled only for cascaded ports due to tailtags. This feature was disabled on ksz9477 series due to ptp issue, which is not in lan937x series. But since lan937x took ksz9477 as base, frame length check disabled here as well. Patch added to remove this portion from port setup so that maximum frame length check will be active for normal ports. Fixes: 55ab6ffaf378 ("net: dsa: microchip: add DSA support for microchip LAN937x") Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com> Link: https://lore.kernel.org/r/20220912051228.1306074-1-rakesh.sankaranarayanan@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	open: always initialize ownership fields	Tetsuo Handa
	Beginning of the merge window we introduced the vfs{g,u}id_t types in b27c82e12965 ("attr: port attribute changes to new types") and changed various codepaths over including chown_common(). During that change we forgot to account for the case were the passed ownership value is -1. In this case the ownership fields in struct iattr aren't initialized but we rely on them being initialized by the time we generate the ownership to pass down to the LSMs. All the major LSMs don't care about the ownership values at all. Only Tomoyo uses them and so it took a while for syzbot to unearth this issue. Fix this by initializing the ownership fields and do it within the retry_deleg block. While notify_change() doesn't alter the ownership fields currently we shouldn't rely on it. Since no kernel has been released with these changes this does not needed to be backported to any stable kernels. [Christian Brauner (Microsoft) <brauner@kernel.org>] * rewrote commit message * use INVALID_VFS{G,U}ID macros Fixes: b27c82e12965 ("attr: port attribute changes to new types") # mainline only Reported-and-tested-by: syzbot+541e21dcc32c4046cba9@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-09-20	net-next: gro: Fix use of skb_gro_header_slow	Richard Gobert
	In the cited commit, the function ipv6_gro_receive was accidentally changed to use skb_gro_header_slow, without attempting the fast path. Fix it. Fixes: 35ffb6654729 ("net: gro: skb_gro_header helper function") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Link: https://lore.kernel.org/r/20220911184835.GA105063@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net/mlx5e: Ensure macsec_rule is always initiailized in ↵	Nathan Chancellor
	macsec_fs_{r,t}x_add_rule() Clang warns: drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (err) ^~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:598:9: note: uninitialized use occurs here return macsec_rule; ^~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:2: note: remove the 'if' if its condition is always false if (err) ^~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:523:38: note: initialize the variable 'macsec_rule' to silence this warning union mlx5e_macsec_rule macsec_rule; ^ = NULL drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (err) ^~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1215:9: note: uninitialized use occurs here return macsec_rule; ^~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:2: note: remove the 'if' if its condition is always false if (err) ^~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1118:38: note: initialize the variable 'macsec_rule' to silence this warning union mlx5e_macsec_rule macsec_rule; ^ = NULL 2 errors generated. If macsec_fs_{r,t}x_ft_get() fail, macsec_rule will be uninitialized. Initialize it to NULL at the top of each function so that it cannot be used uninitialized. Fixes: e467b283ffd5 ("net/mlx5e: Add MACsec TX steering rules") Fixes: 3b20949cb21b ("net/mlx5e: Add MACsec RX steering rules") Link: https://github.com/ClangBuiltLinux/linux/issues/1706 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/20220911085748.461033-1-nathan@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	drm/hisilicon: Add depends on MMU	Randy Dunlap
	The Kconfig symbol depended on MMU but was dropped by the commit acad3fe650a5 ("drm/hisilicon: Removed the dependency on the mmu") because it already had as a dependency ARM64 that already selects MMU. But later, commit a0f25a6bb319 ("drm/hisilicon/hibmc: Allow to be built if COMPILE_TEST is enabled") allowed the driver to be built for non-ARM64 when COMPILE_TEST is set but that could lead to unmet direct dependencies and linking errors. Prevent a kconfig warning when MMU is not enabled by making DRM_HISI_HIBMC depend on MMU. WARNING: unmet direct dependencies detected for DRM_TTM Depends on [n]: HAS_IOMEM [=y] && DRM [=m] && MMU [=n] Selected by [m]: - DRM_TTM_HELPER [=m] && HAS_IOMEM [=y] && DRM [=m] - DRM_HISI_HIBMC [=m] && HAS_IOMEM [=y] && DRM [=m] && PCI [=y] && (ARM64 \|\| COMPILE_TEST [=y]) Fixes: acad3fe650a5 ("drm/hisilicon: Removed the dependency on the mmu") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Xinliang Liu <xinliang.liu@linaro.org> Cc: Tian Tao <tiantao6@hisilicon.com> Cc: John Stultz <jstultz@google.com> Cc: Xinwei Kong <kong.kongxinwei@hisilicon.com> Cc: Chen Feng <puck.chen@hisilicon.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Huang Rui <ray.huang@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220531025557.29593-1-rdunlap@infradead.org
2022-09-20	fat: port to vfs{g,u}id_t and associated helpers	Christian Brauner
	A while ago we introduced a dedicated vfs{g,u}id_t type in commit 1e5267cd0895 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched over a good part of the VFS. Ultimately we will remove all legacy idmapped mount helpers that operate only on k{g,u}id_t in favor of the new type safe helpers that operate on vfs{g,u}id_t. Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2022-09-20	Merge branch 'dsa-changes-for-multiple-cpu-ports-part-4'	Paolo Abeni
	Vladimir Oltean says: ==================== DSA changes for multiple CPU ports (part 4) Those who have been following part 1: https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/ part 2: https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/ and part 3: https://patchwork.kernel.org/project/netdevbpf/cover/20220819174820.3585002-1-vladimir.oltean@nxp.com/ will know that I am trying to enable the second internal port pair from the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q". This series represents the final part of that effort. We have: - the introduction of new UAPI in the form of IFLA_DSA_MASTER, the iproute2 patch for which is here: https://patchwork.kernel.org/project/netdevbpf/patch/20220904190025.813574-1-vladimir.oltean@nxp.com/ - preparation for LAG DSA masters in terms of suppressing some operations for masters in the DSA core that simply don't make sense when those masters are a bonding/team interface - handling all the net device events that occur between DSA and a LAG DSA master, including migration to a different DSA master when the current master joins a LAG, or the LAG gets destroyed - updating documentation - adding an implementation for NXP LS1028A, where things are insanely complicated due to hardware limitations. We have 2 tagging protocols: * the native "ocelot" protocol (NPI port mode). This does not support CPU ports in a LAG, and supports a single DSA master. The DSA master can be changed between eno2 (2.5G) and eno3 (1G), but all ports must be down during the changing process, and user ports assigned to the old DSA master will refuse to come up if the user requests that during a "transient" state. * the "ocelot-8021q" software-defined protocol, where the Ethernet ports connected to the CPU are not actually "god mode" ports as far as the hardware is concerned. So here, static assignment between user and CPU ports is possible by editing the PGID_SRC masks for the port-based forwarding matrix, and "CPU ports in a LAG" simply means "a LAG like any other". The series was regression-tested on LS1028A using the local_termination.sh kselftest, in most of the possible operating modes and tagging protocols. I have not done a detailed performance evaluation yet, but using LAG, is possible to exceed the termination bandwidth of a single CPU port in an iperf3 test with multiple senders and multiple receivers. v1 at: https://patchwork.kernel.org/project/netdevbpf/cover/20220830195932.683432-1-vladimir.oltean@nxp.com/ Previous (older) RFC at: https://lore.kernel.org/netdev/20220523104256.3556016-1-olteanv@gmail.com/ ==================== Link: https://lore.kernel.org/r/20220911010706.2137967-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: felix: add support for changing DSA master	Vladimir Oltean
	Changing the DSA master means different things depending on the tagging protocol in use. For NPI mode ("ocelot" and "seville"), there is a single port which can be configured as NPI, but DSA only permits changing the CPU port affinity of user ports one by one. So changing a user port to a different NPI port globally changes what the NPI port is, and breaks the user ports still using the old one. To address this while still permitting the change of the NPI port, require that the user ports which are still affine to the old NPI port are down, and cannot be brought up until they are all affine to the same NPI port. The tag_8021q mode ("ocelot-8021q") is more flexible, in that each user port can be freely assigned to one CPU port or to the other. This works by filtering host addresses towards both tag_8021q CPU ports, and then restricting the forwarding from a certain user port only to one of the two tag_8021q CPU ports. Additionally, the 2 tag_8021q CPU ports can be placed in a LAG. This works by enabling forwarding via PGID_SRC from a certain user port towards the logical port ID containing both tag_8021q CPU ports, but then restricting forwarding per packet, via the LAG hash codes in PGID_AGGR, to either one or the other. When we change the DSA master to a LAG device, DSA guarantees us that the LAG has at least one lower interface as a physical DSA master. But DSA masters can come and go as lowers of that LAG, and ds->ops->port_change_master() will not get called, because the DSA master is still the same (the LAG). So we need to hook into the ds->ops->port_lag_{join,leave} calls on the CPU ports and update the logical port ID of the LAG that user ports are assigned to. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	docs: net: dsa: update information about multiple CPU ports	Vladimir Oltean
	DSA now supports multiple CPU ports, explain the use cases that are covered, the new UAPI, the permitted degrees of freedom, the driver API, and remove some old "hanging fruits". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: allow masters to join a LAG	Vladimir Oltean
	There are 2 ways in which a DSA user port may become handled by 2 CPU ports in a LAG: (1) its current DSA master joins a LAG ip link del bond0 && ip link add bond0 type bond mode 802.3ad ip link set eno2 master bond0 When this happens, all user ports with "eno2" as DSA master get automatically migrated to "bond0" as DSA master. (2) it is explicitly configured as such by the user # Before, the DSA master was eno3 ip link set swp0 type dsa master bond0 The design of this configuration is that the LAG device dynamically becomes a DSA master through dsa_master_setup() when the first physical DSA master becomes a LAG slave, and stops being so through dsa_master_teardown() when the last physical DSA master leaves. A LAG interface is considered as a valid DSA master only if it contains existing DSA masters, and no other lower interfaces. Therefore, we mainly rely on method (1) to enter this configuration. Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when it becomes a standalone DSA master again. But the LAG master also has a dev->dsa_ptr, and this is actually duplicated from one of the physical LAG slaves, and therefore needs to be balanced when LAG slaves come and go. To the switch driver, putting DSA masters in a LAG is seen as putting their associated CPU ports in a LAG. We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG, by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: propagate extack to port_lag_join	Vladimir Oltean
	Drivers could refuse to offload a LAG configuration for a variety of reasons, mainly having to do with its TX type. Additionally, since DSA masters may now also be LAG interfaces, and this will translate into a call to port_lag_join on the CPU ports, there may be extra restrictions there. Propagate the netlink extack to this DSA method in order for drivers to give a meaningful error message back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: suppress device links to LAG DSA masters	Vladimir Oltean
	These don't work (print a harmless error about the operation failing) and make little sense to have anyway, because when a LAG DSA master goes away, we will introduce logic to move our CPU port back to the first physical DSA master. So suppress these device links in preparation for adding support for LAG DSA masters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: suppress appending ethtool stats to LAG DSA masters	Vladimir Oltean
	Similar to the discussion about tracking the admin/oper state of LAG DSA masters, we have the problem here that struct dsa_port *cpu_dp caches a single pair of orig_ethtool_ops and netdev_ops pointers. So if we call dsa_master_setup(bond0, cpu_dp) where cpu_dp is also the dev->dsa_ptr of one of the physical DSA masters, we'd effectively overwrite what we cached from that physical netdev with what replaced from the bonding interface. We don't need DSA ethtool stats on the bonding interface when used as DSA master, it's good enough to have them just on the physical DSA masters, so suppress this logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: don't keep track of admin/oper state on LAG DSA masters	Vladimir Oltean
	We store information about the DSA master's state in cpu_dp->master_admin_up and cpu_dp->master_oper_up, and this assumes a bijective association between a CPU port and a DSA master. However, when we have CPU ports in a LAG (and DSA masters in a LAG too), the way in which we set up things is that the physical DSA masters still have dev->dsa_ptr pointing to our cpu_dp, but the bonding/team device itself also has its dev->dsa_ptr pointing towards one of the CPU port structures (the first one). So logically speaking, that first cpu_dp can't keep track of both the physical master's admin/oper state, and of the bonding master's state. This isn't even needed; the reason why we keep track of the DSA master's state is to know when it is available for Ethernet-based register access. For that use case, we don't even need LAG; we just need to decide upon one of the physical DSA masters (if there is more than 1 available) and use that. This change suppresses dsa_tree_master_{admin,oper}_state_change() calls on LAG DSA masters (which will be supported in a future change), to allow the tracking of just physical DSA masters. Link: https://lore.kernel.org/netdev/628cc94d.1c69fb81.15b0d.422d@mx.google.com/ Suggested-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: allow the DSA master to be seen and changed through rtnetlink	Vladimir Oltean
	Some DSA switches have multiple CPU ports, which can be used to improve CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(), sets up only the first one, leading to suboptimal use of hardware. The desire is to not change the default configuration but to permit the user to create a dynamic mapping between individual user ports and the CPU port that they are served by, configurable through rtnetlink. It is also intended to permit load balancing between CPU ports, and in that case, the foreseen model is for the DSA master to be a bonding interface whose lowers are the physical DSA masters. To that end, we create a struct rtnl_link_ops for DSA user ports with the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that contains the ifindex of the newly desired DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20	net: dsa: introduce dsa_port_get_master()	Vladimir Oltean
	There is a desire to support for DSA masters in a LAG. That configuration is intended to work by simply enslaving the master to a bonding/team device. But the physical DSA master (the LAG slave) still has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical CPU port. However, we would like to be able to retrieve the LAG that's the upper of the physical DSA master. In preparation for that, introduce a helper called dsa_port_get_master() that replaces all occurrences of the dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will be made later within the helper itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>