summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-07-21dt-bindings: serial: Remove obsolete nxp,lpc1850-uart.txtRob Herring
nxp,lpc1850-uart.txt binding is already covered by 8250.yaml, so remove it. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230707221607.1064888-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-07-21dt-bindings: serial: Remove obsolete cavium-uart.txtRob Herring
cavium-uart.txt binding is already covered by 8250.yaml, so remove it. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230707221602.1063972-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-07-21loop: do not enforce max_loop hard limit by (new) defaultMauricio Faria de Oliveira
Problem: The max_loop parameter is used for 2 different purposes: 1) initial number of loop devices to pre-create on init 2) maximum number of loop devices to add on access/open() Historically, its default value (zero) caused 1) to create non-zero number of devices (CONFIG_BLK_DEV_LOOP_MIN_COUNT), and no hard limit on 2) to add devices with autoloading. However, the default value changed in commit 85c50197716c ("loop: Fix the max_loop commandline argument treatment when it is set to 0") to CONFIG_BLK_DEV_LOOP_MIN_COUNT, for max_loop=0 not to pre-create devices. That does improve 1), but unfortunately it breaks 2), as the default behavior changed from no-limit to hard-limit. Example: For example, this userspace code broke for N >= CONFIG, if the user relied on the default value 0 for max_loop: mknod("/dev/loopN"); open("/dev/loopN"); // now fails with ENXIO Though affected users may "fix" it with (loop.)max_loop=0, this means to require a kernel parameter change on stable kernel update (that commit Fixes: an old commit in stable). Solution: The original semantics for the default value in 2) can be applied if the parameter is not set (ie, default behavior). This still keeps the intended function in 1) and 2) if set, and that commit's intended improvement in 1) if max_loop=0. Before 85c50197716c: - default: 1) CONFIG devices 2) no limit - max_loop=0: 1) CONFIG devices 2) no limit - max_loop=X: 1) X devices 2) X limit After 85c50197716c: - default: 1) CONFIG devices 2) CONFIG limit (*) - max_loop=0: 1) 0 devices (*) 2) no limit - max_loop=X: 1) X devices 2) X limit This commit: - default: 1) CONFIG devices 2) no limit (*) - max_loop=0: 1) 0 devices 2) no limit - max_loop=X: 1) X devices 2) X limit Future: The issue/regression from that commit only affects code under the CONFIG_BLOCK_LEGACY_AUTOLOAD deprecation guard, thus the fix too is contained under it. Once that deprecated functionality/code is removed, the purpose 2) of max_loop (hard limit) is no longer in use, so the module parameter description can be changed then. Tests: Linux 6.4-rc7 CONFIG_BLK_DEV_LOOP_MIN_COUNT=8 CONFIG_BLOCK_LEGACY_AUTOLOAD=y - default (original) # ls -1 /dev/loop* /dev/loop-control /dev/loop0 ... /dev/loop7 # ./test-loop open: /dev/loop8: No such device or address - default (patched) # ls -1 /dev/loop* /dev/loop-control /dev/loop0 ... /dev/loop7 # ./test-loop # - max_loop=0 (original & patched): # ls -1 /dev/loop* /dev/loop-control # ./test-loop # - max_loop=8 (original & patched): # ls -1 /dev/loop* /dev/loop-control /dev/loop0 ... /dev/loop7 # ./test-loop open: /dev/loop8: No such device or address - max_loop=0 (patched; CONFIG_BLOCK_LEGACY_AUTOLOAD is not set) # ls -1 /dev/loop* /dev/loop-control # ./test-loop open: /dev/loop8: No such device or address Fixes: 85c50197716c ("loop: Fix the max_loop commandline argument treatment when it is set to 0") Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230720143033.841001-3-mfo@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-21loop: deprecate autoloading callback loop_probe()Mauricio Faria de Oliveira
The 'probe' callback in __register_blkdev() is only used under the CONFIG_BLOCK_LEGACY_AUTOLOAD deprecation guard. The loop_probe() function is only used for that callback, so guard it too, accordingly. See commit fbdee71bb5d8 ("block: deprecate autoloading based on dev_t"). Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230720143033.841001-2-mfo@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-21sbitmap: fix batching wakeupDavid Jeffery
Current code supposes that it is enough to provide forward progress by just waking up one wait queue after one completion batch is done. Unfortunately this way isn't enough, cause waiter can be added to wait queue just after it is woken up. Follows one example(64 depth, wake_batch is 8) 1) all 64 tags are active 2) in each wait queue, there is only one single waiter 3) each time one completion batch(8 completions) wakes up just one waiter in each wait queue, then immediately one new sleeper is added to this wait queue 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each wait queue 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7 can't be waken up anymore. Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for single batch. Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-21Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "I've picked up a handful of arm64 fixes while Catalin's been away, so here they are. Below is the usual summary, but we have basically have two cleanups, a fix for an SME crash and a fix for hibernation: - Fix saving of SME state after SVE vector length is changed - Fix sparse warnings for missing vDSO function prototypes - Fix hibernation resume path when kfence is enabled - Fix field names for the HFGxTR_EL2 register" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes arm64: vdso: Clear common make C=2 warnings arm64: mm: Make hibernation aware of KFENCE arm64: Fix HFGxTR_EL2 field naming
2023-07-21Merge tag 'pm-6.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Revert three recent intel_idle commits that introduced a functional issue, included a coding mistake and have been questioned at the design level" * tag 'pm-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "intel_idle: Add support for using intel_idle in a VM guest using just hlt" Revert "intel_idle: Add a "Long HLT" C1 state for the VM guest mode" Revert "intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()"
2023-07-21Merge tag 'sound-6.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A pile of fixes that have been gathered since the previous pull. Most of changes are device-specific, and nothing looks too scary. - A memory leak fix in ALSA sequencer code in 6.5-rc - Many fixes for ASoC Qualcomm CODEC drivers, covering SoundWire probe problems - A series of ASoC AMD fixes - A few fixes and cleanups of selftest stuff - HD-audio codec fixes and quirks for Clevo, HP, Lenovo, Dell" * tag 'sound-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits) ALSA: hda/realtek: Add support for DELL Oasis 13/14/16 laptops ALSA: hda/realtek: Fix generic fixup definition for cs35l41 amp ALSA: hda/realtek: Enable Mute LED on HP Laptop 15s-eq2xxx selftests: ALSA: Add test-pcmtest-driver to .gitignore ALSA: hda/realtek: Add quirk for Clevo NS70AU ASoC: fsl_sai: Disable bit clock with transmitter ALSA: seq: Fix memory leak at error path in snd_seq_create_port() ASoC: SOF: ipc3-dtrace: uninitialized data in dfsentry_trace_filter_write() ASoC: cs42l51: fix driver to properly autoload with automatic module loading MAINTAINERS: Redo addition of ssm3515 to APPLE SOUND ASoC: rt5640: Fix the issue of speaker noise ALSA: hda/realtek - remove 3k pull low procedure selftests: ALSA: Fix fclose on an already fclosed file pointer ALSA: pcmtest: Don't use static storage to track per device data ALSA: pcmtest: Convert to platform remove callback returning void ASoC: dt-bindings: audio-graph-card2: Drop incomplete example ASoC: dt-bindings: Update maintainer email id ASoC: amd: ps: Fix extraneous error messages ASoC: fsl_sai: Revert "ASoC: fsl_sai: Enable MCTL_MCLK_EN bit for master mode" ASoC: codecs: SND_SOC_WCD934X should select REGMAP_IRQ ...
2023-07-21Merge tag 'fbdev-for-6.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fixes and cleanups from Helge Deller: "Just the usual bunch of code cleanups in various drivers, this time mostly in vgacon and imxfb: - Code cleanup in vgacon (Jiri Slaby) - Explicitly include correct DT includes (Rob Herring) - imxfb code cleanup (Yangtao Li, Martin Kaiser) - kyrofb: make arrays const and smaller (Colin Ian King) - ep93xx-fb: return value check fix (Yuanjun Gong) - au1200fb: add missing IRQ check (Zhang Shurong)" * tag 'fbdev-for-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: Explicitly include correct DT includes fbdev: ep93xx-fb: fix return value check in ep93xxfb_probe fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe fbdev: kyro: make some const read-only arrays static and reduce type size fbcon: remove unused display (p) from fbcon_redraw() sticon: make sticon_set_def_font() void and remove op parameter vgacon: cache vc_cell_height in vgacon_cursor() vgacon: let vgacon_doresize() return void vgacon: remove unused xpos from vgacon_set_cursor_size() vgacon: remove unneeded forward declarations vgacon: switch vgacon_scrolldelta() and vgacon_restore_screen() fbdev: imxfb: remove unneeded labels fbdev: imxfb: Convert to devm_platform_ioremap_resource() fbdev: imxfb: Convert to devm_kmalloc_array() fbdev: imxfb: Removed unneeded release_mem_region fbdev: imxfb: switch to DEFINE_SIMPLE_DEV_PM_OPS fbdev: imxfb: warn about invalid left/right margin
2023-07-21drm/atomic: Fix potential use-after-free in nonblocking commitsDaniel Vetter
This requires a bit of background. Properly done a modeset driver's unload/remove sequence should be drm_dev_unplug(); drm_atomic_helper_shutdown(); drm_dev_put(); The trouble is that the drm_dev_unplugged() checks are by design racy, they do not synchronize against all outstanding ioctl. This is because those ioctl could block forever (both for modeset and for driver specific ioctls), leading to deadlocks in hotunplug. Instead the code sections that touch the hardware need to be annotated with drm_dev_enter/exit, to avoid accessing hardware resources after the unload/remove has finished. To avoid use-after-free issues all the involved userspace visible objects are supposed to hold a reference on the underlying drm_device, like drm_file does. The issue now is that we missed one, the atomic modeset ioctl can be run in a nonblocking fashion, and in that case it cannot rely on the implied drm_device reference provided by the ioctl calling context. This can result in a use-after-free if an nonblocking atomic commit is carefully raced against a driver unload. Fix this by unconditionally grabbing a drm_device reference for any drm_atomic_state structures. Strictly speaking this isn't required for blocking commits and TEST_ONLY calls, but it's the simpler approach. Thanks to shanzhulig for the initial idea of grabbing an unconditional reference, I just added comments, a condensed commit message and fixed a minor potential issue in where exactly we drop the final reference. Reported-by: shanzhulig <shanzhulig@gmail.com> Suggested-by: shanzhulig <shanzhulig@gmail.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: stable@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-21iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILEDJacob Keller
In iavf_adminq_task(), if the function can't acquire the adapter->crit_lock, it checks if the driver is removing. If so, it simply exits without re-enabling the interrupt. This is done to ensure that the task stops processing as soon as possible once the driver is being removed. However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this before attempting to acquire the lock. In this case, the function exits early and re-enables the interrupt. This will happen even if the driver is already removing. Avoid this, by moving the check to after the adapter->crit_lock is acquired. This way, if the driver is removing, we will not re-enable the interrupt. Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21iavf: fix potential deadlock on allocation failureJacob Keller
In iavf_adminq_task(), if kzalloc() fails to allocate the event.msg_buf, the function will exit without releasing the adapter->crit_lock. This is unlikely, but if it happens, the next access to that mutex will deadlock. Fix this by moving the unlock to the end of the function, and adding a new label to allow jumping to the unlock portion of the function exit flow. Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()Wang Ming
The debugfs_create_dir() function returns error pointers. It never returns NULL. Most incorrect error checks were fixed, but the one in i40e_dbg_init() was forgotten. Fix the remaining error check. Fixes: 02e9c290814c ("i40e: debugfs interface") Signed-off-by: Wang Ming <machel@vivo.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21ia64: mmap: Consider pgoff when searching for free mappingHelge Deller
IA64 is the only architecture which does not consider the pgoff value when searching for a possible free memory region with vm_unmapped_area(). Adding this seems to have no negative side effect on IA64, so add it now to make IA64 consistent with all other architectures. Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-ia64@vger.kernel.org Link: https://lore.kernel.org/r/20230721152432.196382-3-deller@gmx.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-21io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()Helge Deller
The io_uring testcase is broken on IA-64 since commit d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements"). The reason is, that this commit introduced an own architecture independend get_unmapped_area() search algorithm which finds on IA-64 a memory region which is outside of the regular memory region used for shared userspace mappings and which can't be used on that platform due to aliasing. To avoid similar problems on IA-64 and other platforms in the future, it's better to switch back to the architecture-provided get_unmapped_area() function and adjust the needed input parameters before the call. Beside fixing the issue, the function now becomes easier to understand and maintain. This patch has been successfully tested with the io_uring testcase on physical x86-64, ppc64le, IA-64 and PA-RISC machines. On PA-RISC the LTP mmmap testcases did not report any regressions. Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk> Fixes: d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements") Link: https://lore.kernel.org/r/20230721152432.196382-2-deller@gmx.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-21ARM: dts: at91: sam9x60: fix the SOC detectionDurai Manickam KR
Remove the dbgu compatible strings in the UART submodule of the flexcom for the proper SOC detection. Fixes: 99c808335877 (ARM: dts: at91: sam9x60: Add missing flexcom definitions) Signed-off-by: Durai Manickam KR <durai.manickamkr@microchip.com> Link: https://lore.kernel.org/r/20230712100042.317856-1-durai.manickamkr@microchip.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-07-21ARM: dts: nspire: Fix arm primecell compatible stringSudeep Holla
Commit c44e0503e5fd ("docs: dt: fix documented Primecell compatible string") removed the last occurrence of incorrect "arm,amba-primecell" compatible string, but somehow one occurrence of the same is left in the nspire platform. Fix the same by replacing it with the correct "arm,primecell" compatible. Cc: Rob Herring <robh+dt@kernel.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> Cc: Conor Dooley <conor+dt@kernel.org> Cc: devicetree@vger.kernel.org Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20230711092617.1412815-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-07-21Merge tag 'juno-fix-6.5' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Armv8 Juno/Vexpress DTS fix for v6.5 A single simple fix removing dangling symlink left as part of arm dts files movement to vendor sub-directories. It is harmless and causes no issue for the build but scripts copying files see errors/failures. * tag 'juno-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: arm64: dts: arm: Remove the dangling vexpress-v2m-rs1.dtsi symlink Link: https://lore.kernel.org/r/20230721112359.3369716-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-07-21ALSA: seq: remove redundant unsigned comparison to zeroWang Weiyang
Since struct snd_ump_block_info::first_group is unsigned char, comparison to zero is redundant Signed-off-by: Wang Weiyang <wangweiyang2@huawei.com> Fixes: 81fd444aa371 ("ALSA: seq: Bind UMP device") Link: https://lore.kernel.org/r/20230721103124.18522-1-wangweiyang2@huawei.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-07-21arm64/fpsimd: Ensure SME storage is allocated after SVE VL changesMark Brown
When we reconfigure the SVE vector length we discard the backing storage for the SVE vectors and then reallocate on next SVE use, leaving the SME specific state alone. This means that we do not enable SME traps if they were already disabled. That means that userspace code can enter streaming mode without trapping, putting the task in a state where if we try to save the state of the task we will fault. Since the ABI does not specify that changing the SVE vector length disturbs SME state, and since SVE code may not be aware of SME code in the process, we shouldn't simply discard any ZA state. Instead immediately reallocate the storage for SVE, and disable SME if we change the SVE vector length while there is no SME state active. Disabling SME traps on SVE vector length changes would make the overall code more complex since we would have a state where we have valid SME state stored but might get a SME trap. Fixes: 9e4ab6c89109 ("arm64/sme: Implement vector length configuration prctl()s") Reported-by: David Spickett <David.Spickett@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230720-arm64-fix-sve-sme-vl-change-v2-1-8eea06b82d57@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-07-21Merge branch 'octeontx2-pf-round-robin-sched'David S. Miller
Hariprasad Kelam says: ==================== octeontx2-pf: support Round Robin scheduling octeontx2 and CN10K silicons support Round Robin scheduling. When multiple traffic flows reach transmit level with the same priority, with Round Robin scheduling traffic flow with the highest quantum value is picked. With this support, the user can add multiple classes with the same priority and different quantum in htb offload. This series of patches adds support for the same. Patch1: implement transmit schedular allocation algorithm as preparation for support round robin scheduling. Patch2: Allow quantum parameter in HTB offload mode. Patch3: extends octeontx2 htb offload support for Round Robin scheduling Patch4: extend QOS documentation for Round Robin scheduling Hariprasad Kelam (1): docs: octeontx2: extend documentation for Round Robin scheduling Naveen Mamindlapalli (3): octeontx2-pf: implement transmit schedular allocation algorithm sch_htb: Allow HTB quantum parameter in offload mode octeontx2-pf: htb offload support for Round Robin scheduling --- v4 * update classid values in documentation. v3 * 1. update QOS documentation for round robin scheduling 2. added out of bound checks for quantum parameter v2 * change data type of otx2_index_used to reduce size of structure otx2_qos_cfg ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21docs: octeontx2: extend documentation for Round Robin schedulingHariprasad Kelam
Add example tc-htb commands for Round robin scheduling Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21octeontx2-pf: htb offload support for Round Robin schedulingNaveen Mamindlapalli
When multiple traffic flows reach Transmit level with the same priority, with Round robin scheduling traffic flow with the highest quantum value is picked. With this support, the user can add multiple classes with the same priority and different quantum. This patch does necessary changes to support the same. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21sch_htb: Allow HTB quantum parameter in offload modeNaveen Mamindlapalli
The current implementation of HTB offload returns the EINVAL error for quantum parameter. This patch removes the error returning checks for 'quantum' parameter and populates its value to tc_htb_qopt_offload structure such that driver can use the same. Add quantum parameter check in mlx5 driver, as mlx5 devices are not capable of supporting the quantum parameter when htb offload is used. Report error if quantum parameter is set to a non-default value. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21octeontx2-pf: implement transmit schedular allocation algorithmNaveen Mamindlapalli
unlike strict priority, where number of classes are limited to max 8, there is no restriction on the number of dwrr child nodes unless the count increases the max number of child nodes supported. Hardware expects strict priority transmit schedular indexes mapped to their priority. This patch adds defines transmit schedular allocation algorithm such that the above requirement is honored. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21Merge branch 'mlxsw-enslavement'David S. Miller
Petr Machata says: ==================== mlxsw: Permit enslavement to netdevices with uppers The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. Similarly, detaching a front panel port from a configured topology means unoffloading of this whole topology -- VLAN uppers, next hops, etc. Attaching the port back is then not permitted at all. If it were, it would not result in a working configuration, because much of mlxsw is written to react to changes in immediate configuration. There is nothing that would go visit netdevices in the attached-to topology and offload existing routes and VLAN memberships, for example. In this patchset, introduce a number of replays to be invoked so that this sort of post-hoc offload is supported. Then remove the vetoes that disallowed enslavement of front panel ports to other netdevices with uppers. The patchset progresses as follows: - In patch #1, fix an issue in the bridge driver. To my knowledge, the issue could not have resulted in a buggy behavior previously, and thus is packaged with this patchset instead of being sent separately to net. - In patch #2, add a new helper to the switchdev code. - In patch #3, drop mlxsw selftests that will not be relevant after this patchset anymore. - Patches #4, #5, #6, #7 and #8 prepare the codebase for smoother introduction of the rest of the code. - Patches #9, #10, #11, #12, #13 and #14 replay various aspects of upper configuration when a front panel port is introduced into a topology. Individual patches take care of bridge and LAG RIF memberships, switchdev replay, nexthop and neighbors replay, and MACVLAN offload. - Patches #15 and #16 introduce RIFs for newly-relevant netdevices when a front panel port is enslaved (in which case all uppers are newly relevant), or, respectively, deslaved (in which case the newly-relevant netdevice is the one being deslaved). - Up until this point, the introduced scaffolding was not really used, because mlxsw still forbids enslavement of mlxsw netdevices to uppers with uppers. In patch #17, this condition is finally relaxed. A sizable selftest suite is available to test all this new code. That will be sent in a separate patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum: Permit enslavement to netdevices with uppersPetr Machata
Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. In the previous patches, a number of replays have been added. Those ensure that various bits of state, such as next hops or switchdev objects, are offloaded when they become relevant due to a mlxsw lower being introduced into the topology. However the act of actually, for example, enslaving a front-panel port to a bridge with uppers, has been vetoed so far. In this patch, remove the vetoes and permit the operation. mlxsw currently validates creation of "interesting" uppers. Thus creating VLAN netdevices on top of 802.1ad bridges is forbidden if the bridge has an mlxsw lower, but permitted in general. This validation code never gets run when a port is introduced as a lower of an existing netdevice structure. Thus when enslaving an mlxsw netdevice to netdevices with uppers, invoke the PRECHANGEUPPER event handler for each netdevice above the one that the front panel port is being enslaved to. This way the tower of netdevices above the attachment point is validated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay IP NETDEV_UP on device deslavementPetr Machata
When a netdevice is removed from a bridge or a LAG, and it has an IP address, it should join the router and gain a RIF. Do that by replaying address addition event on the netdevice. When handling deslavement of LAG or its upper from a bridge device, the replay should be done after all the lowers of the LAG have left the bridge. Thus these scenarios are handled by passing replay_deslavement of false, and by invoking, after the lowers have been processed, a new helper, mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper handling, and in particular invokes the replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay IP NETDEV_UP on device enslavementPetr Machata
Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. When this is permitted, any uppers with IP addresses need to have the NETDEV_UP inetaddr event replayed, so that any RIFs are created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay neighbours when RIF is madePetr Machata
As neighbours are created, mlxsw is involved through the netevent notifications. When at the time there is no RIF for a given neighbour, the notification is not acted upon. When the RIF is later created, these outstanding neighbours are left unoffloaded and cause traffic to go through the SW datapath. In order to fix this issue, as a RIF is created, walk the ARP and ND tables and find neighbours for the netdevice that represents the RIF. Then schedule neighbour work for them, allowing them to be offloaded. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Replay MACVLANs when RIF is madePetr Machata
If IP address is added to a MACVLAN netdevice, the effect is of configuring VRRP on the RIF for the netdevice linked to the MACVLAN. Because the MACVLAN offload is tied to existence of a RIF at the linked netdevice, adding a MACVLAN is currently not allowed until a RIF is present. If this requirement stays, it will never be possible to attach a first port into a topology that involves a MACVLAN. Thus topologies would need to be built in a certain order, which is impractical. Additionally, IP address removal, which leads to disappearance of the RIF that the MACVLAN depends on, cannot be vetoed. Thus even as things stand now it is possible to get to a state where a MACVLAN netdevice exists without a RIF, despite having mlxsw lowers. And once the MACVLAN is un-offloaded due to RIF getting destroyed, recreating the RIF does not bring it back. In this patch, accept that MACVLAN can be created out of order and support that use case. One option would seem to be to simply recognize MACVLAN netdevices as "interesting", and let the existing replay mechanisms take care of the offload. However, that does not address the necessity to reoffload MACVLAN once a RIF is created. Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(), called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the existing offloads, applies the configuration as if the netdevice were created just now. Additionally, remove all vetoes and warning messages that checked for presence of a RIF at the linked device. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Offload ethernet nexthops when RIF is madePetr Machata
As RIF is created, refresh each netxhop group tracked at the CRIF for which the RIF was created. Note that nothing needs to be done for IPIP nexthops. The RIF for these is either available from the get-go, or will never be available, so no after the fact offloading needs to be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Join RIFs of LAG upper VLANsPetr Machata
In the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join not only RIF for the immediate LAG, as is currently the case, but also RIFs for VLAN netdevices upper to the LAG. In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the uppers of a LAG being joined, and also join any VLAN ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_switchdev: Replay switchdev objects on port joinPetr Machata
Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG has any upper (e.g. is enslaved), enlaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. At that point there are no bridge objects (such as port VLANs) to replay. Those are added afterwards, and notified as they are created. This holds even for the PVID. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to replay the existing bridge objects. Without this replay, e.g. the mlxsw bridge_port_vlan objects are not instantiated, which causes issues later, as a lot of code relies on their presence. To that end, add a new notifier block whose sole role is to filter out events related to the one relevant upper, and forward those to the existing switchdev notifier block. Pass the new notifier block to switchdev_bridge_port_offload() when the bridge port is created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum: On port enslavement to a LAG, join upper's bridgesPetr Machata
Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG already has an upper, enslaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join bridges of LAG uppers. Without this replay, the mlxsw bridge_port objects are not instantiated, which causes issues later, as a lot of code relies on their presence. Therefore in this patch, when the first mlxsw physical netdevice is enslaved to a LAG, consider bridges upper to the LAG (both the direct master, if any, and any bridge masters of VLAN uppers), and have the relevant netdevices join their bridges. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum: Add a replay_deslavement argument to event handlersPetr Machata
When handling deslavement of LAG or its upper from a bridge device, when the deslaved netdevice has an IP address, it should join the router. This should be done after all the lowers of the LAG have left the bridge. The replay intended to cause the device to join the router therefore cannot be invoked unconditionally in the event handlers themselves. It can be done right away if the handler is invoked for a sole device, but when it is invoked repeated for each LAG lower, the replay needs to be postponed until after this processing is done. To that end, add a boolean parameter, replay_deslavement, to mlxsw_sp_netdevice_port_upper_event(), mlxsw_sp_netdevice_port_vlan_event() and one helper on the call path. Have the invocations that are done for sole netdevices pass true, and those done for LAG lowers pass false. Nothing depends on this flag at this point, but it removes some noise from the patch that introduces the replay itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum: Allow event handlers to check unowned bridgesPetr Machata
Currently the bridge-related handlers bail out when the event is related to a netdevice that is not an upper of one of the front-panel ports. In order to allow enslavement of front-panel ports to bridges that already have uppers, it will be necessary to replay CHANGEUPPER events to validate that the configuration is offloadable. In order for the replay to be effective, it must be possible to ignore unsupported configuration in the context of an actual notifier event, but to still "veto" these configurations when the validation is performed. To that end, introduce two parameters to a number of handlers: mlxsw_sp, because it will not be possible to deduce that from the netdevice lowers; and process_foreign to indicate whether netdevices that are not front panel uppers should be validated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum: Split a helper out of mlxsw_sp_netdevice_event()Petr Machata
Move the meat of mlxsw_sp_netdevice_event() to a separate function that does just the validation. This separate helper will be possible to call later for recursive ascent when validating attachment of a front panel port to a bridge with uppers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Extract a helper to schedule neighbour workPetr Machata
This will come in handy for neighbour replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21mlxsw: spectrum_router: Allow address handlers to run on bridge portsPetr Machata
Currently the IP address event handlers bail out when the event is related to a netdevice that is a bridge port or a member of a LAG. In order to create a RIF when a bridged or LAG'd port is unenslaved, these event handlers will be replayed. However, at the point in time when the NETDEV_CHANGEUPPER event is delivered, informing of the loss of enslavement, the port is still formally enslaved. In order for the operation to have any effect, these handlers need an extra parameter to indicate that the check for bridge or LAG membership should not be done. In this patch, add an argument "nomaster" to several event handlers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21selftests: mlxsw: rtnetlink: Drop obsolete testsPetr Machata
Support for enslaving ports to LAGs with uppers will be added in the following patches. Selftests to make sure it actually does the right thing are ready and will be sent as a follow-up. Similarly, ordering of MACVLAN creation and RIF creation will be relaxed and it will be permitted to create a MACVLAN first. Thus these two tests are obsolete. Drop them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21net: switchdev: Add a helper to replay objects on a bridge portPetr Machata
When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on the bridge port. When the bridge port is offloaded by the driver for the first time, this can be achieved by passing a notifier to switchdev_bridge_port_offload(). The notifier is then invoked for the individual objects (such as VLANs) configured on the bridge, and can look for the interesting ones. Calling switchdev_bridge_port_offload() when the second port joins the bridge lower is unnecessary, but the replay is still needed. To that end, add a new function, switchdev_bridge_port_replay(), which does only the replay part of the _offload() function in exactly the same way as that function. Cc: Jiri Pirko <jiri@resnulli.us> Cc: Ivan Vecera <ivecera@redhat.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21net: bridge: br_switchdev: Tolerate -EOPNOTSUPP when replaying MDBPetr Machata
There are two kinds of MDB entries to be replayed: port MDB entries, and host MDB entries. They are both replayed by br_switchdev_mdb_replay(). If the driver supports one kind, but lacks the other, the first -EOPNOTSUPP returned terminates the whole replay, including any further still-supported objects in the list. For this to cause issues, there must be MDB entries for both the host and the port being replayed. In that case, if the driver bails out from handling the host entry, the port entries are never replayed. However, the replay is currently only done when a switchdev port joins a bridge. There would be no port memberships at that point. Thus despite being erroneous, the code does not cause observable bugs. This is not an issue with other object kinds either, because there, each function replays one object kind. If a driver does not support that kind, it makes sense to bail out early. -EOPNOTSUPP is then ignored in nbp_switchdev_sync_objs(). For MDB, suppress the -EOPNOTSUPP error code in br_switchdev_mdb_replay() already, so that the whole list gets replayed. The reason we need this patch is that a future patch will introduce a replay that should be used when a front-panel port netdevice is enslaved to a bridge lower, in particular a LAG. The LAG netdevice can already have both host and port MDB entries. The port entries need to be replayed so that they are offloaded on the port that joins the LAG. Cc: Jiri Pirko <jiri@resnulli.us> Cc: Ivan Vecera <ivecera@redhat.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21nvme-rdma: fix potential unbalanced freeze & unfreezeMing Lei
Move start_freeze into nvme_rdma_configure_io_queues(), and there is at least two benefits: 1) fix unbalanced freeze and unfreeze, since re-connection work may fail or be broken by removal 2) IO during error recovery can be failfast quickly because nvme fabrics unquiesces queues after teardown. One side-effect is that !mpath request may timeout during connecting because of queue topo change, but that looks not one big deal: 1) same problem exists with current code base 2) compared with !mpath, mpath use case is dominant Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21nvme-tcp: fix potential unbalanced freeze & unfreezeMing Lei
Move start_freeze into nvme_tcp_configure_io_queues(), and there is at least two benefits: 1) fix unbalanced freeze and unfreeze, since re-connection work may fail or be broken by removal 2) IO during error recovery can be failfast quickly because nvme fabrics unquiesces queues after teardown. One side-effect is that !mpath request may timeout during connecting because of queue topo change, but that looks not one big deal: 1) same problem exists with current code base 2) compared with !mpath, mpath use case is dominant Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21nvme: fix possible hang when removing a controller during error recoveryMing Lei
Error recovery can be interrupted by controller removal, then the controller is left as quiesced, and IO hang can be caused. Fix the issue by unquiescing controller unconditionally when removing namespaces. This way is reasonable and safe given forward progress can be made when removing namespaces. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reported-by: Chunguang Xu <brookxu.cn@gmail.com> Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/ Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21net: ethernet: mtk_ppe: add MTK_FOE_ENTRY_V{1,2}_SIZE macrosLorenzo Bianconi
Introduce MTK_FOE_ENTRY_V{1,2}_SIZE macros in order to make more explicit foe_entry size for different chipset revisions. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-20tools/testing/cxl: Remove unused SZ_512G macroXiao Yang
SZ_512G macro has become useless since commit b2f3b74e1072 ("tools/testing/cxl: Move cxl_test resources to the top of memory") so remove it directly. Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Link: https://lore.kernel.org/r/20230719163103.3392-1-yangx.jy@fujitsu.com Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-20Merge tag 'drm-fixes-2023-07-21' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Mostly amdgpu fixes, a couple of i915 fixes, some nouveau and then a few misc accel and other fixes. client: - memory leak fix dma-buf: - memory leak fix qaic: - bound check fixes - map_user_pages leak - int overflow fixes habanalabs: - debugfs stub helper nouveau: - aux event slot fixes - anx9805 cards fixes i915: - Add sentinel to xehp_oa_b_counters - Revert "drm/i915: use localized __diag_ignore_all() instead of per file" amdgpu: - More PCIe DPM fixes for Intel platforms - DCN3.0.1 fixes - Virtual display timer fix - Async flip fix - SMU13 clock reporting fixes - Add missing PSP firmware declaration - DP MST fix - DCN3.1.x fixes - Slab out of bounds fix" * tag 'drm-fixes-2023-07-21' of git://anongit.freedesktop.org/drm/drm: (31 commits) accel/habanalabs: add more debugfs stub helpers drm/nouveau/kms/nv50-: init hpd_irq_lock for PIOR DP drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts drm/nouveau/i2c: fix number of aux event slots drm/amdgpu: use a macro to define no xcp partition case drm/amdgpu/vm: use the same xcp_id from root PD drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create drm/amdgpu: Allocate root PD on correct partition drm/amd/display: Keep PHY active for DP displays on DCN31 drm/amd/display: Prevent vtotal from being set to 0 drm/amd/display: Disable MPC split by default on special asic drm/amd/display: check TG is non-null before checking if enabled drm/amd/display: Add polling method to handle MST reply packet drm/amd/display: Clean up errors & warnings in amdgpu_dm.c drm/amdgpu: Allow the initramfs generator to include psp_13_0_6_ta drm/amdgpu/pm: make mclk consistent for smu 13.0.7 drm/amdgpu/pm: make gfxclock consistent for sienna cichlid drm/amd/display: only accept async flips for fast updates drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel drm/amd/display: add DCN301 specific logic for OTG programming ...
2023-07-20Merge branch 'nexthop-refactor-and-fix-nexthop-selection-for-multipath-routes'Jakub Kicinski
Benjamin Poirier says: ==================== nexthop: Refactor and fix nexthop selection for multipath routes In order to select a nexthop for multipath routes, fib_select_multipath() is used with legacy nexthops and nexthop_select_path_hthr() is used with nexthop objects. Those two functions perform a validity test on the neighbor related to each nexthop but their logic is structured differently. This causes a divergence in behavior and nexthop_select_path_hthr() may return a nexthop that failed the neighbor validity test even if there was one that passed. Refactor nexthop_select_path_hthr() to make it more similar to fib_select_multipath() and fix the problem mentioned above. v1: https://lore.kernel.org/netdev/20230529201914.69828-1-bpoirier@nvidia.com/ ==================== Link: https://lore.kernel.org/r/20230719-nh_select-v2-0-04383e89f868@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>