summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-09-18powerpc/dexcr: Move HASHCHK trap handlerBenjamin Gray
Syzkaller reported a sleep in atomic context bug relating to the HASHCHK handler logic: BUG: sleeping function called from invalid context at arch/powerpc/kernel/traps.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 25040, name: syz-executor preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 no locks held by syz-executor/25040. irq event stamp: 34 hardirqs last enabled at (33): [<c000000000048b38>] prep_irq_for_enabled_exit arch/powerpc/kernel/interrupt.c:56 [inline] hardirqs last enabled at (33): [<c000000000048b38>] interrupt_exit_user_prepare_main+0x148/0x600 arch/powerpc/kernel/interrupt.c:230 hardirqs last disabled at (34): [<c00000000003e6a4>] interrupt_enter_prepare+0x144/0x4f0 arch/powerpc/include/asm/interrupt.h:176 softirqs last enabled at (0): [<c000000000281954>] copy_process+0x16e4/0x4750 kernel/fork.c:2436 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 15 PID: 25040 Comm: syz-executor Not tainted 6.5.0-rc5-00001-g3ccdff6bb06d #3 Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_021) hv:phyp pSeries Call Trace: [c0000000a8247ce0] [c00000000032b0e4] __might_resched+0x3b4/0x400 kernel/sched/core.c:10189 [c0000000a8247d80] [c0000000008c7dc8] __might_fault+0xa8/0x170 mm/memory.c:5853 [c0000000a8247dc0] [c00000000004160c] do_program_check+0x32c/0xb20 arch/powerpc/kernel/traps.c:1518 [c0000000a8247e50] [c000000000009b2c] program_check_common_virt+0x3bc/0x3c0 To determine if a trap was caused by a HASHCHK instruction, we inspect the user instruction that triggered the trap. However this may sleep if the page needs to be faulted in (get_user_instr() reaches __get_user(), which calls might_fault() and triggers the bug message). Move the HASHCHK handler logic to after we allow IRQs, which is fine because we are only interested in HASHCHK if it's a user space trap. Fixes: 5bcba4e6c13f ("powerpc/dexcr: Handle hashchk exception") Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230915034604.45393-1-bgray@linux.ibm.com
2023-09-18powerpc/82xx: Select FSL_SOCChristophe Leroy
It used to be impossible to select CONFIG_CPM2 without selecting CONFIG_FSL_SOC at the same time because CONFIG_CPM2 was dependent on CONFIG_8260 and CONFIG_8260 was selecting CONFIG_FSL_SOC. But after commit eb5aa2137275 ("powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272") CONFIG_CPM2 depends on CONFIG_PPC_82xx instead but CONFIG_PPC_82xx doesn't directly selects CONFIG_FSL_SOC. Fix it by forcing CONFIG_PPC_82xx to select CONFIG_FSL_SOC just like already done by PPC_8xx, PPC_MPC512x, PPC_83xx, PPC_86xx. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: eb5aa2137275 ("powerpc/82xx: Remove CONFIG_8260 and CONFIG_8272") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/7ab513546148ebe33ddd4b0ea92c7bfd3cce3ad7.1694705016.git.christophe.leroy@csgroup.eu
2023-09-18powerpc: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION and ↵Naveen N Rao
FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY We recently added support for -fpatchable-function-entry and it is enabled by default on ppc32 (ppc64 needs gcc v13.1.0). When building the kernel for ppc32 and also enabling CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, we see the below build error with older gcc versions: powerpc-linux-gnu-ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections This error is thrown since __patchable_function_entries section would be garbage collected with --gc-sections since it does not reference any other kept sections. This has subsequently been fixed with: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=b7d072167715829eed0622616f6ae0182900de3e Disable LD_DEAD_CODE_DATA_ELIMINATION for gcc versions before v11.1.0 if using -fpatchable-function-entry to avoid this bug. Fixes: 0f71dcfb4aef ("powerpc/ftrace: Add support for -fpatchable-function-entry") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230913134129.2782088-1-naveen@kernel.org
2023-09-18powerpc/watchpoints: Annotate atomic context in more placesBenjamin Gray
It can be easy to miss that the notifier mechanism invokes the callbacks in an atomic context, so add some comments to that effect on the two handlers we register here. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230829063457.54157-4-bgray@linux.ibm.com
2023-09-18powerpc/watchpoint: Disable pagefaults when getting user instructionBenjamin Gray
This is called in an atomic context, so is not allowed to sleep if a user page needs to be faulted in and has nowhere it can be deferred to. The pagefault_disabled() function is documented as preventing user access methods from sleeping. In practice the page will be mapped in nearly always because we are reading the instruction that just triggered the watchpoint trap. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230829063457.54157-3-bgray@linux.ibm.com
2023-09-18powerpc/watchpoints: Disable preemption in thread_change_pc()Benjamin Gray
thread_change_pc() uses CPU local data, so must be protected from swapping CPUs while it is reading the breakpoint struct. The error is more noticeable after 1e60f3564bad ("powerpc/watchpoints: Track perf single step directly on the breakpoint"), which added an unconditional __this_cpu_read() call in thread_change_pc(). However the existing __this_cpu_read() that runs if a breakpoint does need to be re-inserted has the same issue. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230829063457.54157-2-bgray@linux.ibm.com
2023-09-18powerpc/perf/hv-24x7: Update domain value checkKajol Jain
Valid domain value is in range 1 to HV_PERF_DOMAIN_MAX. Current code has check for domain value greater than or equal to HV_PERF_DOMAIN_MAX. But the check for domain value 0 is missing. Fix this issue by adding check for domain value 0. Before: # ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1 Using CPUID 00800200 Control descriptor is not initialized Error: The sys_perf_event_open() syscall returned with 5 (Input/output error) for event (hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/). /bin/dmesg | grep -i perf may provide additional information. Result from dmesg: [ 37.819387] hv-24x7: hcall failed: [0 0x60040000 0x100 0] => ret 0xfffffffffffffffc (-4) detail=0x2000000 failing ix=0 After: # ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1 Using CPUID 00800200 Control descriptor is not initialized Warning: hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ event is not supported by the kernel. failed to read counter hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ Fixes: ebd4a5a3ebd9 ("powerpc/perf/hv-24x7: Minor improvements") Reported-by: Krishan Gopal Sarawast <krishang@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230825055601.360083-1-kjain@linux.ibm.com
2023-09-17Linux 6.6-rc2v6.6-rc2Linus Torvalds
2023-09-17Merge tag 'x86-urgent-2023-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - Fix an UV boot crash - Skip spurious ENDBR generation on _THIS_IP_ - Fix ENDBR use in putuser() asm methods - Fix corner case boot crashes on 5-level paging - and fix a false positive WARNING on LTO kernels" * tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Remove LTO flags x86/boot/compressed: Reserve more memory for page tables x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*() x86/ibt: Suppress spurious ENDBR x86/platform/uv: Use alternate source for socket to node data
2023-09-17Merge tag 'sched-urgent-2023-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Fix a performance regression on large SMT systems, an Intel SMT4 balancing bug, and a topology setup bug on (Intel) hybrid processors" * tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain sched/fair: Fix SMT4 group_smt_balance handling sched/fair: Optimize should_we_balance() for large SMT systems
2023-09-17Merge tag 'objtool-urgent-2023-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Ingo Molnar: "Fix a cold functions related false-positive objtool warning that triggers on Clang" * tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix _THIS_IP_ detection for cold functions
2023-09-17Merge tag 'core-urgent-2023-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull WARN fix from Ingo Molnar: "Fix a missing preempt-enable in the WARN() slowpath" * tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: panic: Reenable preemption in WARN slowpath
2023-09-17gve: Use size_add() in call to struct_size()Gustavo A. R. Silva
If, for any reason, `tx_stats_num + rx_stats_num` wraps around, the protection that struct_size() adds against potential integer overflows is defeated. Fix this by hardening call to struct_size() with size_add(). Fixes: 691f4077d560 ("gve: Replace zero-length array with flexible-array member") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17stat: remove no-longer-used helper macrosLinus Torvalds
The choose_32_64() macros were added to deal with an odd inconsistency between the 32-bit and 64-bit layout of 'struct stat' way back when in commit a52dd971f947 ("vfs: de-crapify "cp_new_stat()" function"). Then a decade later Mikulas noticed that said inconsistency had been a mistake in the early x86-64 port, and shouldn't have existed in the first place. So commit 932aba1e1690 ("stat: fix inconsistency between struct stat and struct compat_stat") removed the uses of the helpers. But the helpers remained around, unused. Get rid of them. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-17Merge tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Three small SMB3 client fixes, one to improve a null check and two minor cleanups" * tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix some minor typos and repeated words smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP smb3: move server check earlier when setting channel sequence number
2023-09-17Merge tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Two ksmbd server fixes" * tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd: ksmbd: fix passing freed memory 'aux_payload_buf' ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
2023-09-17Merge tag 'ext4_for_linus-6.6-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Regression and bug fixes for ext4" * tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix rec_len verify error ext4: do not let fstrim block system suspend ext4: move setting of trimmed bit into ext4_try_to_trim_range() jbd2: Fix memory leak in journal_init_common() jbd2: Remove page size assumptions buffer: Make bh_offset() work for compound pages
2023-09-17Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-David S. Miller
queue Tony Nguyen says: ==================== This series contains updates to iavf and i40e drivers. Radoslaw prevents admin queue operations being added when the driver is being removed for iavf. Petr Oros immediately starts reconfiguration on changes to VLANs on iavf. Ivan Vecera moves reset of VF to occur after port VLAN values are set on i40e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17scsi: iscsi_tcp: restrict to TCP socketsEric Dumazet
Nothing prevents iscsi_sw_tcp_conn_bind() to receive file descriptor pointing to non TCP socket (af_unix for example). Return -EINVAL if this is attempted, instead of crashing the kernel. Fixes: 7ba247138907 ("[SCSI] open-iscsi/linux-iscsi-5 Initiator: Initiator code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Chris Leech <cleech@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: open-iscsi@googlegroups.com Cc: linux-scsi@vger.kernel.org Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge branch 'vsock-tests'David S. Miller
Stefano Garzarella says: ==================== vsock/test: add recv_buf()/send_buf() utility functions and some improvements We recently found that some tests were failing [1]. The problem was that we were not waiting for all the bytes correctly, so we had a partial read. I had initially suggested using MSG_WAITALL, but this could have timeout problems. Since we already had send_byte() and recv_byte() that handled the timeout, but also the expected return value, I moved that code to two new functions that we can now use to send/receive generic buffers. The last commit is just an improvement to a test I found difficult to understand while using the new functions. @Arseniy a review and some testing are really appreciated :-) [1] https://lore.kernel.org/netdev/63xflnwiohdfo6m3vnrrxgv2ulplencpwug5qqacugqh7xxpu3@tsczkuqgwurb/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17vsock/test: track bytes in sk_buff merging test for SOCK_SEQPACKETStefano Garzarella
The test was a bit complicated to read. Added variables to keep track of the bytes read and to be read in each step. Also some comments. The test is unchanged. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17vsock/test: use send_buf() in vsock_test.cStefano Garzarella
We have a very common pattern used in vsock_test that we can now replace with the new send_buf(). This allows us to reuse the code we already had to check the actual return value and wait for all the bytes to be sent with an appropriate timeout. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17vsock/test: add send_buf() utility functionStefano Garzarella
Move the code of send_byte() out in a new utility function that can be used to send a generic buffer. This new function can be used when we need to send a custom buffer and not just a single 'A' byte. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17vsock/test: use recv_buf() in vsock_test.cStefano Garzarella
We have a very common pattern used in vsock_test that we can now replace with the new recv_buf(). This allows us to reuse the code we already had to check the actual return value and wait for all bytes to be received with an appropriate timeout. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17vsock/test: add recv_buf() utility functionStefano Garzarella
Move the code of recv_byte() out in a new utility function that can be used to receive a generic buffer. This new function can be used when we need to receive a custom buffer and not just a single 'A' byte. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17ipv4: fix null-deref in ipv4_link_failureKyle Zeng
Currently, we assume the skb is associated with a device before calling __ip_options_compile, which is not always the case if it is re-routed by ipvs. When skb->dev is NULL, dev_net(skb->dev) will become null-dereference. This patch adds a check for the edge case and switch to use the net_device from the rtable when skb->dev is NULL. Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Cc: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-). The main changes are: 1) Basic BTF validation in libbpf, from Andrii Nakryiko. 2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi. 3) next_thread cleanups, from Oleg Nesterov. 4) Add mcpu=v4 support to arm32, from Puranjay Mohan. 5) Add support for __percpu pointers in bpf progs, from Yonghong Song. 6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang. 7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge branch 'phy-stopping-race'David S. Miller
Russell King says: ==================== net: phy: avoid race when erroring stopping PHY This series addresses a problem reported by Jijie Shao where the PHY state machine can race with phy_stop() leading to an incorrect state. The issue centres around phy_state_machine() dropping the phydev->lock mutex briefly, which allows phy_stop() to get in half-way through the state machine, and when the state machine resumes, it overwrites phydev->state with a value incompatible with a stopped PHY. This causes a subsequent phy_start() to issue a warning. We address this firstly by using versions of functions that do not take tne lock, moving them into the locked region. The only function that this can't be done with is phy_suspend() which needs to call into the driver without taking the lock. For phy_suspend(), we split the state machine into two parts - the initial part which runs under the phydev->lock, and the second part which runs without the lock. We finish off by using the split state machine in phy_stop() which removes another unnecessary unlock-lock sequence from phylib. Changes from RFC: - Added Jijie Shao's tested-by ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: convert phy_stop() to use split state machineRussell King (Oracle)
Convert phy_stop() to use the new locked-section and unlocked-section parts of the PHY state machine. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: split locked and unlocked section of phy_state_machine()Russell King (Oracle)
Split out the locked and unlocked sections of phy_state_machine() into two separate functions which can be called inside the phydev lock and outside the phydev lock as appropriate, thus allowing us to combine the locked regions in the caller of phy_state_machine() with the locked region inside phy_state_machine(). This avoids unnecessarily dropping the phydev lock which may allow races to occur. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: move phy_state_machine()Russell King (Oracle)
Move phy_state_machine() before phy_stop() to avoid subsequent patches introducing forward references. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: move phy_suspend() to end of phy_state_machine()Russell King (Oracle)
Move the call to phy_suspend() to the end of phy_state_machine() after we release the lock so that we can combine the locked areas. phy_suspend() can not be called while holding phydev->lock as it has caused deadlocks in the past. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: move call to start anegRussell King (Oracle)
Move the call to start auto-negotiation inside the lock in the PHYLIB state machine, calling the locked variant _phy_start_aneg(). This avoids unnecessarily releasing and re-acquiring the lock. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: call phy_error_precise() while holding the lockRussell King (Oracle)
Move the locking out of phy_error_precise() and to its only call site, merging with the locked region that has already been taken. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: phy: always call phy_process_state_change() under lockRussell King (Oracle)
phy_stop() calls phy_process_state_change() while holding the phydev lock, so also arrange for phy_state_machine() to do the same, so that this function is called with consistent locking. Tested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: dsa: microchip: Add partial ACL support for ksz9477 switchesOleksij Rempel
This patch adds partial Access Control List (ACL) support for the ksz9477 family of switches. ACLs enable filtering of incoming layer 2 MAC, layer 3 IP, and layer 4 TCP/UDP packets on each port. They provide additional capabilities for filtering routed network protocols and can take precedence over other forwarding functions. ACLs can filter ingress traffic based on header fields such as source/destination MAC address, EtherType, IPv4 address, IPv4 protocol, UDP/TCP ports, and TCP flags. The ACL is an ordered list of up to 16 access control rules programmed into the ACL Table. Each entry specifies a set of matching conditions and action rules for controlling packet forwarding and priority. The ACL also implements a count function, generating an interrupt instead of a forwarding action. It can be used as a watchdog timer or an event counter. The ACL consists of three parts: matching rules, action rules, and processing entries. Multiple match conditions can be either AND'ed or OR'ed together. This patch introduces support for a subset of the available ACL functionality, specifically layer 2 matching and prioritization of matched packets. For example: tc qdisc add dev lan2 clsact tc filter add dev lan2 ingress protocol 0x88f7 flower action skbedit prio 7 tc qdisc add dev lan1 clsact tc filter add dev lan1 ingress protocol 0x88f7 flower action skbedit prio 7 The hardware offloading implementation was benchmarked against a configuration without hardware offloading. This latter setup relied on a software-based Linux bridge. No noticeable differences were observed between the two configurations. Here is an example of software-based test: ip l s dev enu1u1 up ip l s dev enu1u2 up ip l s dev enu1u4 up ethtool -A enu1u1 autoneg off rx off tx off ethtool -A enu1u2 autoneg off rx off tx off ethtool -A enu1u4 autoneg off rx off tx off ip l a name br0 type bridge ip l s dev br0 up ip l s enu1u1 master br0 ip l s enu1u2 master br0 ip l s enu1u4 master br0 tc qdisc add dev enu1u1 root handle 1: ets strict 4 priomap 3 3 2 2 1 1 0 0 tc qdisc add dev enu1u4 root handle 1: ets strict 4 priomap 3 3 2 2 1 1 0 0 tc qdisc add dev enu1u2 root handle 1: ets strict 4 priomap 3 3 2 2 1 1 0 0 tc qdisc add dev enu1u1 clsact tc filter add dev enu1u1 ingress protocol ipv4 flower action skbedit prio 7 tc qdisc add dev enu1u4 clsact tc filter add dev enu1u4 ingress protocol ipv4 flower action skbedit prio 0 On a system attached to the port enu1u2 I run two iperf3 server instances: iperf3 -s -p 5210 & iperf3 -s -p 5211 & On systems attached to enu1u4 and enu1u1 I run: iperf3 -u -c 172.17.0.1 -p 5210 -b100M -l1472 -t100 and iperf3 -u -c 172.17.0.1 -p 5211 -b100M -l1472 -t100 As a result, IP traffic on port enu1u1 will be prioritized and take precedence over IP traffic on port enu1u4 Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net: dsa: microchip: Move *_port_setup code to dsa_switch_ops::port_setup()Oleksij Rempel
Right now, the *_port_setup code is in dsa_switch_ops::port_enable(), which is not the best place for it. This patch moves it to a more suitable place, dsa_switch_ops::port_setup(), to match the function's purpose and name. This patch is a preparation for coming ACL support patch. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge branch 'devlink-instances-relationships'David S. Miller
Jiri Pirko says: ==================== expose devlink instances relationships From: Jiri Pirko <jiri@nvidia.com> Currently, the user can instantiate new SF using "devlink port add" command. That creates an E-switch representor devlink port. When user activates this SF, there is an auxiliary device created and probed for it which leads to SF devlink instance creation. There is 1:1 relationship between E-switch representor devlink port and the SF auxiliary device devlink instance. Also, for example in mlx5, one devlink instance is created for PCI device and one is created for an auxiliary device that represents the uplink port. The relation between these is invisible to the user. Patches #1-#3 and #5 are small preparations. Patch #4 adds netnsid attribute for nested devlink if that in a different namespace. Patch #5 is the main one in this set, introduces the relationship tracking infrastructure later on used to track SFs, linecards and devlink instance relationships with nested devlink instances. Expose the relation to the user by introducing new netlink attribute DEVLINK_PORT_FN_ATTR_DEVLINK which contains the devlink instance related to devlink port function. This is done by patch #8. Patch #9 implements this in mlx5 driver. Patch #10 converts the linecard nested devlink handling to the newly introduced rel infrastructure. Patch #11 benefits from the rel infra and introduces possiblitily to have relation between devlink instances. Patch #12 implements this in mlx5 driver. Examples: $ devlink dev pci/0000:08:00.0: nested_devlink auxiliary/mlx5_core.eth.0 pci/0000:08:00.1: nested_devlink auxiliary/mlx5_core.eth.1 auxiliary/mlx5_core.eth.1 auxiliary/mlx5_core.eth.0 $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 106 pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable $ devlink port function set pci/0000:08:00.0/32768 state active $ devlink port show pci/0000:08:00.0/32768 pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false function: hw_addr 00:00:00:00:00:00 state active opstate attached roce enable nested_devlink auxiliary/mlx5_core.sf.2 $ devlink port show pci/0000:08:00.0/32768 pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false function: hw_addr 00:00:00:00:00:00 state active opstate attached roce enable nested_devlink auxiliary/mlx5_core.sf.2 nested_devlink_netns ns1 ==================== Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net/mlx5e: Set en auxiliary devlink instance as nestedJiri Pirko
Benefit from the previous commit introducing exposure of devlink instances relationship and set the nested instance for en auxiliary device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: introduce possibility to expose info about nested devlinksJiri Pirko
In mlx5, there is a devlink instance created for PCI device. Also, one separate devlink instance is created for auxiliary device that represents the netdev of uplink port. This relation is currently invisible to the devlink user. Benefit from the rel infrastructure and allow for nested devlink instance to set the relationship for the nested-in devlink instance. Note that there may be many nested instances, therefore use xarray to hold the list of rel_indexes for individual nested instances. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: convert linecard nested devlink to new rel infrastructureJiri Pirko
Benefit from the newly introduced rel infrastructure, treat the linecard nested devlink instances in the same way as port function instances. Convert the code to use the rel infrastructure. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net/mlx5: SF, Implement peer devlink set for SF representor devlink portJiri Pirko
Benefit from the existence of internal mlx5 notifier and extend it by event MLX5_DRIVER_EVENT_SF_PEER_DEVLINK. Use this event from SF auxiliary device probe/remove functions to pass the registered SF devlink instance to the SF representor. Process the new event in SF representor code and call devl_port_fn_devlink_set() to do the assignments. Implement this in work to avoid possible deadlock when probe/remove function of SF may be called with devlink instance lock held during devlink reload. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: expose peer SF devlink instanceJiri Pirko
Introduce a new helper devl_port_fn_devlink_set() to be used by driver assigning a devlink instance to the peer devlink port function. Expose this to user over new netlink attribute nested under port function nest to expose devlink handle related to the port function. This is particularly helpful for user to understand the relationship between devlink instances created for SFs and the port functions they belong to. Note that caller of devlink_port_notify() needs to hold devlink instance lock, put the assertion to devl_port_fn_devlink_set() to make this requirement explicit. Also note the limitations that only allow to make this assignment for registered objects. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: introduce object and nested devlink relationship infraJiri Pirko
It is a bit tricky to maintain relationship between devlink objects and nested devlink instances due to following aspects: 1) Locking. It is necessary to lock the devlink instance that contains the object first, only after that to lock the nested instance. 2) Lifetimes. Objects (e.g devlink port) may be removed before the nested devlink instance. 3) Notifications. If nested instance changes (e.g. gets registered/unregistered) the nested-in object needs to send appropriate notifications. Resolve this by introducing an xarray that holds 1:1 relationships between devlink object and related nested devlink instance. Use that xarray index to get the object/nested devlink instance on the other side. Provide necessary helpers: devlink_rel_nested_in_add/clear() to add and clear the relationship. devlink_rel_nested_in_notify() to call the nested-in object to send notifications during nested instance register/unregister/netns change. devlink_rel_devlink_handle_put() to be used by nested-in object fill function to fill the nested handle. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: extend devlink_nl_put_nested_handle() with attrtype argJiri Pirko
As the next patch is going to call this helper with need to fill another type of nested attribute, pass it over function arg. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: move devlink_nl_put_nested_handle() into netlink.cJiri Pirko
As the next patch is going to call this helper out of the linecard.c, move to netlink.c. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: put netnsid to nested handleJiri Pirko
If netns of devlink instance and nested devlink instance differs, put netnsid attr to indicate that. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net/mlx5: Lift reload limitation when SFs are presentJiri Pirko
Historically, the shared devlink_mutex prevented devlink instances from being registered/unregistered during another devlink instance reload operation. However, devlink_muxex is gone for some time now, this limitation is no longer needed. Lift it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17net/mlx5: Disable eswitch as the first thing in mlx5_unload()Jiri Pirko
The eswitch disable call does removal of all representors. Do that before clearing the SF device table and maintain the same flow as during SF devlink port removal, where the representor is removed before the actual SF is removed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17devlink: move linecard struct into linecard.cJiri Pirko
Instead of exposing linecard struct, expose a simple helper to get the linecard index, which is all is needed outside linecard.c. Move the linecard struct to linecard.c and keep it private similar to the rest of the devlink objects. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>