summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-12bpf: Make bpf_cgroup_acquire() KF_RCU | KF_RET_NULLDavid Vernet
struct cgroup is already an RCU-safe type in the verifier. We can therefore update bpf_cgroup_acquire() to be KF_RCU | KF_RET_NULL, and subsequently remove bpf_cgroup_kptr_get(). This patch does the first of these by updating bpf_cgroup_acquire() to be KF_RCU | KF_RET_NULL, and also updates selftests accordingly. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230411041633.179404-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-12cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpusetWaiman Long
It is found that attaching a task to the top_cpuset does not currently ignore CPUs allocated to subpartitions in cpuset_attach_task(). So the code is changed to fix that. Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-04-12cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methodsWaiman Long
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept new tasks. It is too late to check that in cpuset_fork(). So we need to add the cpuset_can_fork() and cpuset_cancel_fork() methods to pre-check it before we can allow attachment to a different cpuset. We also need to set the attach_in_progress flag to alert other code that a new task is going to be added to the cpuset. Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Suggested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tejun Heo <tj@kernel.org>
2023-04-12cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properlyWaiman Long
By default, the clone(2) syscall spawn a child process into the same cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag introduced by commit ef2c41cf38a7 ("clone3: allow spawning processes into cgroups"), the child will be spawned into a different cgroup which is somewhat similar to writing the child's tid into "cgroup.threads". The current cpuset_fork() method does not properly handle the CLONE_INTO_CGROUP case where the cpuset of the child may be different from that of its parent. Update the cpuset_fork() method to treat the CLONE_INTO_CGROUP case similar to cpuset_attach(). Since the newly cloned task has not been running yet, its actual memory usage isn't known. So it is not necessary to make change to mm in cpuset_fork(). Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tejun Heo <tj@kernel.org>
2023-04-12cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()Waiman Long
After a successful cpuset_can_attach() call which increments the attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach() will be called later. In cpuset_attach(), tasks in cpuset_attach_wq, if present, will be woken up at the end. That is not the case in cpuset_cancel_attach(). So missed wakeup is possible if the attach operation is somehow cancelled. Fix that by doing the wakeup in cpuset_cancel_attach() as well. Fixes: e44193d39e8d ("cpuset: let hotplug propagation work wait for task attaching") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Tejun Heo <tj@kernel.org>
2023-04-12cgroup,freezer: hold cpu_hotplug_lock before freezer_mutexTetsuo Handa
syzbot is reporting circular locking dependency between cpu_hotplug_lock and freezer_mutex, for commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic") replaced atomic_inc() in freezer_apply_state() with static_branch_inc() which holds cpu_hotplug_lock. cpu_hotplug_lock => cgroup_threadgroup_rwsem => freezer_mutex cgroup_file_write() { cgroup_procs_write() { __cgroup_procs_write() { cgroup_procs_write_start() { cgroup_attach_lock() { cpus_read_lock() { percpu_down_read(&cpu_hotplug_lock); } percpu_down_write(&cgroup_threadgroup_rwsem); } } cgroup_attach_task() { cgroup_migrate() { cgroup_migrate_execute() { freezer_attach() { mutex_lock(&freezer_mutex); (...snipped...) } } } } (...snipped...) } } } freezer_mutex => cpu_hotplug_lock cgroup_file_write() { freezer_write() { freezer_change_state() { mutex_lock(&freezer_mutex); freezer_apply_state() { static_branch_inc(&freezer_active) { static_key_slow_inc() { cpus_read_lock(); static_key_slow_inc_cpuslocked(); cpus_read_unlock(); } } } mutex_unlock(&freezer_mutex); } } } Swap locking order by moving cpus_read_lock() in freezer_apply_state() to before mutex_lock(&freezer_mutex) in freezer_change_state(). Reported-by: syzbot <syzbot+c39682e86c9d84152f93@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=c39682e86c9d84152f93 Suggested-by: Hillf Danton <hdanton@sina.com> Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-04-12bpf: Handle NULL in bpf_local_storage_free.Alexei Starovoitov
During OOM bpf_local_storage_alloc() may fail to allocate 'storage' and call to bpf_local_storage_free() with NULL pointer will cause a crash like: [ 271718.917646] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 271719.019620] RIP: 0010:call_rcu+0x2d/0x240 [ 271719.216274] bpf_local_storage_alloc+0x19e/0x1e0 [ 271719.250121] bpf_local_storage_update+0x33b/0x740 Fixes: 7e30a8477b0b ("bpf: Add bpf_local_storage_free()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230412171252.15635-1-alexei.starovoitov@gmail.com
2023-04-12netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVECDavid Howells
Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the size of the page to the part of the page extracted, not the remaining amount of data in the extracted page array at that point. This doesn't yet affect anything as cifs, the only current user, only passes in non-user-backed iterators. Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: Steve French <sfrench@samba.org> Cc: Shyam Prasad N <nspmangalore@gmail.com> Cc: Rohith Surabattula <rohiths.msft@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-12sched/fair: Fix imbalance overflowVincent Guittot
When local group is fully busy but its average load is above system load, computing the imbalance will overflow and local group is not the best target for pulling this load. Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()") Reported-by: Tingjia Cao <tjcao980311@gmail.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tingjia Cao <tjcao980311@gmail.com> Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ89jQ@mail.gmail.com/T/
2023-04-12Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixesMaarten Lankhorst
We were stuck on rc2, should at least attempt to track drm-fixes slightly. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2023-04-12net: ti/cpsw: Add explicit platform_device.h and of_platform.h includesRob Herring
TI CPSW uses of_platform_* functions which are declared in of_platform.h. of_platform.h gets implicitly included by of_device.h, but that is going to be removed soon. Nothing else depends on of_device.h so it can be dropped. of_platform.h also implicitly includes platform_device.h, so add an explicit include for it, too. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: wwan: iosm: Fix error handling path in ipc_pcie_probe()Harshit Mogalapalli
Smatch reports: drivers/net/wwan/iosm/iosm_ipc_pcie.c:298 ipc_pcie_probe() warn: missing unwind goto? When dma_set_mask fails it directly returns without disabling pci device and freeing ipc_pcie. Fix this my calling a correct goto label As dma_set_mask returns either 0 or -EIO, we can use a goto label, as it finally returns -EIO. Add a set_mask_fail goto label which stands consistent with other goto labels in this function.. Fixes: 035e3befc191 ("net: wwan: iosm: fix driver not working with INTEL_IOMMU disabled") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12smc: Fix use-after-free in tcp_write_timer_handler().Kuniyuki Iwashima
With Eric's ref tracker, syzbot finally found a repro for use-after-free in tcp_write_timer_handler() by kernel TCP sockets. [0] If SMC creates a kernel socket in __smc_create(), the kernel socket is supposed to be freed in smc_clcsock_release() by calling sock_release() when we close() the parent SMC socket. However, at the end of smc_clcsock_release(), the kernel socket's sk_state might not be TCP_CLOSE. This means that we have not called inet_csk_destroy_sock() in __tcp_close() and have not stopped the TCP timers. The kernel socket's TCP timers can be fired later, so we need to hold a refcnt for net as we do for MPTCP subflows in mptcp_subflow_create_socket(). [0]: leaked reference. sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108) inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244) __sock_create (net/socket.c:1546) smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284) __sock_create (net/socket.c:1546) __sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661) __x64_sys_socket (net/socket.c:1672) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) ================================================================== BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594) Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091 CPU: 0 PID: 18091 Comm: syzrepro Tainted: G W 6.3.0-rc4-01174-gb5d54eb5899a #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:107) print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) kasan_report (mm/kasan/report.c:538) tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594) tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643) call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701) __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022) run_timer_softirq (kernel/time/timer.c:2037) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572) __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650) irq_exit_rcu (kernel/softirq.c:664) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14)) </IRQ> Fixes: ac7138746e14 ("smc: establish new socket family") Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12ksz884x: Remove unused functionsSimon Horman
Remove unused functions. These functions may have some value in documenting the hardware. But that information may be accessed via SCM history. Flagged by clang-16 with W=1. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12ionic: Don't overwrite the cyclecounter bitmaskBrett Creeley
The driver was incorrectly overwriting the cyclecounter bitmask, which was truncating it and not aligning to the hardware mask value. This isn't causing any issues, but it's wrong. Fix this by not constraining the cyclecounter/hardware mask. Luckily, this seems to cause no issues, which is why this change doesn't have a fixes tag and isn't being sent to net. However, if any transformations from time->cycles are needed in the future, this change will be needed. Suggested-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12Merge branch 'rk3588-error-prints'David S. Miller
Sebastian Reichel says: ==================== stmmac: Fix RK3588 error prints This fixes a couple of false positive error messages printed by stmmac on RK3588. I expect them to go via net-next since the fixes are not critical. Changes since PATCHv1: * https://lore.kernel.org/all/20230317174243.61500-1-sebastian.reichel@collabora.com/ * Add Fixes tags * Use loop to request clocks ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: ethernet: stmmac: dwmac-rk: fix optional phy regulator handlingSebastian Reichel
The usual devm_regulator_get() call already handles "optional" regulators by returning a valid dummy and printing a warning that the dummy regulator should be described properly. This code open coded the same behaviour, but masked any errors that are not -EPROBE_DEFER and is quite noisy. This change effectively unmasks and propagates regulators errors not involving -ENODEV, downgrades the error print to warning level if no regulator is specified and captures the probe defer message for /sys/kernel/debug/devices_deferred. Fixes: 2e12f536635f ("net: stmmac: dwmac-rk: Use standard devicetree property for phy regulator") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: ethernet: stmmac: dwmac-rk: rework optional clock handlingSebastian Reichel
The clock requesting code is quite repetitive. Fix this by requesting the clocks via devm_clk_bulk_get_optional. The optional variant has been used, since this is effectively what the old code did. The exact clocks required depend on the platform and configuration. As a side effect this change adds correct -EPROBE_DEFER handling. Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Andrew Lunn <andrew@lunn.ch> Fixes: 7ad269ea1a2b ("GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12Merge branch 'dsa-trace-events'David S. Miller
Vladimir Oltean says: ==================== DSA trace events This series introduces the "dsa" trace event class, with the following events: $ trace-cmd list | grep dsa dsa dsa:dsa_fdb_add_hw dsa:dsa_mdb_add_hw dsa:dsa_fdb_del_hw dsa:dsa_mdb_del_hw dsa:dsa_fdb_add_bump dsa:dsa_mdb_add_bump dsa:dsa_fdb_del_drop dsa:dsa_mdb_del_drop dsa:dsa_fdb_del_not_found dsa:dsa_mdb_del_not_found dsa:dsa_lag_fdb_add_hw dsa:dsa_lag_fdb_add_bump dsa:dsa_lag_fdb_del_hw dsa:dsa_lag_fdb_del_drop dsa:dsa_lag_fdb_del_not_found dsa:dsa_vlan_add_hw dsa:dsa_vlan_del_hw dsa:dsa_vlan_add_bump dsa:dsa_vlan_del_drop dsa:dsa_vlan_del_not_found These are useful to debug refcounting issues on CPU and DSA ports, where entries may remain lingering, or may be removed too soon, depending on bugs in higher layers of the network stack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: dsa: add trace points for VLAN operationsVladimir Oltean
These are not as critical as the FDB/MDB trace points (I'm not aware of outstanding VLAN related bugs), but maybe they are useful to somebody, either debugging something or simply trying to learn more. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: dsa: add trace points for FDB/MDB operationsVladimir Oltean
DSA performs non-trivial housekeeping of unicast and multicast addresses on shared (CPU and DSA) ports, and puts a bit of pressure on higher layers, requiring them to behave correctly (remove these addresses exactly as many times as they were added). Otherwise, either addresses linger around forever, or DSA returns -ENOENT complaining that entries that were already deleted must be deleted again. To aid debugging, introduce some trace points specifically for FDB and MDB - that's where some of the bugs still are right now. Some bugs I have seen were also due to race conditions, see: - 630fd4822af2 ("net: dsa: flush switchdev workqueue on bridge join error path") - a2614140dc0f ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN") so it would be good to not disturb the timing too much, hence the choice to use trace points vs regular dev_dbg(). I've had these for some time on my computer in a less polished form, and they've proven useful. What I found most useful was to enable CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline, and run "cat /sys/kernel/debug/tracing/trace". This is to debug more complex environments with network managers started by the init system, things like that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12qlcnic: check pci_reset_function resultDenis Plotnikov
Static code analyzer complains to unchecked return value. The result of pci_reset_function() is unchecked. Despite, the issue is on the FLR supported code path and in that case reset can be done with pcie_flr(), the patch uses less invasive approach by adding the result check of pci_reset_function(). Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism") Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-11Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== iavf: fix racing in VLANs Ahmed Zaki says: This patchset mainly fixes a racing issue in the iavf where the number of VLANs in the vlan_filter_list might be more than the PF limit. To fix that, we get rid of the cvlans and svlans bitmaps and keep all the required info in the list. The second patch adds two new states that are needed so that we keep the VLAN info while the interface goes DOWN: -- DISABLE (notify PF, but keep the filter in the list) -- INACTIVE (dev is DOWN, filter is removed from PF) Finally, the current code keeps each state in a separate bit field, which is error prone. The first patch refactors that by replacing all bits with a single enum. The changes are minimal where each bit change is replaced with the new state value. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: remove active_cvlans and active_svlans bitmaps iavf: refactor VLAN filter states ==================== Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11Merge tag 'for-net-2023-04-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix not setting Dath Path for broadcast sink - Fix not cleaning up on LE Connection failure - SCO: Fix possible circular locking dependency - L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp} - Fix race condition in hidp_session_thread - btbcm: Fix logic error in forming the board name - btbcm: Fix use after free in btsdio_remove * tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp} Bluetooth: Set ISO Data Path on broadcast sink Bluetooth: hci_conn: Fix possible UAF Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm bluetooth: btbcm: Fix logic error in forming the board name. Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition Bluetooth: Fix race condition in hidp_session_thread Bluetooth: Fix printing errors if LE Connection times out Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure ==================== Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11net: dsa: mv88e6xxx: Correct cmode to PHY_INTERFACE_Andrew Lunn
The switch can either take the MAC or the PHY role in an MII or RMII link. There are distinct PHY_INTERFACE_ macros for these two roles. Correct the mapping so that the `REV` version is used for the PHY role. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230411023541.2372609-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11net/mlx5: DR, Add modify-header-pattern ICM poolYevgeny Kliteynik
There is a new ICM area for that memory, so we need to handle it as we did for the others ICM types. The patch added that specific pool with its requirements and management. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: DR, Prepare sending new WQE typeYevgeny Kliteynik
The send engine should be ready to handle more opcodes in addition to RDMA_WRITE/RDMA_READ. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Add new WQE for updating flow tableYevgeny Kliteynik
Add new WQE type: FLOW_TBL_ACCESS, which will be used for writing modify header arguments. This type has specific control segment and special data segment. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Add mlx5_ifc bits for modify header argumentYevgeny Kliteynik
Add enum value for modify-header argument object and mlx5_bits for the related capabilities. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: DR, Set counter ID on the last STE for STEv1 TXYevgeny Kliteynik
In STEv1 counter action can be set either by filling counter ID on STE, in which case it is executed before other actions on this STE, or as a single action, in which case it is executed in accordance with the actions order. FW steering on STEv1 devices implements counter as counter ID on STE, and this counter is set on the last STE. Fix SMFS to be consistent with this behaviour - move TX counter to the last STE, this way the counter will include all actions of the previous STEs that might have changed packet headers length, e.g. encap, vlan push, etc. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Create a new profile for SFsParav Pandit
Create a new profile for SFs in order to disable the command cache. Each function command cache consumes ~500KB of memory, when using a large number of SFs this savings is notable on memory constarined systems. Use a new profile to provide for future differences between SFs and PFs. The mr_cache not used for non-PF functions, so it is excluded from the new profile. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, add tracepoints for multicastVlad Buslov
Pass target struct net_device to mdb attach/detach handler in order to expose the port name to the new tracepoints. Implemented following tracepoints: - Attach mdb to port. - Detach mdb from port. Usage example: ># cd /sys/kernel/debug/tracing ># echo mlx5:mlx5_esw_bridge_port_mdb_attach >> set_event ># cat trace ... kworker/0:0-19071 [000] ..... 259004.253848: mlx5_esw_bridge_port_mdb_attach: net_device=enp8s0f0_0 addr=33:33:ff:00:00:01 vid=0 num_ports=1 offloaded=1 Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, implement mdb offloadVlad Buslov
Implement support for add/del SWITCHDEV_OBJ_ID_PORT_MDB events. For mdb destination addresses configure egress table rules to replicate to per-port multicast tables of all ports that are member of the multicast group as illustrated by 'MDB1' rule in the following diagram: +--------+--+ +---------------------------------------> Port 1 | | | +-^------+--+ | | | | +-----------------------------------------+ | +---------------------------+ | | EGRESS table | | +--> PORT 1 multicast table | | +----------------------------------+ +-----------------------------------------+ | | +---------------------------+ | | INGRESS table | | | | | | | | +----------------------------------+ | dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ | | FG0: | | | | | dst_mac=P1,vlan=Y -> pop vlan, goto P1 | | | src_port=dst_port -> drop | | | src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ | | FG1: | | | ... | | dst_mac=P2,vlan=Y -> goto P2 | | | | VLAN X -> pop, goto port | | | | | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ | ... | | +----------------------------------+ | | | | | VLAN Y -> pop, goto port +-------+ +-----------------------------------------+ | | | FG3: | | | | matchall -> goto port | | | | | | | +---------------------------+ | | | | | | +--------+--+ +---------------------------------------> Port 2 | | | +-^------+--+ | | | | | +---------------------------+ | +--> PORT 2 multicast table | | +---------------------------+ | | | | | FG0: | | | src_port=dst_port -> drop | | | FG1: | | | VLAN X -> pop, goto port | | | ... | | | | | | FG3: | | | matchall -> goto port +-------+ | | +---------------------------+ MDB is managed by extending mlx5 bridge to store an entry in mlx5_esw_bridge->mdb_list linked list (used to iterate over all offloaded MDBs) and mlx5_esw_bridge->mdb_ht hash table (used to lookup existing MDB by MAC+VLAN). Every MDB entry can be attached to arbitrary amount of bridge ports that are stored in mlx5_esw_bridge_mdb_entry->ports xarray in order to allow both efficient lookup of the port and also iteration over all ports that the entry is attached to. Every time MDB is attached/detached to/from a port, the hardware rule is recreated with list of destinations corresponding to all attached ports. When the entry is detached from the last port it is removed from mdb and destroyed which means that the ports xarray also acts as implicit reference counting mechanism. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, support multicast VLAN popVlad Buslov
When VLAN with 'untagged' flag is created on port also provision the per-port multicast table rule to pop the VLAN during packet replication. This functionality must be in per-port table because some subset of ports that are member of multicast group can require just a match on VLAN (trunk mode) while other subset can be configured to remove the VLAN tag from packets received on the ports (access mode). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, add per-port multicast replication tablesVlad Buslov
Multicast replication requires adding one more level of FDB_BR_OFFLOAD priority flow tables. The new level is used for per-port multicast-specific tables that have following flow groups structure (flow highest to lowest priority): - Flow group of size one that matches on source port metadata. This will have a static single rule that prevent packets from being replicated to their source port. - Flow group of size one that matches all packets and forwards them to the port that owns the table. Initialize the table dynamically on all bridge ports when adding a port to the bridge that has multicast enabled and on all existing bridge ports when receiving multicast enable notification. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, snoop igmp/mld packetsVlad Buslov
Handle SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED attribute notification to dynamically toggle bridge multicast offload. Set new MLX5_ESW_BRIDGE_MCAST_FLAG bridge flag when multicast offload is enabled. Put multicast-specific code into new bridge_mcast.c file. When initializing bridge multicast pipeline create a static rule for snooping on IGMP traffic and three rules for snooping on MLD traffic (for query, report and done message types). Note that matching MLD traffic requires having flexparser MLX5_FLEX_PROTO_ICMPV6 capability enabled. By default Linux bridge is created with multicast enabled which can be modified by 'mcast_snooping' argument: $ ip link set name my_bridge type bridge mcast_snooping 0 Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, extract code to lookup parent bridge of portVlad Buslov
The pattern when function looks up a port by vport_num+vhca_id tuple in order to just obtain its parent bridge is repeated multiple times in bridge.c file. Further commits in this series use the pattern even more. Extract the pattern to standalone mlx5_esw_bridge_from_port_lookup() function to improve code readability. This commits doesn't change functionality. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, move additional data structures to priv headerVlad Buslov
Following patches in series will require accessing flow tables and groups sizes, table levels and struct mlx5_esw_bridge from new the new source file dedicated to multicast code. Expose these data in bridge_priv.h to reduce clutter in following patches that will implement the actual functionality. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Bridge, increase bridge tables sizesVlad Buslov
Bridge ingress and egress tables got more flow groups recently for QinQ support and will get more in following patches of this series. Increase the sizes of the tables to allow offloading more flows in each mode. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11net/mlx5: Add mlx5_ifc definitions for bridge multicast supportVlad Buslov
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin, fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11scsi: ses: Handle enclosure with just a primary component gracefullyJiri Kosina
This reverts commit 3fe97ff3d949 ("scsi: ses: Don't attach if enclosure has no components") and introduces proper handling of case where there are no detected secondary components, but primary component (enumerated in num_enclosures) does exist. That fix was originally proposed by Ding Hui <dinghui@sangfor.com.cn>. Completely ignoring devices that have one primary enclosure and no secondary one results in ses_intf_add() bailing completely scsi 2:0:0:254: enclosure has no enumerated components scsi 2:0:0:254: Failed to bind enclosure -12ven in valid configurations such even on valid configurations with 1 primary and 0 secondary enclosures as below: # sg_ses /dev/sg0 3PARdata SES 3321 Supported diagnostic pages: Supported Diagnostic Pages [sdp] [0x0] Configuration (SES) [cf] [0x1] Short Enclosure Status (SES) [ses] [0x8] # sg_ses -p cf /dev/sg0 3PARdata SES 3321 Configuration diagnostic page: number of secondary subenclosures: 0 generation code: 0x0 enclosure descriptor list Subenclosure identifier: 0 [primary] relative ES process id: 0, number of ES processes: 1 number of type descriptor headers: 1 enclosure logical identifier (hex): 20000002ac02068d enclosure vendor: 3PARdata product: VV rev: 3321 type descriptor header and text list Element type: Unspecified, subenclosure id: 0 number of possible elements: 1 The changelog for the original fix follows ===== We can get a crash when disconnecting the iSCSI session, the call trace like this: [ffff00002a00fb70] kfree at ffff00000830e224 [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4 [ffff00002a00fbd0] device_del at ffff0000086b6a98 [ffff00002a00fc50] device_unregister at ffff0000086b6d58 [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c [ffff00002a00fca0] scsi_remove_device at ffff000008706134 [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4 [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0 [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4 [ffff00002a00fdb0] process_one_work at ffff00000810f35c [ffff00002a00fe00] worker_thread at ffff00000810f648 [ffff00002a00fe70] kthread at ffff000008116e98 In ses_intf_add, components count could be 0, and kcalloc 0 size scomp, but not saved in edev->component[i].scratch In this situation, edev->component[0].scratch is an invalid pointer, when kfree it in ses_intf_remove_enclosure, a crash like above would happen The call trace also could be other random cases when kfree cannot catch the invalid pointer We should not use edev->component[] array when the components count is 0 We also need check index when use edev->component[] array in ses_enclosure_data_process ===== Reported-by: Michal Kolar <mich.k@seznam.cz> Originally-by: Ding Hui <dinghui@sangfor.com.cn> Cc: stable@vger.kernel.org Fixes: 3fe97ff3d949 ("scsi: ses: Don't attach if enclosure has no components") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm Tested-by: Michal Kolar <mich.k@seznam.cz> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11Revert "pinctrl: amd: Disable and mask interrupts on resume"Kornel Dulęba
This reverts commit b26cd9325be4c1fcd331b77f10acb627c560d4d7. This patch introduces a regression on Lenovo Z13, which can't wake from the lid with it applied; and some unspecified AMD based Dell platforms are unable to wake from hitting the power button Signed-off-by: Kornel Dulęba <korneld@chromium.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20230411134932.292287-1-korneld@chromium.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-04-11riscv: add icache flush for nommu sigreturn trampolineMathis Salmen
In a NOMMU kernel, sigreturn trampolines are generated on the user stack by setup_rt_frame. Currently, these trampolines are not instruction fenced, thus their visibility to ifetch is not guaranteed. This patch adds a flush_icache_range in setup_rt_frame to fix this problem. Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de> Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-11treewide: Fix probing of devices in DT overlaysGeert Uytterhoeven
When loading a DT overlay that creates a device, the device is not probed, unless the DT overlay is unloaded and reloaded again. After the recent refactoring to improve fw_devlink, it no longer depends on the "compatible" property to identify which device tree nodes will become struct devices. fw_devlink now picks up dangling consumers (consumers pointing to descendent device tree nodes of a device that aren't converted to child devices) when a device is successfully bound to a driver. See __fw_devlink_pickup_dangling_consumers(). However, during DT overlay, a device's device tree node can have sub-nodes added/removed without unbinding/rebinding the driver. This difference in behavior between the normal device instantiation and probing flow vs. the DT overlay flow has a bunch of implications that are pointed out elsewhere[1]. One of them is that the fw_devlink logic to pick up dangling consumers is never exercised. This patch solves the fw_devlink issue by marking all DT nodes added by DT overlays with FWNODE_FLAG_NOT_DEVICE (fwnode that won't become device), and by clearing the flag when a struct device is actually created for the DT node. This way, fw_devlink knows not to have consumers waiting on these newly added DT nodes, and to propagate the dependency to an ancestor DT node that has the corresponding struct device. Based on a patch by Saravana Kannan, which covered only platform and spi devices. [1] https://lore.kernel.org/r/CAGETcx_bkuFaLCiPrAWCPQz+w79ccDp6=9e881qmK=vx3hBMyg@mail.gmail.com Fixes: 4a032827daa89350 ("of: property: Simplify of_link_to_phandle()") Link: https://lore.kernel.org/r/CAGETcx_+rhHvaC_HJXGrr5_WAd2+k5f=rWYnkCZ6z5bGX-wj4w@mail.gmail.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C Acked-by: Shawn Guo <shawnguo@kernel.org> Acked-by: Saravana Kannan <saravanak@google.com> Tested-by: Ivan Bornyakov <i.bornyakov@metrotek.ru> Link: https://lore.kernel.org/r/e1fa546682ea4c8474ff997ab6244c5e11b6f8bc.1680182615.git.geert+renesas@glider.be Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-11dt-bindings: interrupt-controller: loongarch: Fix mismatched compatibleLiu Peibao
The "compatible" doesn't match what the kernel is using. Fix it as kernel using. Fixes: 6b2748ada244 ("dt-bindings: interrupt-controller: add yaml for LoongArch CPU interrupt controller") Reported-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/all/20221208020954.GA3368836-robh@kernel.org/ Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Liu Peibao <liupeibao@loongson.cn> Link: https://lore.kernel.org/r/20230401091304.12633-1-liupeibao@loongson.cn [robh: Rename file to match compatible, fix subject typo] Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-11Merge tag 'pci-v6.3-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to avoid build errors (Reinette Chatre) - Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid broken USB after hotplug or suspend/resume (Basavaraj Natikar) - Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring) * tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Fix use-after-free in pci_bus_release_domain_nr() x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
2023-04-11ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBAPaul Menzel
Like the ASUS ExpertBook B2502CBA and various ASUS Vivobook laptops, the ASUS ExpertBook B1502CBA has an ACPI DSDT table that describes IRQ 1 as ActiveLow while the kernel overrides it to Edge_High. $ sudo dmesg | grep DMI DMI: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B1502CBA_B1502CBA/B1502CBA, BIOS B1502CBA.300 01/18/2023 $ grep -A 40 PS2K dsdt.dsl | grep IRQ -A 1 IRQ (Level, ActiveLow, Exclusive, ) {1} This prevents the keyboard from working. To fix this issue, add this laptop to the skip_override_table so that the kernel does not override IRQ 1. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217323 Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-11amd-pstate: Fix amd_pstate mode switchWyes Karny
amd_pstate mode can be changed by writing the mode name to the `status` sysfs. But some combinations are not working. Fix this issue by taking care of the edge cases. Before the fix the mode change combination test fails: #./pst_test.sh Test passed: from: disable, to Test passed: from: disable, to disable Test failed: 1, From mode: disable, to mode: passive Test failed: 1, From mode: disable, to mode: active Test failed: 1, From mode: passive, to mode: active Test passed: from: passive, to disable Test failed: 1, From mode: passive, to mode: passive Test failed: 1, From mode: passive, to mode: active Test failed: 1, From mode: active, to mode: active Test passed: from: active, to disable Test failed: 1, From mode: active, to mode: passive Test failed: 1, From mode: active, to mode: active After the fix test passes: #./pst_test.sh Test passed: from: disable, to Test passed: from: disable, to disable Test passed: from: disable, to passive Test passed: from: disable, to active Test passed: from: passive, to active Test passed: from: passive, to disable Test passed: from: passive, to passive Test passed: from: passive, to active Test passed: from: active, to active Test passed: from: active, to disable Test passed: from: active, to passive Test passed: from: active, to active Fixes: abd61c08ef349 ("cpufreq: amd-pstate: add driver working mode switch support") Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-11Merge tag 'for-6.3-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix fast checksum detection, this affects filesystems with non-crc32c checksum, calculation would not be offloaded to worker threads - restore thread_pool mount option behaviour for endio workers, the new value for maximum active threads would not be set to the actual work queues * tag 'for-6.3-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix fast csum implementation detection btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
2023-04-11selftests/bpf: Add test to access u32 ptr argument in tracing programFeng Zhou
Adding verifier test for accessing u32 pointer argument in tracing programs. The test program loads 1nd argument of bpf_fentry_test9 function which is u32 pointer and checks that verifier allows that. Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230410085908.98493-3-zhoufeng.zf@bytedance.com