git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-11-16	perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and ↵	Ravi Bangoria
	throttling amd_pmu_enable_all() does: if (!test_bit(idx, cpuc->active_mask)) continue; amd_pmu_enable_event(cpuc->events[idx]); A perf NMI of another event can come between these two steps. Perf NMI handler internally disables and enables _all_ events, including the one which nmi-intercepted amd_pmu_enable_all() was in process of enabling. If that unintentionally enabled event has very low sampling period and causes immediate successive NMI, causing the event to be throttled, cpuc->events[idx] and cpuc->active_mask gets cleared by x86_pmu_stop(). This will result in amd_pmu_enable_event() getting called with event=NULL when amd_pmu_enable_all() resumes after handling the NMIs. This causes a kernel crash: BUG: kernel NULL pointer dereference, address: 0000000000000198 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page [...] Call Trace: <TASK> amd_pmu_enable_all+0x68/0xb0 ctx_resched+0xd9/0x150 event_function+0xb8/0x130 ? hrtimer_start_range_ns+0x141/0x4a0 ? perf_duration_warn+0x30/0x30 remote_function+0x4d/0x60 __flush_smp_call_function_queue+0xc4/0x500 flush_smp_call_function_queue+0x11d/0x1b0 do_idle+0x18f/0x2d0 cpu_startup_entry+0x19/0x20 start_secondary+0x121/0x160 secondary_startup_64_no_verify+0xe5/0xeb </TASK> amd_pmu_disable_all()/amd_pmu_enable_all() calls inside perf NMI handler were recently added as part of BRS enablement but I'm not sure whether we really need them. We can just disable BRS in the beginning and enable it back while returning from NMI. This will solve the issue by not enabling those events whose active_masks are set but are not yet enabled in hw pmu. Fixes: ada543459cab ("perf/x86/amd: Add AMD Fam19h Branch Sampling support") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com
2022-11-16	Merge branch 'microchip-fixes'	David S. Miller
	Shang XiaoJing says: ==================== net: microchip: Fix potential null-ptr-deref due to create_singlethread_workqueue() There are some functions call create_singlethread_workqueue() without checking ret value, and the NULL workqueue_struct pointer may causes null-ptr-deref. Will be fixed by this patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() ↵	Shang XiaoJing
	and sparx5_start() sparx_stats_init() calls create_singlethread_workqueue() and not checked the ret value, which may return NULL. And a null-ptr-deref may happen: sparx_stats_init() create_singlethread_workqueue() # failed, sparx5->stats_queue is NULL queue_delayed_work() queue_delayed_work_on() __queue_delayed_work() # warning here, but continue __queue_work() # access wq->flags, null-ptr-deref Check the ret value and return -ENOMEM if it is NULL. So as sparx5_start(). Fixes: af4b11022e2d ("net: sparx5: add ethtool configuration and statistics support") Fixes: b37a1bae742f ("net: sparx5: add mactable support") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()	Shang XiaoJing
	lan966x_stats_init() calls create_singlethread_workqueue() and not checked the ret value, which may return NULL. And a null-ptr-deref may happen: lan966x_stats_init() create_singlethread_workqueue() # failed, lan966x->stats_queue is NULL queue_delayed_work() queue_delayed_work_on() __queue_delayed_work() # warning here, but continue __queue_work() # access wq->flags, null-ptr-deref Check the ret value and return -ENOMEM if it is NULL. Fixes: 12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	Merge branch 'sfc-TC-offload-counters'	David S. Miller
	Edward Cree says: ==================== sfc: TC offload counters EF100 hardware supports attaching counters to action-sets in the MAE. Use these counters to implement stats for TC flower offload. The counters are delivered to the host over a special hardware RX queue which should only ever receive counter update messages, not 'real' network packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: implement counters readout to TC stats	Edward Cree
	On FLOW_CLS_STATS, look up the MAE counter by TC cookie, and report the change in packet and byte count since the last time FLOW_CLS_STATS read them. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: validate MAE action order	Edward Cree
	Currently the only actions supported are COUNT and DELIVER, which can only happen in the right order; but when more actions are added, it will be necessary to check that they are only used in the same order in which the hardware performs them (since the hardware API takes an action set in which the order is implicit). For instance, a VLAN pop must not follow a VLAN push. Most practical use-cases should be unaffected by these restrictions. Add a function efx_tc_flower_action_order_ok() that checks whether it is appropriate to add a specified action to the existing action-set. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: attach an MAE counter to TC actions that need it	Edward Cree
	The only actions that expect stats (that sfc HW supports) are gact shot (drop), mirred redirect and mirred mirror. Since these are 'deliverish' actions that end an action-set, we only require at most one counter per action-set. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: accumulate MAE counter values from update packets	Edward Cree
	Add the packet and byte counts to the software running total, and store the latest jiffies every time the counter is bumped. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add functions to allocate/free MAE counters	Edward Cree
	efx_tc_flower_get_counter_index() will create an MAE counter mapped to the passed (TC filter) cookie, or increment the reference if one already exists for that cookie. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add hashtables for MAE counters and counter ID mappings	Edward Cree
	Nothing populates them yet. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add extra RX channel to receive MAE counter updates on ef100	Edward Cree
	Currently there is no counter-allocating machinery to connect the resulting counter update values to; that will be added in a subsequent patch. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add ef100 MAE counter support functions	Edward Cree
	Start and stop MAE counter streaming, and grant credits. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add ability for extra channels to receive raw RX buffers	Edward Cree
	The TC extra channel will need its own special RX handling, which must operate before any code that expects the RX buffer to contain a network packet; buffers on this RX queue contain MAE counter packets in a special format that does not resemble an Ethernet frame, and many fields of the RX packet prefix are not populated. The USER_MARK field, however, is populated with the generation count from the counter subsystem, which needs to be passed on to the RX handler. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add start and stop methods to channels	Edward Cree
	The TC extra channel needs to do extra work in efx_{start,stop}_channels() to start/stop MAE counter streaming from the hardware. Add callbacks for it to implement. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: add ability for an RXQ to grant credits on refill	Edward Cree
	EF100 hardware streams MAE counter updates to the driver over a dedicated RX queue; however, the MCPU is not able to detect when RX buffers have been posted to the ring. Thus, the driver must call MC_CMD_MAE_COUNTERS_STREAM_GIVE_CREDITS; this patch adds the infrastructure to support that to the core RXQ handling code. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	sfc: fix ef100 RX prefix macro	Edward Cree
	Macro PREFIX_WIDTH_MASK uses unsigned long arithmetic for a shift of up to 32 bits, which breaks on 32-bit systems. This did not previously show up as we weren't using any fields of width 32, but we now need to access ESF_GZ_RX_PREFIX_USER_MARK. Change it to unsigned long long. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16	rxrpc: Fix network address validation	David Howells
	Fix network address validation on entry to uapi functions such as connect() for AF_RXRPC. The check for address compatibility with the transport socket isn't correct and allows an AF_INET6 address to be given to an AF_INET socket, resulting in an oops now that rxrpc is calling udp_sendmsg() directly. Sample program: #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <sys/socket.h> #include <arpa/inet.h> #include <linux/rxrpc.h> static unsigned char ctrl[256] = "\x18\x00\x00\x00\x00\x00\x00\x00\x10\x01\x00\x00\x01"; int main(void) { struct sockaddr_rxrpc srx = { .srx_family = AF_RXRPC, .transport_type = SOCK_DGRAM, .transport_len = 28, .transport.sin6.sin6_family = AF_INET6, }; struct mmsghdr vec = { .msg_hdr.msg_control = ctrl, .msg_hdr.msg_controllen = 0x18, }; int s; s = socket(AF_RXRPC, SOCK_DGRAM, AF_INET); if (s < 0) { perror("socket"); exit(1); } if (connect(s, (struct sockaddr *)&srx, sizeof(srx)) < 0) { perror("connect"); exit(1); } if (sendmmsg(s, &vec, 1, MSG_NOSIGNAL \| MSG_MORE) < 0) { perror("sendmmsg"); exit(1); } return 0; } If working properly, connect() should fail with EAFNOSUPPORT. Fixes: ed472b0c8783 ("rxrpc: Call udp_sendmsg() directly") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-11-16	rxrpc: Fix oops from calling udpv6_sendmsg() on AF_INET socket	David Howells
	If rxrpc sees an IPv6 address, it assumes it can call udpv6_sendmsg() on it - even if it got it on an IPv4 socket. Fix do_udp_sendmsg() to give an error in such a case. general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] ... RIP: 0010:ipv6_addr_v4mapped include/net/ipv6.h:749 [inline] RIP: 0010:udpv6_sendmsg+0xd0a/0x2c70 net/ipv6/udp.c:1361 ... Call Trace: do_udp_sendmsg net/rxrpc/output.c:27 [inline] do_udp_sendmsg net/rxrpc/output.c:21 [inline] rxrpc_send_abort_packet+0x73b/0x860 net/rxrpc/output.c:367 rxrpc_release_calls_on_socket+0x211/0x300 net/rxrpc/call_object.c:595 rxrpc_release_sock net/rxrpc/af_rxrpc.c:886 [inline] rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:917 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x18/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16b/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xb35/0x2a20 kernel/exit.c:820 do_group_exit+0xd0/0x2a0 kernel/exit.c:950 __do_sys_exit_group kernel/exit.c:961 [inline] __se_sys_exit_group kernel/exit.c:959 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:959 Fixes: ed472b0c8783 ("rxrpc: Call udp_sendmsg() directly") Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-11-16	platform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables	Hans de Goede
	Add module parameters to allow setting the hw_rfkill_switch and set_fn_lock_led feature flags for testing these on laptops which are not on the DMI-id based allow lists for these 2 flags. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221115193400.376159-1-hdegoede@redhat.com
2022-11-16	platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some ↵	Arnav Rawat
	Yoga laptops Commit 3ae86d2d4704 ("platform/x86: ideapad-laptop: Fix Legion 5 Fn lock LED") uses the WMI event-id for the fn-lock event on some Legion 5 laptops to manually toggle the fn-lock LED because the EC does not do it itself. However, the same WMI ID is also sent on some Yoga laptops. Here, setting the fn-lock state is not valid behavior, and causes the EC to spam interrupts until the laptop is rebooted. Add a set_fn_lock_led_list[] DMI-id list and only enable the workaround to manually set the LED on models on this list. Link: https://bugzilla.kernel.org/show_bug.cgi?id=212671 Cc: Meng Dong <whenov@gmail.com> Signed-off-by: Arnav Rawat <arnavr3@illinois.edu> Link: https://lore.kernel.org/r/12093851.O9o76ZdvQC@fedora [hdegoede@redhat.com: Check DMI-id list only once and store the result] Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-11-16	platform/x86: hp-wmi: Ignore Smart Experience App event	Kai-Heng Feng
	Sometimes hp-wmi driver complains on system resume: [ 483.116451] hp_wmi: Unknown event_id - 33 - 0x0 According to HP it's a feature called "HP Smart Experience App" and it's safe to be ignored. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Link: https://lore.kernel.org/r/20221114073842.205392-1-kai.heng.feng@canonical.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-11-16	platform/surface: aggregator_registry: Add support for Surface Laptop 5	Maximilian Luz
	Add device nodes to enable support for battery and charger status, the ACPI platform profile, as well as internal HID devices (including touchpad and keyboard) on the Surface Laptop 5. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20221115231440.1338142-1-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-11-16	zonefs: Remove to_attr() helper function	Damien Le Moal
	to_attr() in zonefs sysfs code is unused, which it causes a warning when compiling with clang and W=1. Delete it to prevent the warning. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2022-11-16	zonefs: fix zone report size in __zonefs_io_error()	Damien Le Moal
	When an IO error occurs, the function __zonefs_io_error() is used to issue a zone report to obtain the latest zone information from the device. This function gets a zone report for all zones used as storage for a file, which is always 1 zone except for files representing aggregated conventional zones. The number of zones of a zone report for a file is calculated in __zonefs_io_error() by doing a bit-shift of the inode i_zone_size field, which is equal to or larger than the device zone size. However, this calculation does not take into account that the last zone of a zoned device may be smaller than the zone size reported by bdev_zone_sectors() (which is used to set the bit shift size). As a result, if an error occurs for an IO targetting such last smaller zone, the zone report will ask for 0 zones, leading to an invalid zone report. Fix this by using the fact that all files require a 1 zone report, except if the inode i_zone_size field indicates a zone size larger than the device zone size. This exception case corresponds to a mount with aggregated conventional zones. A check for this exception is added to the file inode initialization during mount. If an invalid setup is detected, emit an error and fail the mount (check contributed by Johannes Thumshirn). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-16	cifs: Fix wrong return value checking when GETFLAGS	Zhang Xiaoxu
	The return value of CIFSGetExtAttr is negative, should be checked with -EOPNOTSUPP rather than EOPNOTSUPP. Fixes: 64a5cfa6db94 ("Allow setting per-file compression via SMB2/3") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-16	nvmet: fix a memory leak in nvmet_auth_set_key	Sagi Grimberg
	When changing dhchap secrets we need to release the old secrets as well. kmemleak complaint: -- unreferenced object 0xffff8c7f44ed8180 (size 64): comm "check", pid 7304, jiffies 4295686133 (age 72034.246s) hex dump (first 32 bytes): 44 48 48 43 2d 31 3a 30 30 3a 4c 64 4c 4f 64 71 DHHC-1:00:LdLOdq 79 56 69 67 77 48 55 32 6d 5a 59 4c 7a 35 59 38 yVigwHU2mZYLz5Y8 backtrace: [<00000000b6fc5071>] kstrdup+0x2e/0x60 [<00000000f0f4633f>] 0xffffffffc0e07ee6 [<0000000053006c05>] 0xffffffffc0dff783 [<00000000419ae922>] configfs_write_iter+0xb1/0x120 [<000000008183c424>] vfs_write+0x2be/0x3c0 [<000000009005a2a5>] ksys_write+0x5f/0xe0 [<00000000cd495c89>] do_syscall_64+0x38/0x90 [<00000000f2a84ac5>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-16	nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000	Tiago Dias Ferreira
	Added a quirk to fix the Netac NV7000 SSD reporting duplicate NGUIDs. Cc: <stable@vger.kernel.org> Signed-off-by: Tiago Dias Ferreira <tiagodfer@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-16	MAINTAINERS: Add linux-kbuild's patchwork	Nicolas Schier
	Add patchwork URL for Kconfig and Kbuild. Signed-off-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-11-16	MAINTAINERS: Remove Michal Marek from Kbuild maintainers	Nicolas Schier
	Remove Michal Marek from Kbuild maintainers as there is no response from him since October 2017. Add an entry for Michal in CREDITS. Michal, thanks for maintaining Kbuild for almost eight years! Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Michal Marek <michal.lkml@markovi.net> Signed-off-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-11-16	MAINTAINERS: Add Nathan and Nicolas to Kbuild reviewers	Nicolas Schier
	As suggested by Nick, add Nathan and myself to Kbuild reviewers to share more review responsibilities. Signed-off-by: Nicolas Schier <nicolas@fjasle.eu> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-11-15	selftests/bpf: allow unpriv bpf for selftests by default	Eduard Zingerman
	Enable unprivileged bpf for selftests kernel by default. This forces CI to run test_verifier tests in both privileged and unprivileged modes. The test_verifier.c:do_test uses sysctl kernel.unprivileged_bpf_disabled to decide whether to run or to skip test cases in unprivileged mode. The CONFIG_BPF_UNPRIV_DEFAULT_OFF controls the default value of the kernel.unprivileged_bpf_disabled. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221116015456.2461135-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-15	bpftool: Check argc first before "file" in do_batch()	Tiezhu Yang
	If the parameters for batch are more than 2, check argc first can return immediately, no need to use is_prefix() to check "file" with a little overhead and then check argc, it is better to check "file" only when the parameters for batch are 2. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/1668517207-11822-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-15	docs/bpf: Fix sample code in MAP_TYPE_ARRAY docs	Donald Hunter
	Remove mistaken & from code example in MAP_TYPE_ARRAY docs Fixes: 1cfa97b30c5a ("bpf, docs: Document BPF_MAP_TYPE_ARRAY") Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20221115095910.86407-1-donald.hunter@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-15	net: dsa: don't leak tagger-owned storage on switch driver unbind	Vladimir Oltean
	In the initial commit dc452a471dba ("net: dsa: introduce tagger-owned storage for private and shared data"), we had a call to tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at tree teardown time. There were problems with connecting to a switch tree as a whole, so this got reworked to connecting to individual switches within the tree. In this process, tag_ops->disconnect(ds) was made to be called only from switch.c (cross-chip notifiers emitted as a result of dynamic tag proto changes), but the normal driver teardown code path wasn't replaced with anything. Solve this problem by adding a function that does the opposite of dsa_switch_setup_tag_protocol(), which is called from the equivalent spot in dsa_switch_teardown(). The positioning here also ensures that we won't have any use-after-free in tagging protocol (*rcv) ops, since the teardown sequence is as follows: dsa_tree_teardown -> dsa_tree_teardown_master -> dsa_master_teardown -> unsets master->dsa_ptr, making no further packets match the ETH_P_XDSA packet type handler -> dsa_tree_teardown_ports -> dsa_port_teardown -> dsa_slave_destroy -> unregisters DSA net devices, there is even a synchronize_net() in unregister_netdevice_many() -> dsa_tree_teardown_switches -> dsa_switch_teardown -> dsa_switch_teardown_tag_protocol -> finally frees the tagger-owned storage Fixes: 7f2973149c22 ("net: dsa: make tagging protocols connect to individual switches from a tree") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	Merge branch 'remove-phylink_validate-from-felix-dsa-driver'	Jakub Kicinski
	Vladimir Oltean says: ==================== Remove phylink_validate() from Felix DSA driver The Felix DSA driver still uses its own phylink_validate() procedure rather than the (relatively newly introduced) phylink_generic_validate() because the latter did not cater for the case where a PHY provides rate matching between the Ethernet cable side speed and the SERDES side speed (and does not advertise other speeds except for the SERDES speed). This changed with Sean Anderson's generic support for rate matching PHYs in phylib and phylink: https://patchwork.kernel.org/project/netdevbpf/cover/20220920221235.1487501-1-sean.anderson@seco.com/ Building upon that support, this patch set makes Linux understand that the PHYs used in combination with the Felix DSA driver (SCH-30841 riser card with AQR412 PHY, used with SERDES protocol 0x7777 - 4x2500base-x, plugged into LS1028A-QDS) do support PAUSE rate matching. This requires Aquantia PHY driver support for new PHY IDs. To activate the rate matching support in phylink, config->mac_capabilities must be populated. Coincidentally, this also opts the Felix driver into the generic phylink validation. Next, code that is no longer necessary is eliminated. This includes the Felix driver validation procedures for VSC9959 and VSC9953, the workaround in the Ocelot switch library to leave RX flow control always enabled, as well as DSA plumbing necessary for a custom phylink validation procedure to be propagated to the hardware driver level. Many thanks go to Sean Anderson for providing generic support for rate matching. ==================== Link: https://lore.kernel.org/r/20221114170730.2189282-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: dsa: remove phylink_validate() method	Vladimir Oltean
	As of now, no DSA driver uses a custom link mode validation procedure anymore. So remove this DSA operation and let phylink determine what is supported based on config->mac_capabilities (if provided by the driver). Leave a comment why we left the code that we did, and that there is more work to do. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: mscc: ocelot: drop workaround for forcing RX flow control	Vladimir Oltean
	As phylink gained generic support for PHYs with rate matching via PAUSE frames, the phylink_mac_link_up() method will be called with the maximum speed and with rx_pause=true if rate matching is in use. This means that setups with 2500base-x as the SERDES protocol between the MAC/PCS and the PHY now work with no need for the driver to do anything special. Tested with fsl-ls1028a-qds-7777.dts. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: dsa: felix: use phylink_generic_validate()	Vladimir Oltean
	Drop the custom implementation of phylink_validate() in favor of the generic one, which requires config->mac_capabilities to be set. This was used up until now because of the possibility of being paired with Aquantia PHYs with support for rate matching. The phylink framework gained generic support for these, and knows to advertise all 10/100/1000 lower speed link modes when our SERDES protocol is 2500base-x (fixed speed). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: phy: aquantia: add AQR112 and AQR412 PHY IDs	Vladimir Oltean
	These are Gen3 Aquantia N-BASET PHYs which support 5GBASE-T, 2.5GBASE-T, 1000BASE-T and 100BASE-TX (not 10G); also EEE, Sync-E, PTP, PoE. The 112 is a single PHY package, the 412 is a quad PHY package. The system-side SERDES interface of these PHYs selects its protocol depending on the negotiated media side link speed. That protocol can be 1000BASE-X, 2500BASE-X, 10GBASE-R, SGMII, USXGMII. The configuration of which SERDES protocol to use for which link speed is made by firmware; even though it could be overwritten over MDIO by Linux, we assume that the firmware provisioning is ok for the board on which the driver probes. For cases when the system side runs at a fixed rate, we want phylib/phylink to detect the PAUSE rate matching ability of these PHYs, so we need to use the Aquantia rather than the generic C45 driver. This needs aqr107_read_status() -> aqr107_read_rate() to set phydev->rate_matching, as well as the aqr107_get_rate_matching() method. I am a bit unsure about the naming convention in the driver. Since AQR107 is a Gen2 PHY, I assume all functions prefixed with "aqr107_" rather than "aqr_" mean Gen2+ features. So I've reused this naming convention. I've tested PHY "SGMII" statistics as well as the .link_change_notify method, which prints: Aquantia AQR412 mdio_mux-0.4:00: Link partner is Aquantia PHY, FW 4.3, fast-retrain downshift advertised, fast reframe advertised Tested SERDES protocols are usxgmii and 2500base-x (the latter with PAUSE rate matching). Tested link modes are 100/1000/2500 Base-T (with Aquantia link partner and with other link partners). No notable events observed. The placement of these PHY IDs in the driver is right before AQR113C, a Gen4 PHY. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next	Jakub Kicinski
	Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Fix sparse warning in the new nft_inner expression, reported by Jakub Kicinski. 2) Incorrect vlan header check in nft_inner, from Peng Wu. 3) Two patches to pass reset boolean to expression dump operation, in preparation for allowing to reset stateful expressions in rules. This adds a new NFT_MSG_GETRULE_RESET command. From Phil Sutter. 4) Inconsistent indentation in nft_fib, from Jiapeng Chong. 5) Speed up siphash calculation in conntrack, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: conntrack: use siphash_4u64 netfilter: rpfilter/fib: clean up some inconsistent indenting netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters netfilter: nft_inner: fix return value check in nft_inner_parse_l2l3() netfilter: nft_payload: use __be16 to store gre version ==================== Link: https://lore.kernel.org/r/20221115095922.139954-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	Documentation: nfp: update documentation	Walter Heymans
	The NFP documentation is updated to include information about Corigine, and the new NFP3800 chips. The 'Acquiring Firmware' section is updated with new information about where to find firmware. Two new sections are added to expand the coverage of the documentation. The new sections include: - Devlink Info - Configure Device Signed-off-by: Walter Heymans <walter.heymans@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20221115090834.738645-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	Merge branch ↵	Jakub Kicinski
	'mtk_eth_soc-rx-vlan-offload-improvement-dsa-hardware-untag-support' Felix Fietkau says: ==================== mtk_eth_soc rx vlan offload improvement + dsa hardware untag support This series improves rx vlan offloading on mtk_eth_soc and extends it to support hardware DSA untagging where possible. This improves performance by avoiding calls into the DSA tag driver receive function, including mangling of skb->data. This is split out of a previous series, which added other fixes and multiqueue support ==================== Link: https://lore.kernel.org/r/20221114124214.58199-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net/x25: Fix skb leak in x25_lapb_receive_frame()	Wei Yongjun
	x25_lapb_receive_frame() using skb_copy() to get a private copy of skb, the new skb should be freed in the undersized/fragmented skb error handling path. Otherwise there is a memory leak. Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20221114110519.514538-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ↵	Liu Jian
	ag71xx_open() If ag71xx_hw_enable() fails, call phylink_disconnect_phy() to clean up. And if phylink_of_phy_connect() fails, nothing needs to be done. Compile tested only. Fixes: 892e09153fa3 ("net: ag71xx: port to phylink") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221114095549.40342-1-liujian56@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: ethernet: mtk_eth_soc: enable hardware DSA untagging	Felix Fietkau
	- pass the tag to DSA via metadata dst - disabled on 7986 for now, since it's not working yet - disabled if a MAC is enabled that does not use DSA This improves performance by bypassing the DSA tag driver and avoiding extra skb data mangling Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: ethernet: mtk_eth_soc: add support for configuring vlan rx offload	Felix Fietkau
	Keep the vlan rx offload feature in sync across all netdevs belonging to the device, since the feature is global and can't be turned off per MAC Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: ethernet: mtk_eth_soc: pass correct VLAN protocol ID to the network stack	Felix Fietkau
	Use the id from the DMA descriptor instead of hardcoding 802.1q Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	net: dsa: add support for DSA rx offloading via metadata dst	Felix Fietkau
	If a metadata dst is present with the type METADATA_HW_PORT_MUX on a dsa cpu port netdev, assume that it carries the port number and that there is no DSA tag present in the skb data. Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15	Merge branch 'propagate nullness information for reg to reg comparisons'	Alexei Starovoitov
	Eduard Zingerman says: ==================== This patchset adds ability to propagates nullness information for branches of register to register equality compare instructions. The following rules are used: - suppose register A maybe null - suppose register B is not null - for JNE A, B, ... - A is not null in the false branch - for JEQ A, B, ... - A is not null in the true branch E.g. for program like below: r6 = skb->sk; r7 = sk_fullsock(r6); r0 = sk_fullsock(r6); if (r0 == 0) return 0; (a) if (r0 != r7) return 0; (b) r7->type; (c) return 0; It is safe to dereference r7 at point (c), because of (a) and (b). The utility of this change came up while working on BPF CLang backend issue [1]. Specifically, while debugging issue with selftest `test_sk_lookup.c`. This test has the following structure: int access_ctx_sk(struct bpf_sk_lookup ctx __CTX__) { struct bpf_sock sk1 = NULL, sk2 = NULL; ... sk1 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A); if (!sk1) // (a) goto out; ... if (ctx->sk != sk1) // (b) goto out; ... if (ctx->sk->family != AF_INET \|\| // (c) ctx->sk->type != SOCK_STREAM \|\| ctx->sk->state != BPF_TCP_LISTEN) goto out; ... } - at (a) `sk1` is checked to be not null; - at (b) `ctx->sk` is verified to be equal to `sk1`; - at (c) `ctx->sk` is accessed w/o nullness check. Currently Global Value Numbering pass considers expressions `sk1` and `ctx->sk` to be identical at point (c) and replaces `ctx->sk` with `sk1` (not expressions themselves but corresponding SSA values). Since `sk1` is known to be not null after (b) verifier allows execution of the program. However, such optimization is not guaranteed to happen. When it does not happen verifier reports an error. Changelog: v2 -> v3: - verifier tests are updated with correct error message for unprivileged mode (pointer comparisons are forbidden in unprivileged mode). v1 -> v2: - after investigation described in [2] as suggested by John, Daniel and Shung-Hsi, function `type_is_pointer` is removed, calls to this function are replaced by `__is_pointer_value(false, src_reg)`. RFC -> v1: - newly added if block in `check_cond_jmp_op` is moved down to keep `make_ptr_not_null_reg` actions together; - tests rewritten to have a single `r0 = 0; exit;` block. [1] https://reviews.llvm.org/D131633#3722231 [2] https://lore.kernel.org/bpf/bad8be826d088e0d180232628160bf932006de89.camel@gmail.com/ [RFC] https://lore.kernel.org/bpf/20220822094312.175448-1-eddyz87@gmail.com/ [v1] https://lore.kernel.org/bpf/20220826172915.1536914-1-eddyz87@gmail.com/ [v2] https://lore.kernel.org/bpf/20221106214921.117631-1-eddyz87@gmail.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>