summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-26tcp: support rstreason for passive resetJason Xing
Reuse the dropreason logic to show the exact reason of tcp reset, so we can finally display the corresponding item in enum sk_reset_reason instead of reinventing new reset reasons. This patch replaces all the prior NOT_SPECIFIED reasons. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26rstreason: prepare for active resetJason Xing
Like what we did to passive reset: only passing possible reset reason in each active reset path. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26rstreason: prepare for passive resetJason Xing
Adjust the parameter and support passing reason of reset which is for now NOT_SPECIFIED. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26net: introduce rstreason to detect why the RST is sentJason Xing
Add a new standalone file for the easy future extension to support both active reset and passive reset in the TCP/DCCP/MPTCP protocols. This patch only does the preparations for reset reason mechanism, nothing else changes. The reset reasons are divided into three parts: 1) reuse drop reasons for passive reset in TCP 2) our own independent reasons which aren't relying on other reasons at all 3) reuse MP_TCPRST option for MPTCP The benefits of a standalone reset reason are listed here: 1) it can cover more than one case, such as reset reasons in MPTCP, active reset reasons. 2) people can easily/fastly understand and maintain this mechanism. 3) we get unified format of output with prefix stripped. 4) more new reset reasons are on the way ... I will implement the basic codes of active/passive reset reason in those three protocols, which are not complete for this moment. For passive reset part in TCP, I only introduce the NO_SOCKET common case which could be set as an example. After this series applied, it will have the ability to open a new gate to let other people contribute more reasons into it :) Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26igc: Add Tx hardware timestamp request for AF_XDP zero-copy packetSong Yoong Siang
This patch adds support to per-packet Tx hardware timestamp request to AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that user needs to enable Tx HW timestamp capability via igc_ioctl() with SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request. Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0 (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have four sets of timestamping registers. *skb and *xsk_tx_buffer pointers are used to indicate whether the timestamping register is already occupied. Furthermore, a boolean variable named xsk_pending_ts is used to hold the transmit completion until the tx hardware timestamp is ready. This is because, for i225/i226, the timestamp notification event comes some time after the transmit completion event. The driver will retrigger hardware irq to clean the packet after retrieve the tx hardware timestamp. Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook to the metadata location of the transmit packet. When the Tx timestamp interrupt is fired, the interrupt handler will copy the value of Tx hwts into metadata location via xsk_tx_metadata_complete(). This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel ADL-S platform. Below are the test steps and results. Test Step 1: Run xdp_hw_metadata app ./xdp_hw_metadata <iface> > /dev/shm/result.log Test Step 2: Enable Tx hardware timestamp hwstamp_ctl -i <iface> -t 1 -r 1 Test Step 3: Run ptp4l and phc2sys for time synchronization Test Step 4: Generate UDP packets with 1ms interval for 10s trafgen --dev <iface> '{eth(da=<addr>), udp(dp=9091)}' -t 1ms -n 10000 Test Step 5: Rerun Step 1-3 with 10s iperf3 as background traffic Test Step 6: Rerun Step 1-4 with 10s iperf3 as background traffic Based on iperf3 results below, the impact of holding tx completion to throughput is not observable. Result of last UDP packet (no. 10000) in Step 4: poll: 1 (0) skip=99 fail=0 redir=10000 xsk_ring_cons__peek: 1 0x5640a37972d0: rx_desc[9999]->addr=f2110 addr=f2110 comp_addr=f2110 EoP rx_hash: 0x2049BE1D with RSS type:0x1 HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to User RX-time sec:0.0000 (14.990 usec) XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User RX-time sec:0.0000 (4.271 usec) No rx_vlan_tci or rx_vlan_proto, err=-95 0x5640a37972d0: ping-pong with csum=ab19 (want 315b) csum_start=34 csum_offset=6 0x5640a37972d0: complete tx idx=9999 addr=f010 HW TX-complete-time: 1679819246793036971 (sec:1679819246.7930) delta to User TX-complete-time sec:0.0001 (77.656 usec) XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User TX-complete-time sec:0.0001 (132.640 usec) HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to HW TX-complete-time sec:0.0001 (65.703 usec) 0x5640a37972d0: complete rx idx=10127 addr=f2110 Result of iperf3 without tx hwts request in step 5: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender [ 5] 0.00-10.05 sec 2.74 GBytes 2.34 Gbits/sec receiver Result of iperf3 running parallel with trafgen command in step 6: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender [ 5] 0.00-10.04 sec 2.74 GBytes 2.34 Gbits/sec receiver Co-developed-by: Lai Peter Jun Ann <jun.ann.lai@intel.com> Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240424210256.3440903-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26thermal/debugfs: Avoid printing zero duration for mitigation events in progressRafael J. Wysocki
If a thermal mitigation event is in progress, its duration value has not been updated yet, so 0 will be printed as the event duration by tze_seq_show() which is confusing. Avoid doing that by marking the beginning of the event with the KTIME_MIN duration value and making tze_seq_show() compute the current event duration on the fly, in which case '>' will be printed instead of '=' in the event duration value field. Similarly, for trip points that have been crossed on the down, mark the end of mitigation with the KTIME_MAX timestamp value and make tze_seq_show() compute the current duration on the fly for the trip points still involved in the mitigation, in which cases the duration value printed by it will be prepended with a '>' character. Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()Rafael J. Wysocki
If cdev_dt_seq_show() runs before the first state transition of a cooling device, it will not print any state residency information for it, even though it might be reasonably expected to print residency information for the initial state of the cooling device. For this reason, rearrange the code to get the initial state of a cooling device at the registration time and pass it to thermal_debug_cdev_add(), so that the latter can create a duration record for that state which will allow cdev_dt_seq_show() to print its residency information. Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information") Reported-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26thermal/debugfs: Create records for cdev states as they get usedRafael J. Wysocki
Because thermal_debug_cdev_state_update() only creates a duration record for the old state of a cooling device, if its new state is used for the first time, there will be no record for it and cdev_dt_seq_show() will not print the duration information for it even though it contains code to compute the duration value in that case. Address this by making thermal_debug_cdev_state_update() create a duration record for the new state if there is none. Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information") Reported-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26Merge back earlier thermal core changes for v6.10.Rafael J. Wysocki
2024-04-26thermal/debugfs: Prevent use-after-free from occurring after cdev removalRafael J. Wysocki
Since thermal_debug_cdev_remove() does not run under cdev->lock, it can run in parallel with thermal_debug_cdev_state_update() and it may free the struct thermal_debugfs object used by the latter after it has been checked against NULL. If that happens, thermal_debug_cdev_state_update() will access memory that has been freed already causing the kernel to crash. Address this by using cdev->lock in thermal_debug_cdev_remove() around the cdev->debugfs value check (in case the same cdev is removed at the same time in two different threads) and its reset to NULL. Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information") Cc :6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26netfs: Fix the pre-flush when appending to a file in writethrough modeDavid Howells
In netfs_perform_write(), when the file is marked NETFS_ICTX_WRITETHROUGH or O_*SYNC or RWF_*SYNC was specified, write-through caching is performed on a buffered file. When setting up for write-through, we flush any conflicting writes in the region and wait for the write to complete, failing if there's a write error to return. The issue arises if we're writing at or above the EOF position because we skip the flush and - more importantly - the wait. This becomes a problem if there's a partial folio at the end of the file that is being written out and we want to make a write to it too. Both the already-running write and the write we start both want to clear the writeback mark, but whoever is second causes a warning looking something like: ------------[ cut here ]------------ R=00000012: folio 11 is not under writeback WARNING: CPU: 34 PID: 654 at fs/netfs/write_collect.c:105 ... CPU: 34 PID: 654 Comm: kworker/u386:27 Tainted: G S ... ... Workqueue: events_unbound netfs_write_collection_worker ... RIP: 0010:netfs_writeback_lookup_folio Fix this by making the flush-and-wait unconditional. It will do nothing if there are no folios in the pagecache and will return quickly if there are no folios in the region specified. Further, move the WBC attachment above the flush call as the flush is going to attach a WBC and detach it again if it is not present - and since we need one anyway we might as well share it. Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404161031.468b84f-oliver.sang@intel.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2150448.1714130115@warthog.procyon.org.uk Reviewed-by: Jeffrey Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-26dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM4450 compatiblesTengfei Fan
Add compatible for EPSS CPUFREQ-HW on SM4450. Signed-off-by: Tengfei Fan <quic_tengfan@quicinc.com> Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2024-04-26net l2tp: drop flow hash on forwardDavid Bauer
Drop the flow-hash of the skb when forwarding to the L2TP netdev. This avoids the L2TP qdisc from using the flow-hash from the outer packet, which is identical for every flow within the tunnel. This does not affect every platform but is specific for the ethernet driver. It depends on the platform including L4 information in the flow-hash. One such example is the Mediatek Filogic MT798x family of networking processors. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240424171110.13701-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26Merge branch 'selftests-virtio_net-introduce-initial-testing-infrastructure'Paolo Abeni
Jiri Pirko says: ==================== selftests: virtio_net: introduce initial testing infrastructure This patchset aims at introducing very basic initial infrastructure for virtio_net testing, namely it focuses on virtio feature testing. The first patch adds support for debugfs for virtio devices, allowing user to filter features to pretend to be driver that is not capable of the filtered feature. Example: $ cat /sys/bus/virtio/devices/virtio0/features 1110010111111111111101010000110010000000100000000000000000000000 $ echo "5" >/sys/kernel/debug/virtio/virtio0/filter_feature_add $ cat /sys/kernel/debug/virtio/virtio0/filter_features 5 $ echo "virtio0" > /sys/bus/virtio/drivers/virtio_net/unbind $ echo "virtio0" > /sys/bus/virtio/drivers/virtio_net/bind $ cat /sys/bus/virtio/devices/virtio0/features 1110000111111111111101010000110010000000100000000000000000000000 Leverage that in the last patch that lays ground for virtio_net selftests testing, including very basic F_MAC feature test. To run this, do: $ make -C tools/testing/selftests/ TARGETS=drivers/net/virtio_net/ run_tests It is assumed, as with lot of other selftests in the net group, that there are netdevices connected back-to-back. In this case, two virtio_net devices connected back to back. If you use "tap" qemu netdevice type, to configure this loop on a hypervisor, one may use this script: DEV1="$1" DEV2="$2" sudo tc qdisc add dev $DEV1 clsact sudo tc qdisc add dev $DEV2 clsact sudo tc filter add dev $DEV1 ingress protocol all pref 1 matchall action mirred egress redirect dev $DEV2 sudo tc filter add dev $DEV2 ingress protocol all pref 1 matchall action mirred egress redirect dev $DEV1 sudo ip link set $DEV1 up sudo ip link set $DEV2 up Another possibility is to use virtme-ng like this: $ vng --network=loop or directly: $ vng --network=loop -- make -C tools/testing/selftests/ TARGETS=drivers/net/virtio_net/ run_tests "loop" network type will take care of creating two "hubport" qemu netdevs putting them into a single hub. To do it manually with qemu, pass following command line options: -nic hubport,hubid=1,id=nd0,model=virtio-net-pci -nic hubport,hubid=1,id=nd1,model=virtio-net-pci ==================== Link: https://lore.kernel.org/r/20240424104049.3935572-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26selftests: virtio_net: add initial testsJiri Pirko
Introduce initial tests for virtio_net driver. Focus on feature testing leveraging previously introduced debugfs feature filtering infrastructure. Add very basic ping and F_MAC feature tests. To run this, do: $ make -C tools/testing/selftests/ TARGETS=drivers/net/virtio_net/ run_tests Run it on a system with 2 virtio_net devices connected back-to-back on the hypervisor. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Tested-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26selftests: forwarding: add wait_for_dev() helperJiri Pirko
The existing setup_wait*() helper family check the status of the interface to be up. Introduce wait_for_dev() to wait for the netdevice to appear, for example after test script does manual device bind. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26selftests: forwarding: add check_driver() helperJiri Pirko
Add a helper to be used to check if the netdevice is backed by specified driver. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26selftests: forwarding: add ability to assemble NETIFS array by driver nameJiri Pirko
Allow driver tests to work without specifying the netdevice names. Introduce a possibility to search for available netdevices according to set driver name. Allow test to specify the name by setting NETIF_FIND_DRIVER variable. Note that user overrides this either by passing netdevice names on the command line or by declaring NETIFS array in custom forwarding.config configuration file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26virtio: add debugfs infrastructure to allow to debug virtio featuresJiri Pirko
Currently there is no way for user to set what features the driver should obey or not, it is hard wired in the code. In order to be able to debug the device behavior in case some feature is disabled, introduce a debugfs infrastructure with couple of files allowing user to see what features the device advertises and to set filter for features used by driver. Example: $cat /sys/bus/virtio/devices/virtio0/features 1110010111111111111101010000110010000000100000000000000000000000 $ echo "5" >/sys/kernel/debug/virtio/virtio0/filter_feature_add $ cat /sys/kernel/debug/virtio/virtio0/filter_features 5 $ echo "virtio0" > /sys/bus/virtio/drivers/virtio_net/unbind $ echo "virtio0" > /sys/bus/virtio/drivers/virtio_net/bind $ cat /sys/bus/virtio/devices/virtio0/features 1110000111111111111101010000110010000000100000000000000000000000 Note that sysfs "features" now already exists, this patch does not touch it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26wifi: brcmfmac: remove unused brcmf_usb_image structChristophe JAILLET
struct brcmf_usb_image was added in the initial commit 71bb244ba2fd5 ("brcm80211: fmac: add USB support for bcm43235/6/8 chipsets") and updated in commit 803599d40418 ("brcmfmac: store usb fw images in local linked list.") Its only usage was removed in commit 52f98a57d8c1 ("brcmfmac: remove firmware list from USB driver"). Remove the structure definition now. This saves a few lines of code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/23afd8c1733ad087ce2399a07a30d689aef861d5.1714039373.git.christophe.jaillet@wanadoo.fr
2024-04-26wifi: brcmsmac: ampdu: remove unused cb_del_ampdu_pars structChristophe JAILLET
struct cb_del_ampdu_pars was added in the initial commit 5b435de0d7868 ("net: wireless: add brcm80211 drivers") and its only usage was removed in commit e041f65d5f00 ("brcmsmac: Remove internal tx queue"). Remove the structure definition now. This saves a few lines of code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/fa3b190b6e9cba65ecc36fc93121c6ed8704f704.1714036681.git.christophe.jaillet@wanadoo.fr
2024-04-26nsh: Restore skb->{protocol,data,mac_header} for outer header in ↵Kuniyuki Iwashima
nsh_gso_segment(). syzbot triggered various splats (see [0] and links) by a crafted GSO packet of VIRTIO_NET_HDR_GSO_UDP layering the following protocols: ETH_P_8021AD + ETH_P_NSH + ETH_P_IPV6 + IPPROTO_UDP NSH can encapsulate IPv4, IPv6, Ethernet, NSH, and MPLS. As the inner protocol can be Ethernet, NSH GSO handler, nsh_gso_segment(), calls skb_mac_gso_segment() to invoke inner protocol GSO handlers. nsh_gso_segment() does the following for the original skb before calling skb_mac_gso_segment() 1. reset skb->network_header 2. save the original skb->{mac_heaeder,mac_len} in a local variable 3. pull the NSH header 4. resets skb->mac_header 5. set up skb->mac_len and skb->protocol for the inner protocol. and does the following for the segmented skb 6. set ntohs(ETH_P_NSH) to skb->protocol 7. push the NSH header 8. restore skb->mac_header 9. set skb->mac_header + mac_len to skb->network_header 10. restore skb->mac_len There are two problems in 6-7 and 8-9. (a) After 6 & 7, skb->data points to the NSH header, so the outer header (ETH_P_8021AD in this case) is stripped when skb is sent out of netdev. Also, if NSH is encapsulated by NSH + Ethernet (so NSH-Ethernet-NSH), skb_pull() in the first nsh_gso_segment() will make skb->data point to the middle of the outer NSH or Ethernet header because the Ethernet header is not pulled by the second nsh_gso_segment(). (b) While restoring skb->{mac_header,network_header} in 8 & 9, nsh_gso_segment() does not assume that the data in the linear buffer is shifted. However, udp6_ufo_fragment() could shift the data and change skb->mac_header accordingly as demonstrated by syzbot. If this happens, even the restored skb->mac_header points to the middle of the outer header. It seems nsh_gso_segment() has never worked with outer headers so far. At the end of nsh_gso_segment(), the outer header must be restored for the segmented skb, instead of the NSH header. To do that, let's calculate the outer header position relatively from the inner header and set skb->{data,mac_header,protocol} properly. [0]: BUG: KMSAN: uninit-value in ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline] BUG: KMSAN: uninit-value in ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] BUG: KMSAN: uninit-value in ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668 ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline] ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668 ipvlan_start_xmit+0x5c/0x1a0 drivers/net/ipvlan/ipvlan_main.c:222 __netdev_start_xmit include/linux/netdevice.h:4989 [inline] netdev_start_xmit include/linux/netdevice.h:5003 [inline] xmit_one net/core/dev.c:3547 [inline] dev_hard_start_xmit+0x244/0xa10 net/core/dev.c:3563 __dev_queue_xmit+0x33ed/0x51c0 net/core/dev.c:4351 dev_queue_xmit include/linux/netdevice.h:3171 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x735/0xa10 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: slab_post_alloc_hook mm/slub.c:3819 [inline] slab_alloc_node mm/slub.c:3860 [inline] __do_kmalloc_node mm/slub.c:3980 [inline] __kmalloc_node_track_caller+0x705/0x1000 mm/slub.c:4001 kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582 __alloc_skb+0x352/0x790 net/core/skbuff.c:651 skb_segment+0x20aa/0x7080 net/core/skbuff.c:4647 udp6_ufo_fragment+0xcab/0x1150 net/ipv6/udp_offload.c:109 ipv6_gso_segment+0x14be/0x2ca0 net/ipv6/ip6_offload.c:152 skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53 nsh_gso_segment+0x6f4/0xf70 net/nsh/nsh.c:108 skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53 __skb_gso_segment+0x4b0/0x730 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x107f/0x1930 net/core/dev.c:3628 __dev_queue_xmit+0x1f28/0x51c0 net/core/dev.c:4343 dev_queue_xmit include/linux/netdevice.h:3171 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x735/0xa10 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b CPU: 1 PID: 5101 Comm: syz-executor421 Not tainted 6.8.0-rc5-syzkaller-00297-gf2e367d6ad3b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Fixes: c411ed854584 ("nsh: add GSO support") Reported-and-tested-by: syzbot+42a0dc856239de4de60e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=42a0dc856239de4de60e Reported-and-tested-by: syzbot+c298c9f0e46a3c86332b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c298c9f0e46a3c86332b Link: https://lore.kernel.org/netdev/20240415222041.18537-1-kuniyu@amazon.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240424023549.21862-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26iommu/amd: Enhance def_domain_type to handle untrusted deviceVasant Hegde
Previously, IOMMU core layer was forcing IOMMU_DOMAIN_DMA domain for untrusted device. This always took precedence over driver's def_domain_type(). Commit 59ddce4418da ("iommu: Reorganize iommu_get_default_domain_type() to respect def_domain_type()") changed the behaviour. Current code calls def_domain_type() but if it doesn't return IOMMU_DOMAIN_DMA for untrusted device it throws error. This results in IOMMU group (and potentially IOMMU itself) in undetermined state. This patch adds untrusted check in AMD IOMMU driver code. So that it allows eGPUs behind Thunderbolt work again. Fine tuning amd_iommu_def_domain_type() will be done later. Reported-by: Eric Wagner <ewagner12@gmail.com> Link: https://lore.kernel.org/linux-iommu/CAHudX3zLH6CsRmLE-yb+gRjhh-v4bU5_1jW_xCcxOo_oUUZKYg@mail.gmail.com Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3182 Fixes: 59ddce4418da ("iommu: Reorganize iommu_get_default_domain_type() to respect def_domain_type()") Cc: Robin Murphy <robin.murphy@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: stable@kernel.org # v6.7+ Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20240423111725.5813-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26Merge branch 'net-hsr-add-support-for-hsr-san-redbox'Paolo Abeni
Lukasz Majewski says: ==================== net: hsr: Add support for HSR-SAN (RedBOX) This patch set provides v6 of HSR-SAN (RedBOX) as well as hsr_redbox.sh test script. The most straightforward way to test those patches is to use buildroot (2024.02.01) to create rootfs and QEMU based environment to run x86_64 Linux. Then one shall run hsr_redbox.sh and hsr_ping.sh from tools/testing/selftests/net/hsr. ==================== Link: https://lore.kernel.org/r/20240423124908.2073400-1-lukma@denx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26test: hsr: Add test for HSR RedBOX (HSR-SAN) mode of operationLukasz Majewski
This patch adds hsr_redbox.sh script to test if HSR-SAN mode of operation works correctly. Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26test: hsr: Extract version agnostic information from ping command outputLukasz Majewski
Current code checks if ping command output match hardcoded pattern: "10 packets transmitted, 10 received, 0% packet loss,". Such approach will work only from one ping program version (for which this test has been originally written). This patch address problem when ping with different summary output like "10 packets transmitted, 10 packets received, 0% packet" is used to run this test - for example one from busybox (as the test system runs in QEMU with rootfs created with buildroot). The fix is to modify output of ping command to be agnostic to ping version used on the platform. Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26test: hsr: Move common code to hsr_common.sh fileLukasz Majewski
Some of the code already present in the hsr_ping.sh test program can be moved to a separate script file, so it can be reused by other HSR functionality (like HSR-SAN) tests. Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26test: hsr: Remove script code already implemented in lib.shLukasz Majewski
Some parts (like netns creation and cleanup) of hsr_ping.sh script are already implemented in ../lib.sh common script, so can be replaced by it. Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26net: hsr: Provide RedBox support (HSR-SAN)Lukasz Majewski
Introduce RedBox support (HSR-SAN to be more precise) for HSR networks. Following traffic reduction optimizations have been implemented: - Do not send HSR supervisory frames to Port C (interlink) - Do not forward to HSR ring frames addressed to Port C - Do not forward to Port C frames from HSR ring - Do not send duplicate HSR frame to HSR ring when destination is Port C The corresponding patch to modify iptable2 sources has already been sent: https://lore.kernel.org/netdev/20240308145729.490863-1-lukma@denx.de/T/ Testing procedure (veth and netns): ----------------------------------- One shall run: linux-vanila/tools/testing/selftests/net/hsr/hsr_redbox.sh (Detailed description of the setup one can find in the test script file). Testing procedure (real hardware): ---------------------------------- The EVB-KSZ9477 has been used for testing on net-next branch (SHA1: 5fc68320c1fb3c7d456ddcae0b4757326a043e6f). Ports 4/5 were used for SW managed HSR (hsr1) as first hsr0 for ports 1/2 (with HW offloading for ksz9477) was created. Port 3 has been used as interlink port (single USB-ETH dongle). Configuration - RedBox (EVB-KSZ9477): if link set lan1 down;ip link set lan2 down ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1 ip link add name hsr1 type hsr slave1 lan4 slave2 lan5 interlink lan3 supervision 45 version 1 ip link set lan4 up;ip link set lan5 up ip link set lan3 up ip addr add 192.168.0.11/24 dev hsr1 ip link set hsr1 up Configuration - DAN-H (EVB-KSZ9477): ip link set lan1 down;ip link set lan2 down ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 supervision 45 version 1 ip link add name hsr1 type hsr slave1 lan4 slave2 lan5 supervision 45 version 1 ip link set lan4 up;ip link set lan5 up ip addr add 192.168.0.12/24 dev hsr1 ip link set hsr1 up This approach uses only SW based HSR devices (hsr1). -------------- ----------------- ------------ DAN-H Port5 | <------> | Port5 | | Port4 | <------> | Port4 Port3 | <---> | PC | | (RedBox) | | (USB-ETH) EVB-KSZ9477 | | EVB-KSZ9477 | | -------------- ----------------- ------------ Signed-off-by: Lukasz Majewski <lukma@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26crypto: x86/aes-gcm - simplify GCM hash subkey derivationEric Biggers
Remove a redundant expansion of the AES key, and use rodata for zeroes. Also rename rfc4106_set_hash_subkey() to aes_gcm_derive_hash_subkey() because it's used for both versions of AES-GCM, not just RFC4106. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: x86/aes-gcm - delete unused GCM assembly codeEric Biggers
Delete aesni_gcm_enc() and aesni_gcm_dec() because they are unused. Only the incremental AES-GCM functions (aesni_gcm_init(), aesni_gcm_enc_update(), aesni_gcm_finalize()) are actually used. This saves 17 KB of object code. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()Eric Biggers
Since the total length processed by the loop in xts_crypt_slowpath() is a multiple of AES_BLOCK_SIZE, just round the length down to AES_BLOCK_SIZE even on the last step. This doesn't change behavior, as the last step will process a multiple of AES_BLOCK_SIZE regardless. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26hwrng: stm32 - repair clock handlingMarek Vasut
The clock management in this driver does not seem to be correct. The struct hwrng .init callback enables the clock, but there is no matching .cleanup callback to disable the clock. The clock get disabled as some later point by runtime PM suspend callback. Furthermore, both runtime PM and sleep suspend callbacks access registers first and disable clock which are used for register access second. If the IP is already in RPM suspend and the system enters sleep state, the sleep callback will attempt to access registers while the register clock are already disabled. This bug has been fixed once before already in commit 9bae54942b13 ("hwrng: stm32 - fix pm_suspend issue"), and regressed in commit ff4e46104f2e ("hwrng: stm32 - rework power management sequences") . Fix this slightly differently, disable register clock at the end of .init callback, this way the IP is disabled after .init. On every access to the IP, which really is only stm32_rng_read(), do pm_runtime_get_sync() which is already done in stm32_rng_read() to bring the IP from RPM suspend, and pm_runtime_mark_last_busy()/pm_runtime_put_sync_autosuspend() to put it back into RPM suspend. Change sleep suspend/resume callbacks to enable and disable register clock around register access, as those cannot use the RPM suspend/resume callbacks due to slightly different initialization in those sleep callbacks. This way, the register access should always be performed with clock surely enabled. Fixes: ff4e46104f2e ("hwrng: stm32 - rework power management sequences") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26hwrng: stm32 - put IP into RPM suspend on failureMarek Vasut
In case of an irrecoverable failure, put the IP into RPM suspend to avoid RPM imbalance. I did not trigger this case, but it seems it should be done based on reading the code. Fixes: b17bc6eb7c2b ("hwrng: stm32 - rework error handling in stm32_rng_read()") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26hwrng: stm32 - use logical OR in conditionalMarek Vasut
The conditional is used to check whether err is non-zero OR whether reg variable is non-zero after clearing bits from it. This should be done using logical OR, not bitwise OR, fix it. Fixes: 6b85a7e141cb ("hwrng: stm32 - implement STM32MP13x support") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: ecdh - Initialize ctx->private_key in proper byte orderStefan Berger
The private key in ctx->private_key is currently initialized in reverse byte order in ecdh_set_secret and whenever the key is needed in proper byte order the variable priv is introduced and the bytes from ctx->private_key are copied into priv while being byte-swapped (ecc_swap_digits). To get rid of the unnecessary byte swapping initialize ctx->private_key in proper byte order and clean up all functions that were previously using priv or were called with ctx->private_key: - ecc_gen_privkey: Directly initialize the passed ctx->private_key with random bytes filling all the digits of the private key. Get rid of the priv variable. This function only has ecdh_set_secret as a caller to create NIST P192/256/384 private keys. - crypto_ecdh_shared_secret: Called only from ecdh_compute_value with ctx->private_key. Get rid of the priv variable and work with the passed private_key directly. - ecc_make_pub_key: Called only from ecdh_compute_value with ctx->private_key. Get rid of the priv variable and work with the passed private_key directly. Cc: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: ecdh - Pass private key in proper byte order to check valid keyStefan Berger
ecc_is_key_valid expects a key with the most significant digit in the last entry of the digit array. Currently ecdh_set_secret passes a reversed key to ecc_is_key_valid that then passes the rather simple test checking whether the private key is in range [2, n-3]. For all current ecdh- supported curves (NIST P192/256/384) the 'n' parameter is a rather large number, therefore easily passing this test. Throughout the ecdh and ecc codebase the variable 'priv' is used for a private_key holding the bytes in proper byte order. Therefore, introduce priv in ecdh_set_secret and copy the bytes from ctx->private_key into priv in proper byte order by using ecc_swap_digits. Pass priv to ecc_is_valid_key. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: tegra - Fix some error codesDan Carpenter
Return negative -ENOMEM, instead of positive ENOMEM. Fixes: 0880bb3b00c8 ("crypto: tegra - Add Tegra Security Engine driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Akhil R <akhilrajeev@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26dt-bindings: crypto: starfive: Restore sort orderGeert Uytterhoeven
Restore alphabetical sort order of the list of supported compatible values. Fixes: 2ccf7a5d9c50f3ea ("dt-bindings: crypto: starfive: Add jh8100 support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: qat - validate slices count returned by FWLucas Segarra Fernandez
The function adf_send_admin_tl_start() enables the telemetry (TL) feature on a QAT device by sending the ICP_QAT_FW_TL_START message to the firmware. This triggers the FW to start writing TL data to a DMA buffer in memory and returns an array containing the number of accelerators of each type (slices) supported by this HW. The pointer to this array is stored in the adf_tl_hw_data data structure called slice_cnt. The array slice_cnt is then used in the function tl_print_dev_data() to report in debugfs only statistics about the supported accelerators. An incorrect value of the elements in slice_cnt might lead to an out of bounds memory read. At the moment, there isn't an implementation of FW that returns a wrong value, but for robustness validate the slice count array returned by FW. Fixes: 69e7649f7cc2 ("crypto: qat - add support for device telemetry") Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: aead,cipher - zeroize key buffer after useHailey Mothershead
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding cryptographic information should be zeroized once they are no longer needed. Accomplish this by using kfree_sensitive for buffers that previously held the private key. Signed-off-by: Hailey Mothershead <hailmo@amazon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: arm64/aes-ce - Simplify round key load sequenceArd Biesheuvel
Tweak the round key logic so that they can be loaded using a single branchless sequence using overlapping loads. This is shorter and simpler, and puts the conditional branches based on the key size further apart, which might benefit microarchitectures that cannot record taken branches at every instruction. For these branches, use test-bit-branch instructions that don't clobber the condition flags. Note that none of this has any impact on performance, positive or otherwise (and the branch prediction benefit would only benefit AES-192 which nobody uses). It does make for nicer code, though. While at it, use \@ to generate the labels inside the macros, which is more robust than using fixed numbers, which could clash inadvertently. Also, bring aes-neon.S in line with these changes, including the switch to test-and-branch instructions, to avoid surprises in the future when we might start relying on the condition flags being preserved in the chaining mode wrappers in aes-modes.S Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26crypto: tegra - Convert to platform remove callback returning voidUwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Fixes: 0880bb3b00c8 ("crypto: tegra - Add Tegra Security Engine driver") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Akhil R <akhilrajeev@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26thermal/debugfs: Fix two locking issues with thermal zone debugRafael J. Wysocki
With the current thermal zone locking arrangement in the debugfs code, user space can open the "mitigations" file for a thermal zone before the zone's debugfs pointer is set which will result in a NULL pointer dereference in tze_seq_start(). Moreover, thermal_debug_tz_remove() is not called under the thermal zone lock, so it can run in parallel with the other functions accessing the thermal zone's struct thermal_debugfs object. Then, it may clear tz->debugfs after one of those functions has checked it and the struct thermal_debugfs object may be freed prematurely. To address the first problem, pass a pointer to the thermal zone's struct thermal_debugfs object to debugfs_create_file() in thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(), tze_seq_stop(), and tze_seq_show() retrieve it from s->private instead of a pointer to the thermal zone object. This will ensure that tz_debugfs will be valid across the "mitigations" file accesses until thermal_debugfs_remove_id() called by thermal_debug_tz_remove() removes that file. To address the second problem, use tz->lock in thermal_debug_tz_remove() around the tz->debugfs value check (in case the same thermal zone is removed at the same time in two different threads) and its reset to NULL. Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes") Cc :6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26thermal/debugfs: Free all thermal zone debug memory on zone removalRafael J. Wysocki
Because thermal_debug_tz_remove() does not free all memory allocated for thermal zone diagnostics, some of that memory becomes unreachable after freeing the thermal zone's struct thermal_debugfs object. Address this by making thermal_debug_tz_remove() free all of the memory in question. Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes") Cc :6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26net/sched: fix false lockdep warning on qdisc root lockDavide Caratti
Xiumei and Christoph reported the following lockdep splat, complaining of the qdisc root lock being taken twice: ============================================ WARNING: possible recursive locking detected 6.7.0-rc3+ #598 Not tainted -------------------------------------------- swapper/2/0 is trying to acquire lock: ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 but task is already holding lock: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sch->q.lock); lock(&sch->q.lock); *** DEADLOCK *** May be due to missing lock nesting notation 5 locks held by swapper/2/0: #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510 #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0 #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70 #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70 stack backtrace: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x4a/0x80 __lock_acquire+0xfdd/0x3150 lock_acquire+0x1ca/0x540 _raw_spin_lock+0x34/0x80 __dev_queue_xmit+0x1560/0x2e70 tcf_mirred_act+0x82e/0x1260 [act_mirred] tcf_action_exec+0x161/0x480 tcf_classify+0x689/0x1170 prio_enqueue+0x316/0x660 [sch_prio] dev_qdisc_enqueue+0x46/0x220 __dev_queue_xmit+0x1615/0x2e70 ip_finish_output2+0x1218/0x1ed0 __ip_finish_output+0x8b3/0x1350 ip_output+0x163/0x4e0 igmp_ifc_timer_expire+0x44b/0x930 call_timer_fn+0x1a2/0x510 run_timer_softirq+0x54d/0x11a0 __do_softirq+0x1b3/0x88f irq_exit_rcu+0x18f/0x1e0 sysvec_apic_timer_interrupt+0x6f/0x90 </IRQ> This happens when TC does a mirred egress redirect from the root qdisc of device A to the root qdisc of device B. As long as these two locks aren't protecting the same qdisc, they can be acquired in chain: add a per-qdisc lockdep key to silence false warnings. This dynamic key should safely replace the static key we have in sch_htb: it was added to allow enqueueing to the device "direct qdisc" while still holding the qdisc root lock. v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet) CC: Maxim Mikityanskiy <maxim@isovalent.com> CC: Xiumei Mu <xmu@redhat.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/451 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26fs: Create anon_inode_getfile_fmode()Dawid Osuchowski
Creates an anon_inode_getfile_fmode() function that works similarly to anon_inode_getfile() with the addition of being able to set the fmode member. Signed-off-by: Dawid Osuchowski <linux@osuchow.ski> Link: https://lore.kernel.org/r/20240426075854.4723-1-linux@osuchow.ski Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-26gpio: brcmstb: add support for gpio-rangesDoug Berger
A pin controller device mapped with the gpio-ranges property will need implementations of the .request and .free members of the gpiochip. Signed-off-by: Doug Berger <opendmb@gmail.com> Tested-by: Phil Elwell <phil@raspberrypi.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240424185039.1707812-4-opendmb@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-04-26gpio: of: support gpio-ranges for multiple gpiochip devicesDoug Berger
Some drivers (e.g. gpio-mt7621 and gpio-brcmstb) have multiple gpiochip banks within a single device. Unfortunately, the gpio-ranges property of the device node was being applied to every gpiochip of the device with device relative GPIO offset values rather than gpiochip relative GPIO offset values. This commit makes use of the gpio_chip offset value which can be non-zero for such devices to split the device node gpio-ranges property into GPIO offset ranges that can be applied to each of the relevant gpiochips of the device. Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240424185039.1707812-3-opendmb@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-04-26dt-bindings: gpio: brcmstb: add gpio-rangesDoug Berger
Add optional gpio-ranges device-tree property to the Broadcom Set-Top-Box GPIO controller. Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240424185039.1707812-2-opendmb@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>