summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-04selftests: add regression test for br_netfilter panicAndy Roulin
Add a new netfilter selftests to test against br_netfilter panics when VxLAN single-device is used together with untagged traffic and high MTU. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241001154400.22787-3-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04netfilter: br_netfilter: fix panic with metadata_dst skbAndy Roulin
Fix a kernel panic in the br_netfilter module when sending untagged traffic via a VxLAN device. This happens during the check for fragmentation in br_nf_dev_queue_xmit. It is dependent on: 1) the br_netfilter module being loaded; 2) net.bridge.bridge-nf-call-iptables set to 1; 3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port; 4) untagged frames with size higher than the VxLAN MTU forwarded/flooded When forwarding the untagged packet to the VxLAN bridge port, before the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL. Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check for frames that needs to be fragmented: frames with higher MTU than the VxLAN device end up calling br_nf_ip_fragment, which in turns call ip_skb_dst_mtu. The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst with valid dst->dev, thus the crash. This case was never supported in the first place, so drop the packet instead. PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data. [ 176.291791] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000110 [ 176.292101] Mem abort info: [ 176.292184] ESR = 0x0000000096000004 [ 176.292322] EC = 0x25: DABT (current EL), IL = 32 bits [ 176.292530] SET = 0, FnV = 0 [ 176.292709] EA = 0, S1PTW = 0 [ 176.292862] FSC = 0x04: level 0 translation fault [ 176.293013] Data abort info: [ 176.293104] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 176.293488] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 176.293787] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000 [ 176.294166] [0000000000000110] pgd=0000000000000000, p4d=0000000000000000 [ 176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth br_netfilter bridge stp llc ipv6 crct10dif_ce [ 176.295923] CPU: 0 PID: 188 Comm: ping Not tainted 6.8.0-rc3-g5b3fbd61b9d1 #2 [ 176.296314] Hardware name: linux,dummy-virt (DT) [ 176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter] [ 176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter] [ 176.297636] sp : ffff800080003630 [ 176.297743] x29: ffff800080003630 x28: 0000000000000008 x27: ffff6828c49ad9f8 [ 176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24: 00000000000003e8 [ 176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21: ffff6828c3b16d28 [ 176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18: 0000000000000014 [ 176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15: 0000000095744632 [ 176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12: ffffb7e137926a70 [ 176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 : 0000000000000000 [ 176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 : f20e0100bebafeca [ 176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 : 0000000000000000 [ 176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 : ffff6828c7f918f0 [ 176.300889] Call trace: [ 176.301123] br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter] [ 176.301411] br_nf_post_routing+0x2a8/0x3e4 [br_netfilter] [ 176.301703] nf_hook_slow+0x48/0x124 [ 176.302060] br_forward_finish+0xc8/0xe8 [bridge] [ 176.302371] br_nf_hook_thresh+0x124/0x134 [br_netfilter] [ 176.302605] br_nf_forward_finish+0x118/0x22c [br_netfilter] [ 176.302824] br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter] [ 176.303136] br_nf_forward+0x2b8/0x4e0 [br_netfilter] [ 176.303359] nf_hook_slow+0x48/0x124 [ 176.303803] __br_forward+0xc4/0x194 [bridge] [ 176.304013] br_flood+0xd4/0x168 [bridge] [ 176.304300] br_handle_frame_finish+0x1d4/0x5c4 [bridge] [ 176.304536] br_nf_hook_thresh+0x124/0x134 [br_netfilter] [ 176.304978] br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter] [ 176.305188] br_nf_pre_routing+0x250/0x524 [br_netfilter] [ 176.305428] br_handle_frame+0x244/0x3cc [bridge] [ 176.305695] __netif_receive_skb_core.constprop.0+0x33c/0xecc [ 176.306080] __netif_receive_skb_one_core+0x40/0x8c [ 176.306197] __netif_receive_skb+0x18/0x64 [ 176.306369] process_backlog+0x80/0x124 [ 176.306540] __napi_poll+0x38/0x17c [ 176.306636] net_rx_action+0x124/0x26c [ 176.306758] __do_softirq+0x100/0x26c [ 176.307051] ____do_softirq+0x10/0x1c [ 176.307162] call_on_irq_stack+0x24/0x4c [ 176.307289] do_softirq_own_stack+0x1c/0x2c [ 176.307396] do_softirq+0x54/0x6c [ 176.307485] __local_bh_enable_ip+0x8c/0x98 [ 176.307637] __dev_queue_xmit+0x22c/0xd28 [ 176.307775] neigh_resolve_output+0xf4/0x1a0 [ 176.308018] ip_finish_output2+0x1c8/0x628 [ 176.308137] ip_do_fragment+0x5b4/0x658 [ 176.308279] ip_fragment.constprop.0+0x48/0xec [ 176.308420] __ip_finish_output+0xa4/0x254 [ 176.308593] ip_finish_output+0x34/0x130 [ 176.308814] ip_output+0x6c/0x108 [ 176.308929] ip_send_skb+0x50/0xf0 [ 176.309095] ip_push_pending_frames+0x30/0x54 [ 176.309254] raw_sendmsg+0x758/0xaec [ 176.309568] inet_sendmsg+0x44/0x70 [ 176.309667] __sys_sendto+0x110/0x178 [ 176.309758] __arm64_sys_sendto+0x28/0x38 [ 176.309918] invoke_syscall+0x48/0x110 [ 176.310211] el0_svc_common.constprop.0+0x40/0xe0 [ 176.310353] do_el0_svc+0x1c/0x28 [ 176.310434] el0_svc+0x34/0xb4 [ 176.310551] el0t_64_sync_handler+0x120/0x12c [ 176.310690] el0t_64_sync+0x190/0x194 [ 176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860) [ 176.315743] ---[ end trace 0000000000000000 ]--- [ 176.316060] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000 [ 176.316564] PHYS_OFFSET: 0xffff97d780000000 [ 176.316782] CPU features: 0x0,88000203,3c020000,0100421b [ 176.317210] Memory Limit: none [ 176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal Exception in interrupt ]---\ Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'gpio-fixes-for-v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix a potential NULL-pointer dereference in gpiolib core - fix a probe() regression from the v6.12 merge window and an older bug leading to missed interrupts in gpio-davinci * tag 'gpio-fixes-for-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix potential NULL pointer dereference in gpiod_get_label() gpio: davinci: Fix condition for irqchip registration gpio: davinci: fix lazy disable
2024-10-04Merge tag 'sound-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Slightly high amount of changes in this round, partly because of my vacation in the last weeks. But all changes are small and nothing looks worrisome. The biggest LOCs is MAINTAINERS updates, and there is a core change for card-ID string creation for non-ASCII inputs. Others are rather device-specific, such as new quirks and device IDs for ASoC, usual HD-audio and USB-audio quirks and fixes, as well as regression fixes in HD-audio HDMI audio and Conexant codec" * tag 'sound-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits) ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin ALSA: line6: add hw monitor volume control to POD HD500X ALSA: gus: Fix some error handling paths related to get_bpos() usage ALSA: hda: Add missing parameter description for snd_hdac_stream_timecounter_init() ALSA: usb-audio: Add native DSD support for Luxman D-08u ALSA: core: add isascii() check to card ID generator MAINTAINERS: ALSA: use linux-sound@vger.kernel.org list Revert "ALSA: hda: Conditionally use snooping for AMD HDMI" ASoC: intel: sof_sdw: Add check devm_kasprintf() returned value ASoC: imx-card: Set card.owner to avoid a warning calltrace if SND=m ASoC: dt-bindings: davinci-mcasp: Fix interrupts property ASoC: qcom: sm8250: add qrb4210-rb2-sndcard compatible string ASoC: dt-bindings: qcom,sm8250: add qrb4210-rb2-sndcard ALSA: hda: fix trigger_tstamp_latched ALSA: hda/realtek: Add a quirk for HP Pavilion 15z-ec200 ALSA: hda/generic: Drop obsoleted obey_preferred_dacs flag ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs ALSA: silence integer wrapping warning ASoC: Intel: soc-acpi: arl: Fix some missing empty terminators ASoC: Intel: soc-acpi-intel-rpl-match: add missing empty item ...
2024-10-04Merge tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes, xe and amdgpu lead the way, with panthor, and few core components getting various fixes. Nothing seems too out of the ordinary. atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size sched: - Update maintainers - Fix race condition whne queueing up jobs - Fix locking in drm_sched_entity_modify_sched() - Fix pointer deref if entity queue changes sysfb: - Disable sysfb if framebuffer parent device is unknown amdgpu: - DML2 fix - DSC fix - Dispclk fix - eDP HDR fix - IPS fix - TBT fix i915: - One fix for bitwise and logical "and" mixup in PM code xe: - Restore pci state on resume - Fix locking on submission, queue and vm - Fix UAF on queue destruction - Fix resource release on freq init error path - Use rw_semaphore to reduce contention on ASID->VM lookup - Fix steering for media on Xe2_HPM - Tuning updates to Xe2 - Resume TDR after GT reset to prevent jobs running forever - Move id allocation to avoid userspace using a guessed number to trigger UAF - Fix OA stream close preventing pbatch buffers to complete - Fix NPD when migrating memory on LNL - Fix memory leak when aborting binds panthor: - Fix locking - Set FOP_UNSIGNED_OFFSET in fops instance - Acquire lock in panthor_vm_prepare_map_op_ctx() - Avoid uninitialized variable in tick_ctx_cleanup() - Do not block scheduler queue if work is pending - Do not add write fences to the shared BOs vbox: - Fix VLA handling" * tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits) drm/xe: Fix memory leak when aborting binds drm/xe: Prevent null pointer access in xe_migrate_copy drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close drm/xe/queue: move xa_alloc to prevent UAF drm/xe/vm: move xa_alloc to prevent UAF drm/xe: Clean up VM / exec queue file lock usage. drm/xe: Resume TDR after GT reset drm/xe/xe2: Add performance tuning for L3 cache flushing drm/xe/xe2: Extend performance tuning to media GT drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM drm/xe: Use helper for ASID -> VM in GPU faults and access counters drm/xe: Convert to USM lock to rwsem drm/xe: use devm_add_action_or_reset() helper drm/xe: fix UAF around queue destruction drm/xe/guc_submit: add missing locking in wedged_fini drm/xe: Restore pci state upon resume drm/amd/display: Fix system hang while resume with TBT monitor drm/amd/display: Enable idle workqueue for more IPS modes drm/amd/display: Add HDR workaround for specific eDP drm/amd/display: avoid set dispclk to 0 ...
2024-10-04net: dsa: sja1105: fix reception from VLAN-unaware bridgesVladimir Oltean
The blamed commit introduced an unexpected regression in the sja1105 driver. Packets from VLAN-unaware bridge ports get received correctly, but the protocol stack can't seem to decode them properly. For ds->untag_bridge_pvid users (thus also sja1105), the blamed commit did introduce a functional change: dsa_switch_rcv() used to call dsa_untag_bridge_pvid(), which looked like this: err = br_vlan_get_proto(br, &proto); if (err) return skb; /* Move VLAN tag from data to hwaccel */ if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) { skb = skb_vlan_untag(skb); if (!skb) return NULL; } and now it calls dsa_software_vlan_untag() which has just this: /* Move VLAN tag from data to hwaccel */ if (!skb_vlan_tag_present(skb)) { skb = skb_vlan_untag(skb); if (!skb) return NULL; } thus lacks any skb->protocol == bridge VLAN protocol check. That check is deferred until a later check for skb->vlan_proto (in the hwaccel area). The new code is problematic because, for VLAN-untagged packets, skb_vlan_untag() blindly takes the 4 bytes starting with the EtherType and turns them into a hwaccel VLAN tag. This is what breaks the protocol stack. It would be tempting to "make it work as before" and only call skb_vlan_untag() for those packets with the skb->protocol actually representing a VLAN. But the premise of the newly introduced dsa_software_vlan_untag() core function is not wrong. Drivers set ds->untag_bridge_pvid or ds->untag_vlan_aware_bridge_pvid presumably because they send all traffic to the CPU reception path as VLAN-tagged. So why should we spend any additional CPU cycles assuming that the packet may be VLAN-untagged? And why does the sja1105 driver opt into ds->untag_bridge_pvid if it doesn't always deliver packets to the CPU as VLAN-tagged? The answer to the latter question is indeed more interesting: it doesn't need to. This got done in commit 884be12f8566 ("net: dsa: sja1105: add support for imprecise RX"), because I thought it would be needed, but I didn't realize that it doesn't actually make a difference. As explained in the commit message of the blamed patch, ds->untag_bridge_pvid only makes a difference in the VLAN-untagged receive path of a bridge port. However, in that operating mode, tag_sja1105.c makes use of VLAN tags with the ETH_P_SJA1105 TPID, and it decodes and consumes these VLAN tags as if they were DSA tags (aka tag_8021q operation). Even if commit 884be12f8566 ("net: dsa: sja1105: add support for imprecise RX") added this logic in sja1105_bridge_vlan_add(): /* Always install bridge VLANs as egress-tagged on the CPU port. */ if (dsa_is_cpu_port(ds, port)) flags = 0; that was for _bridge_ VLANs, which are _not_ committed to hardware in VLAN-unaware mode (aka the mode where ds->untag_bridge_pvid does anything at all). Even prior to that change, the tag_8021q VLANs were always installed as egress-tagged on the CPU port, see dsa_switch_tag_8021q_vlan_add(): u16 flags = 0; // egress-tagged, non-PVID if (dsa_port_is_user(dp)) flags |= BRIDGE_VLAN_INFO_UNTAGGED | BRIDGE_VLAN_INFO_PVID; err = dsa_port_do_tag_8021q_vlan_add(dp, info->vid, flags); if (err) return err; Whether the sja1105 driver needs the new flag, ds->untag_vlan_aware_bridge_pvid, rather than ds->untag_bridge_pvid, is a separate discussion. To fix the current bug in VLAN-unaware bridge mode, I would argue that the sja1105 driver should not request something it doesn't need, rather than complicating the core DSA helper. Whereas before the blamed commit, this setting was harmless, now it has caused breakage. Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241001140206.50933-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'block-6.12-20241004' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix another use-after-free in aoe - Fixup wrong nested non-saving irq disable/restore in blk-iocost - Fixup a kerneldoc complaint introduced by a merge window patch * tag 'block-6.12-20241004' of git://git.kernel.dk/linux: aoe: fix the potential use-after-free problem in more places blk_iocost: remove some duplicate irq disable/enables block: fix blk_rq_map_integrity_sg kernel-doc
2024-10-04Merge tag 'io_uring-6.12-20241004' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Fix an error path memory leak, if one part fails to allocate. Obviously not something that'll generally hit without error injection. - Fix an io_req_flags_t cast to make sparse happier. - Improve the recv multishot termination. Not a bug now, but could be one in the future. This makes it do the same thing that recvmsg does in terms of when to terminate a request or not. * tag 'io_uring-6.12-20241004' of git://git.kernel.dk/linux: io_uring/net: harden multishot termination case for recv io_uring: fix casts to io_req_flags_t io_uring: fix memory leak when cache init fail
2024-10-04Merge tag 'fsnotify_for_v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Fixes for an inotify deadlock and a data race in fsnotify" * tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Fix possible deadlock in fsnotify_destroy_mark fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()
2024-10-04Merge tag 'fs_for_v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "A couple of UDF error handling fixes for issues spotted by syzbot" * tag 'fs_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix uninit-value use in udf_get_fileshortad udf: refactor inode_bmap() to handle error udf: refactor udf_next_aext() to handle error udf: refactor udf_current_aext() to handle error
2024-10-04Merge tag 'ceph-for-6.12-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A fix from Patrick for a variety of CephFS lockup scenarios caused by a regression in cap handling which sneaked in through the netfs helper library in 5.18 (marked for stable) and an unrelated one-line cleanup" * tag 'ceph-for-6.12-rc2' of https://github.com/ceph/ceph-client: ceph: fix cap ref leak via netfs init_request ceph: use struct_size() helper in __ceph_pool_perm_get()
2024-10-04Merge branches 'acpi-video' and 'acpi-battery'Rafael J. Wysocki
Merge an ACPI backlight (video) quirk and ACPI battery driver fix and cleanup for 6.12-rc2: - Add a quirk for Dell OptiPlex 5480 AIO to the ACPI backlight (video) driver (Hans de Goede). - Prevent the ACPI battery driver from crashing when unregistering a battery hook and simplify battery hook locking in it (Armin Wolf). * acpi-video: ACPI: video: Add backlight=native quirk for Dell OptiPlex 5480 AIO * acpi-battery: ACPI: battery: Fix possible crash when unregistering a battery hook ACPI: battery: Simplify battery hook locking
2024-10-04Merge tag 'for-6.12-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - in incremental send, fix invalid clone operation for file that got its size decreased - fix __counted_by() annotation of send path cache entries, we do not store the terminating NUL - fix a longstanding bug in relocation (and quite hard to hit by chance), drop back reference cache that can get out of sync after transaction commit - wait for fixup worker kthread before finishing umount - add missing raid-stripe-tree extent for NOCOW files, zoned mode cannot have NOCOW files but RST is meant to be a standalone feature - handle transaction start error during relocation, avoid potential NULL pointer dereference of relocation control structure (reported by syzbot) - disable module-wide rate limiting of debug level messages - minor fix to tracepoint definition (reported by checkpatch.pl) * tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: disable rate limiting when debug enabled btrfs: wait for fixup workers before stopping cleaner kthread during umount btrfs: fix a NULL pointer dereference when failed to start a new trasacntion btrfs: send: fix invalid clone operation for file that got its size decreased btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class btrfs: drop the backref cache during relocation if we commit btrfs: also add stripe entries for NOCOW writes btrfs: send: fix buffer overflow detection when copying path to cache entry
2024-10-04Merge tag 'v6.12-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - statfs fix (e.g. when limited access to root directory of share) - special file handling fixes: fix packet validation to avoid buffer overflow for reparse points, fixes for symlink path parsing (one for reparse points, and one for SFU use case), and fix for cleanup after failed SET_REPARSE operation. - fix for SMB2.1 signing bug introduced by recent patch to NFS symlink path, and NFS reparse point validation - comment cleanup * tag 'v6.12-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Do not convert delimiter when parsing NFS-style symlinks cifs: Validate content of NFS reparse point buffer cifs: Fix buffer overflow when parsing NFS reparse points smb: client: Correct typos in multiple comments across various files smb: client: use actual path when queryfs cifs: Remove intermediate object of failed create reparse call Revert "smb: client: make SHA-512 TFM ephemeral" smb: Update comments about some reparse point tags cifs: Check for UTF-16 null codepoint in SFU symlink target location
2024-10-04Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull close_range() fix from Al Viro: "Fix the logic in descriptor table trimming" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: close_range(): fix the logics in descriptor table trimming
2024-10-04Merge tag 'i2c-host-fixes-6.12-rc2' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host fixes for v6.12-rc2 In the stm32f7 a potential deadlock is fixed during runtime suspend and resume.
2024-10-04tomoyo: revert CONFIG_SECURITY_TOMOYO_LKM supportPaul Moore
This patch reverts two TOMOYO patches that were merged into Linus' tree during the v6.12 merge window: 8b985bbfabbe ("tomoyo: allow building as a loadable LSM module") 268225a1de1a ("tomoyo: preparation step for building as a loadable LSM module") Together these two patches introduced the CONFIG_SECURITY_TOMOYO_LKM Kconfig build option which enabled a TOMOYO specific dynamic LSM loading mechanism (see the original commits for more details). Unfortunately, this approach was widely rejected by the LSM community as well as some members of the general kernel community. Objections included concerns over setting a bad precedent regarding individual LSMs managing their LSM callback registrations as well as general kernel symbol exporting practices. With little to no support for the CONFIG_SECURITY_TOMOYO_LKM approach outside of Tetsuo, and multiple objections, we need to revert these changes. Link: https://lore.kernel.org/all/0c4b443a-9c72-4800-97e8-a3816b6a9ae2@I-love.SAKURA.ne.jp Link: https://lore.kernel.org/all/CAHC9VhR=QjdoHG3wJgHFJkKYBg7vkQH2MpffgVzQ0tAByo_wRg@mail.gmail.com Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-10-04arm64: Subscribe Microsoft Azure Cobalt 100 to erratum 3194386Easwar Hariharan
Add the Microsoft Azure Cobalt 100 CPU to the list of CPUs suffering from erratum 3194386 added in commit 75b3c43eab59 ("arm64: errata: Expand speculative SSBS workaround") CC: Mark Rutland <mark.rutland@arm.com> CC: James More <james.morse@arm.com> CC: Will Deacon <will@kernel.org> CC: stable@vger.kernel.org # 6.6+ Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/20241003225239.321774-1-eahariha@linux.microsoft.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04ALSA: hda/conexant: Fix conflicting quirk for System76 PangolinTakashi Iwai
We received a regression report for System76 Pangolin (pang14) due to the recent fix for Tuxedo Sirius devices to support the top speaker. The reason was the conflicting PCI SSID, as often seen. As a workaround, now the codec SSID is checked and the quirk is applied conditionally only to Sirius devices. Fixes: 4178d78cd7a8 ("ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices") Reported-by: Christian Heusel <christian@heusel.eu> Reported-by: Jerry <jerryluo225@gmail.com> Closes: https://lore.kernel.org/c930b6a6-64e5-498f-b65a-1cd5e0a1d733@heusel.eu Link: https://patch.msgid.link/20241004082602.29016-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-04ALSA: line6: add hw monitor volume control to POD HD500XHans P. Moller
Add hw monitor volume control for POD HD500X. This is done adding LINE6_CAP_HWMON_CTL to the capabilities Signed-off-by: Hans P. Moller <hmoller@uc.cl> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20241003232828.5819-1-hmoller@uc.cl
2024-10-04ALSA: gus: Fix some error handling paths related to get_bpos() usageChristophe JAILLET
If get_bpos() fails, it is likely that the corresponding error code should be returned. Fixes: a6970bb1dd99 ("ALSA: gus: Convert to the new PCM ops") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/d9ca841edad697154afa97c73a5d7a14919330d9.1727984008.git.christophe.jaillet@wanadoo.fr Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-04Merge tag 'drm-xe-fixes-2024-10-03' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Restore pci state on resume (Rodrigo Vivi) - Fix locking on submission, queue and vm (Matthew Auld, Matthew Brost) - Fix UAF on queue destruction (Matthew Auld) - Fix resource release on freq init error path (He Lugang) - Use rw_semaphore to reduce contention on ASID->VM lookup (Matthew Brost) - Fix steering for media on Xe2_HPM (Gustavo Sousa) - Tuning updates to Xe2 (Gustavo Sousa) - Resume TDR after GT reset to prevent jobs running forever (Matthew Brost) - Move id allocation to avoid userspace using a guessed number to trigger UAF (Matthew Auld, Matthew Brost) - Fix OA stream close preventing pbatch buffers to complete (José) - Fix NPD when migrating memory on LNL (Zhanjun Dong) - Fix memory leak when aborting binds (Matthew Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/2fiv63yanlal5mpw3mxtotte6yvkvtex74c7mkjxca4bazlyja@o4iejcfragxy
2024-10-03Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-09-30 (ice, idpf) This series contains updates to ice and idpf drivers: For ice: Michal corrects setting of dst VSI on LAN filters and adds clearing of port VLAN configuration during reset. Gui-Dong Han corrects failures to decrement refcount in some error paths. Przemek resolves a memory leak in ice_init_tx_topology(). Arkadiusz prevents setting of DPLL_PIN_STATE_SELECTABLE to an improper value. Dave stops clearing of VLAN tracking bit to allow for VLANs to be properly restored after reset. For idpf: Ahmed sets uninitialized dyn_ctl_intrvl_s value. Josh corrects use and reporting of mailbox size. Larysa corrects order of function calls during de-initialization. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: deinit virtchnl transaction manager after vport and vectors idpf: use actual mbx receive payload length idpf: fix VF dynamic interrupt ctl register initialization ice: fix VLAN replay after reset ice: disallow DPLL_PIN_STATE_SELECTABLE for dpll output pins ice: fix memleak in ice_init_tx_topology() ice: clear port vlan config during reset ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count() ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins() ice: set correct dst VSI in only LAN filters ==================== Link: https://patch.msgid.link/20240930223601.3137464-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03Merge tag 'rust-fixes-6.12' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull Rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Fix/improve a couple 'depends on' on the newly added CFI/KASAN suppport to avoid build errors/warnings - Fix ARCH_SLAB_MINALIGN multiple definition error for RISC-V under !CONFIG_MMU - Clean upcoming (Rust 1.83.0) Clippy warnings 'kernel' crate: - 'sync' module: fix soundness issue by requiring 'T: Sync' for 'LockedBy::access'; and fix helpers build error under PREEMPT_RT - Fix trivial sorting issue ('rustfmtcheck') on the v6.12 Rust merge" * tag 'rust-fixes-6.12' of https://github.com/Rust-for-Linux/linux: rust: kunit: use C-string literals to clean warning cfi: encode cfi normalized integers + kasan/gcov bug in Kconfig rust: KASAN+RETHUNK requires rustc 1.83.0 rust: cfi: fix `patchable-function-entry` starting version rust: mutex: fix __mutex_init() usage in case of PREEMPT_RT rust: fix `ARCH_SLAB_MINALIGN` multiple definition error rust: sync: require `T: Sync` for `LockedBy::access` rust: kernel: sort Rust modules
2024-10-03Merge tag 'pull-fixes.ufs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ufs fix from Al Viro: "Fix ufs_rename() braino introduced this cycle. The 'folio_release_kmap(dir_folio, new_dir)' in ufs_rename() part of folio conversion should've been getting a pointer to ufs directory entry within the page, rather than a pointer to directory struct inode..." * tag 'pull-fixes.ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs_rename(): fix bogus argument of folio_release_kmap()
2024-10-03Documentation: networking/tcp_ao: typo and grammar fixesLeo Stone
Fix multiple grammatical issues and add a missing period to improve readability. Signed-off-by: Leo Stone <leocstone@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240929005001.370991-1-leocstone@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03Merge branch 'rxrpc-miscellaneous-fixes'Jakub Kicinski
David Howells says: ==================== rxrpc: Miscellaneous fixes Here some miscellaneous fixes for AF_RXRPC: (1) Fix a race in the I/O thread vs UDP socket setup. (2) Fix an uninitialised variable. ==================== Link: https://patch.msgid.link/20241001132702.3122709-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03rxrpc: Fix uninitialised variable in rxrpc_send_data()David Howells
Fix the uninitialised txb variable in rxrpc_send_data() by moving the code that loads it above all the jumps to maybe_error, txb being stored back into call->tx_pending right before the normal return. Fixes: b0f571ecd794 ("rxrpc: Fix locking in rxrpc's sendmsg") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lists.infradead.org/pipermail/linux-afs/2024-October/008896.html Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241001132702.3122709-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03rxrpc: Fix a race between socket set up and I/O thread creationDavid Howells
In rxrpc_open_socket(), it sets up the socket and then sets up the I/O thread that will handle it. This is a problem, however, as there's a gap between the two phases in which a packet may come into rxrpc_encap_rcv() from the UDP packet but we oops when trying to wake the not-yet created I/O thread. As a quick fix, just make rxrpc_encap_rcv() discard the packet if there's no I/O thread yet. A better, but more intrusive fix would perhaps be to rearrange things such that the socket creation is done by the I/O thread. Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: yuxuanzhe@outlook.com cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241001132702.3122709-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03Merge branch 'tcp-3-fixes-for-retrans_stamp-and-undo-logic'Jakub Kicinski
Neal Cardwell says: ==================== tcp: 3 fixes for retrans_stamp and undo logic Geumhwan Yu <geumhwan.yu@samsung.com> recently reported and diagnosed a regression in TCP loss recovery undo logic in the case where a TCP connection enters fast recovery, is unable to retransmit anything due to TSQ, and then receives an ACK allowing forward progress. The sender should be able to undo the spurious loss recovery in this case, but was not doing so. The first patch fixes this regression. Running our suite of packetdrill tests with the first fix, the tests highlighted two other small bugs in the way retrans_stamp is updated in some rare corner cases. The second two patches fix those other two small bugs. Thanks to Geumhwan Yu for the bug report! ==================== Link: https://patch.msgid.link/20241001200517.2756803-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03tcp: fix TFO SYN_RECV to not zero retrans_stamp with retransmits outNeal Cardwell
Fix tcp_rcv_synrecv_state_fastopen() to not zero retrans_stamp if retransmits are outstanding. tcp_fastopen_synack_timer() sets retrans_stamp, so typically we'll need to zero retrans_stamp here to prevent spurious retransmits_timed_out(). The logic to zero retrans_stamp is from this 2019 commit: commit cd736d8b67fb ("tcp: fix retrans timestamp on passive Fast Open") However, in the corner case where the ACK of our TFO SYNACK carried some SACK blocks that caused us to enter TCP_CA_Recovery then that non-zero retrans_stamp corresponds to the active fast recovery, and we need to leave retrans_stamp with its current non-zero value, for correct ETIMEDOUT and undo behavior. Fixes: cd736d8b67fb ("tcp: fix retrans timestamp on passive Fast Open") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241001200517.2756803-4-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safeNeal Cardwell
Fix tcp_enter_recovery() so that if there are no retransmits out then we zero retrans_stamp when entering fast recovery. This is necessary to fix two buggy behaviors. Currently a non-zero retrans_stamp value can persist across multiple back-to-back loss recovery episodes. This is because we generally only clears retrans_stamp if we are completely done with loss recoveries, and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This behavior causes two bugs: (1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed immediately by a new CA_Recovery, the retrans_stamp value can persist and can be a time before this new CA_Recovery episode starts. That means that timestamp-based undo will be using the wrong retrans_stamp (a value that is too old) when comparing incoming TS ecr values to retrans_stamp to see if the current fast recovery episode can be undone. (2) If there is a roughly minutes-long sequence of back-to-back fast recovery episodes, one after another (e.g. in a shallow-buffered or policed bottleneck), where each fast recovery successfully makes forward progress and recovers one window of sequence space (but leaves at least one retransmit in flight at the end of the recovery), followed by several RTOs, then the ETIMEDOUT check may be using the wrong retrans_stamp (a value set at the start of the first fast recovery in the sequence). This can cause a very premature ETIMEDOUT, killing the connection prematurely. This commit changes the code to zero retrans_stamp when entering fast recovery, when this is known to be safe (no retransmits are out in the network). That ensures that when starting a fast recovery episode, and it is safe to do so, retrans_stamp is set when we send the fast retransmit packet. That addresses both bug (1) and bug (2) by ensuring that (if no retransmits are out when we start a fast recovery) we use the initial fast retransmit of this fast recovery as the time value for undo and ETIMEDOUT calculations. This makes intuitive sense, since the start of a new fast recovery episode (in a scenario where no lost packets are out in the network) means that the connection has made forward progress since the last RTO or fast recovery, and we should thus "restart the clock" used for both undo and ETIMEDOUT logic. Note that if when we start fast recovery there *are* retransmits out in the network, there can still be undesirable (1)/(2) issues. For example, after this patch we can still have the (1) and (2) problems in cases like this: + round 1: sender sends flight 1 + round 2: sender receives SACKs and enters fast recovery 1, retransmits some packets in flight 1 and then sends some new data as flight 2 + round 3: sender receives some SACKs for flight 2, notes losses, and retransmits some packets to fill the holes in flight 2 + fast recovery has some lost retransmits in flight 1 and continues for one or more rounds sending retransmits for flight 1 and flight 2 + fast recovery 1 completes when snd_una reaches high_seq at end of flight 1 + there are still holes in the SACK scoreboard in flight 2, so we enter fast recovery 2, but some retransmits in the flight 2 sequence range are still in flight (retrans_out > 0), so we can't execute the new retrans_stamp=0 added here to clear retrans_stamp It's not yet clear how to fix these remaining (1)/(2) issues in an efficient way without breaking undo behavior, given that retrans_stamp is currently used for undo and ETIMEDOUT. Perhaps the optimal (but expensive) strategy would be to set retrans_stamp to the timestamp of the earliest outstanding retransmit when entering fast recovery. But at least this commit makes things better. Note that this does not change the semantics of retrans_stamp; it simply makes retrans_stamp accurate in some cases where it was not before: (1) Some loss recovery, followed by an immediate entry into a fast recovery, where there are no retransmits out when entering the fast recovery. (2) When a TFO server has a SYNACK retransmit that sets retrans_stamp, and then the ACK that completes the 3-way handshake has SACK blocks that trigger a fast recovery. In this case when entering fast recovery we want to zero out the retrans_stamp from the TFO SYNACK retransmit, and set the retrans_stamp based on the timestamp of the fast recovery. We introduce a tcp_retrans_stamp_cleanup() helper, because this two-line sequence already appears in 3 places and is about to appear in 2 more as a result of this bug fix patch series. Once this bug fix patches series in the net branch makes it into the net-next branch we'll update the 3 other call sites to use the new helper. This is a long-standing issue. The Fixes tag below is chosen to be the oldest commit at which the patch will apply cleanly, which is from Linux v3.5 in 2012. Fixes: 1fbc340514fc ("tcp: early retransmit: tcp_enter_recovery()") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241001200517.2756803-3-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03tcp: fix to allow timestamp undo if no retransmits were sentNeal Cardwell
Fix the TCP loss recovery undo logic in tcp_packet_delayed() so that it can trigger undo even if TSQ prevents a fast recovery episode from reaching tcp_retransmit_skb(). Geumhwan Yu <geumhwan.yu@samsung.com> recently reported that after this commit from 2019: commit bc9f38c8328e ("tcp: avoid unconditional congestion window undo on SYN retransmit") ...and before this fix we could have buggy scenarios like the following: + Due to reordering, a TCP connection receives some SACKs and enters a spurious fast recovery. + TSQ prevents all invocations of tcp_retransmit_skb(), because many skbs are queued in lower layers of the sending machine's network stack; thus tp->retrans_stamp remains 0. + The connection receives a TCP timestamp ECR value echoing a timestamp before the fast recovery, indicating that the fast recovery was spurious. + The connection fails to undo the spurious fast recovery because tp->retrans_stamp is 0, and thus tcp_packet_delayed() returns false, due to the new logic in the 2019 commit: commit bc9f38c8328e ("tcp: avoid unconditional congestion window undo on SYN retransmit") This fix tweaks the logic to be more similar to the tcp_packet_delayed() logic before bc9f38c8328e, except that we take care not to be fooled by the FLAG_SYN_ACKED code path zeroing out tp->retrans_stamp (the bug noted and fixed by Yuchung in bc9f38c8328e). Note that this returns the high-level behavior of tcp_packet_delayed() to again match the comment for the function, which says: "Nothing was retransmitted or returned timestamp is less than timestamp of the first retransmission." Note that this comment is in the original 2005-04-16 Linux git commit, so this is evidently long-standing behavior. Fixes: bc9f38c8328e ("tcp: avoid unconditional congestion window undo on SYN retransmit") Reported-by: Geumhwan Yu <geumhwan.yu@samsung.com> Diagnosed-by: Geumhwan Yu <geumhwan.yu@samsung.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241001200517.2756803-2-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03Merge branch 'fix-aqr-pma-capabilities'Jakub Kicinski
Abhishek Chauhan says: ==================== Fix AQR PMA capabilities Patch 1:- AQR115c reports incorrect PMA capabilities which includes 10G/5G and also incorrectly disables capabilities like autoneg and 10Mbps support. AQR115c as per the Marvell databook supports speeds up to 2.5Gbps with autonegotiation. Patch 2:- Remove the use of phy_set_max_speed in phy driver as the function is mainly used in MAC driver to set the max speed. Instead use get_features to fix up Phy PMA capabilities for AQR111, AQR111B0, AQR114C and AQCS109 ==================== Link: https://patch.msgid.link/20241001224626.2400222-1-quic_abchauha@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03net: phy: aquantia: remove usage of phy_set_max_speedAbhishek Chauhan
Remove the use of phy_set_max_speed in phy driver as the function is mainly used in MAC driver to set the max speed. Instead use get_features to fix up Phy PMA capabilities for AQR111, AQR111B0, AQR114C and AQCS109 Fixes: 038ba1dc4e54 ("net: phy: aquantia: add AQR111 and AQR111B0 PHY ID") Fixes: 0974f1f03b07 ("net: phy: aquantia: remove false 5G and 10G speed ability for AQCS109") Fixes: c278ec644377 ("net: phy: aquantia: add support for AQR114C PHY ID") Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20241001224626.2400222-3-quic_abchauha@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03net: phy: aquantia: AQR115c fix up PMA capabilitiesAbhishek Chauhan
AQR115c reports incorrect PMA capabilities which includes 10G/5G and also incorrectly disables capabilities like autoneg and 10Mbps support. AQR115c as per the Marvell databook supports speeds up to 2.5Gbps with autonegotiation. Fixes: 0ebc581f8a4b ("net: phy: aquantia: add support for aqr115c") Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20241001224626.2400222-2-quic_abchauha@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03sched: psi: fix bogus pressure spikes from aggregation raceJohannes Weiner
Brandon reports sporadic, non-sensical spikes in cumulative pressure time (total=) when reading cpu.pressure at a high rate. This is due to a race condition between reader aggregation and tasks changing states. While it affects all states and all resources captured by PSI, in practice it most likely triggers with CPU pressure, since scheduling events are so frequent compared to other resource events. The race context is the live snooping of ongoing stalls during a pressure read. The read aggregates per-cpu records for stalls that have concluded, but will also incorporate ad-hoc the duration of any active state that hasn't been recorded yet. This is important to get timely measurements of ongoing stalls. Those ad-hoc samples are calculated on-the-fly up to the current time on that CPU; since the stall hasn't concluded, it's expected that this is the minimum amount of stall time that will enter the per-cpu records once it does. The problem is that the path that concludes the state uses a CPU clock read that is not synchronized against aggregators; the clock is read outside of the seqlock protection. This allows aggregators to race and snoop a stall with a longer duration than will actually be recorded. With the recorded stall time being less than the last snapshot remembered by the aggregator, a subsequent sample will underflow and observe a bogus delta value, resulting in an erratic jump in pressure. Fix this by moving the clock read of the state change into the seqlock protection. This ensures no aggregation can snoop live stalls past the time that's recorded when the state concludes. Reported-by: Brandon Duffany <brandon@buildbuddy.io> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219194 Link: https://lore.kernel.org/lkml/20240827121851.GB438928@cmpxchg.org/ Fixes: df77430639c9 ("psi: Reduce calls to sched_clock() in psi") Cc: stable@vger.kernel.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-03KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMUPaolo Bonzini
As was tried in commit 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting for a shadow page is tied to the memslot, i.e. the shadow page holds a reference to the memslot, for all intents and purposes. Deleting the memslot without removing all relevant shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled, results in NULL pointer derefs when tearing down the VM. Reintroduce from that commit the code that walks the whole memslot when there are active shadow MMU pages. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-03sfc: Don't invoke xdp_do_flush() from netpoll.Sebastian Andrzej Siewior
Yury reported a crash in the sfc driver originated from netpoll_send_udp(). The netconsole sends a message and then netpoll invokes the driver's NAPI function with a budget of zero. It is dedicated to allow driver to free TX resources, that it may have used while sending the packet. In the netpoll case the driver invokes xdp_do_flush() unconditionally, leading to crash because bpf_net_context was never assigned. Invoke xdp_do_flush() only if budget is not zero. Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Reported-by: Yury Vostrikov <mon@unformed.ru> Closes: https://lore.kernel.org/5627f6d1-5491-4462-9d75-bc0612c26a22@app.fastmail.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20241002125837.utOcRo6Y@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03net: phy: dp83869: fix memory corruption when enabling fiberIngo van Lil
When configuring the fiber port, the DP83869 PHY driver incorrectly calls linkmode_set_bit() with a bit mask (1 << 10) rather than a bit number (10). This corrupts some other memory location -- in case of arm64 the priv pointer in the same structure. Since the advertising flags are updated from supported at the end of the function the incorrect line isn't needed at all and can be removed. Fixes: a29de52ba2a1 ("net: dp83869: Add ability to advertise Fiber connection") Signed-off-by: Ingo van Lil <inguin@gmx.de> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241002161807.440378-1-inguin@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03tracing/hwlat: Fix a race during cpuhp processingWei Li
The cpuhp online/offline processing race also exists in percpu-mode hwlat tracer in theory, apply the fix too. That is: T1 | T2 [CPUHP_ONLINE] | cpu_device_down() hwlat_hotplug_workfn() | | cpus_write_lock() | takedown_cpu(1) | cpus_write_unlock() [CPUHP_OFFLINE] | cpus_read_lock() | start_kthread(1) | cpus_read_unlock() | Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20240924094515.3561410-5-liwei391@huawei.com Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations") Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03tracing/timerlat: Fix a race during cpuhp processingWei Li
There is another found exception that the "timerlat/1" thread was scheduled on CPU0, and lead to timer corruption finally: ``` ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220 WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0 Modules linked in: CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:debug_print_object+0x7d/0xb0 ... Call Trace: <TASK> ? __warn+0x7c/0x110 ? debug_print_object+0x7d/0xb0 ? report_bug+0xf1/0x1d0 ? prb_read_valid+0x17/0x20 ? handle_bug+0x3f/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? debug_print_object+0x7d/0xb0 ? debug_print_object+0x7d/0xb0 ? __pfx_timerlat_irq+0x10/0x10 __debug_object_init+0x110/0x150 hrtimer_init+0x1d/0x60 timerlat_main+0xab/0x2d0 ? __pfx_timerlat_main+0x10/0x10 kthread+0xb7/0xe0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ``` After tracing the scheduling event, it was discovered that the migration of the "timerlat/1" thread was performed during thread creation. Further analysis confirmed that it is because the CPU online processing for osnoise is implemented through workers, which is asynchronous with the offline processing. When the worker was scheduled to create a thread, the CPU may has already been removed from the cpu_online_mask during the offline process, resulting in the inability to select the right CPU: T1 | T2 [CPUHP_ONLINE] | cpu_device_down() osnoise_hotplug_workfn() | | cpus_write_lock() | takedown_cpu(1) | cpus_write_unlock() [CPUHP_OFFLINE] | cpus_read_lock() | start_kthread(1) | cpus_read_unlock() | To fix this, skip online processing if the CPU is already offline. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20240924094515.3561410-4-liwei391@huawei.com Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03tracing/timerlat: Drop interface_lock in stop_kthread()Wei Li
stop_kthread() is the offline callback for "trace/osnoise:online", since commit 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()"), the following ABBA deadlock scenario is introduced: T1 | T2 [BP] | T3 [AP] osnoise_hotplug_workfn() | work_for_cpu_fn() | cpuhp_thread_fun() | _cpu_down() | osnoise_cpu_die() mutex_lock(&interface_lock) | | stop_kthread() | cpus_write_lock() | mutex_lock(&interface_lock) cpus_read_lock() | cpuhp_kick_ap() | As the interface_lock here in just for protecting the "kthread" field of the osn_var, use xchg() instead to fix this issue. Also use for_each_online_cpu() back in stop_per_cpu_kthreads() as it can take cpu_read_lock() again. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20240924094515.3561410-3-liwei391@huawei.com Fixes: 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()") Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03tracing/timerlat: Fix duplicated kthread creation due to CPU online/offlineWei Li
osnoise_hotplug_workfn() is the asynchronous online callback for "trace/osnoise:online". It may be congested when a CPU goes online and offline repeatedly and is invoked for multiple times after a certain online. This will lead to kthread leak and timer corruption. Add a check in start_kthread() to prevent this situation. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20240924094515.3561410-2-liwei391@huawei.com Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03x86/ftrace: Include <asm/ptrace.h>Sami Tolvanen
<asm/ftrace.h> uses struct pt_regs in several places. Include <asm/ptrace.h> to ensure it's visible. This is needed to make sure object files that only include <asm/asm-prototypes.h> compile. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/20240916221557.846853-2-samitolvanen@google.com Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03rtla: Fix the help text in osnoise and timerlat top toolsEder Zulian
The help text in osnoise top and timerlat top had some minor errors and omissions. The -d option was missing the 's' (second) abbreviation and the error message for '-d' used '-D'. Cc: stable@vger.kernel.org Fixes: 1eceb2fc2ca54 ("rtla/osnoise: Add osnoise top mode") Fixes: a828cd18bc4ad ("rtla: Add timerlat tool and timelart top mode") Link: https://lore.kernel.org/20240813155831.384446-1-ezulian@redhat.com Suggested-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Eder Zulian <ezulian@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03tools/rtla: Fix installation from out-of-tree buildBen Hutchings
rtla now supports out-of-tree builds, but installation fails as it still tries to install the rtla binary from the source tree. Use the existing macro $(RTLA) to refer to the binary. Link: https://lore.kernel.org/ZudubuoU_JHjPZ7w@decadent.org.uk Fixes: 01474dc706ca ("tools/rtla: Use tools/build makefiles to build rtla") Reviewed-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Ben Hutchings <benh@debian.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03tracing: Fix trace_check_vprintf() when tp_printk is usedSteven Rostedt
When the tp_printk kernel command line is used, the trace events go directly to printk(). It is still checked via the trace_check_vprintf() function to make sure the pointers of the trace event are legit. The addition of reading buffers from previous boots required adding a delta between the addresses of the previous boot and the current boot so that the pointers in the old buffer can still be used. But this required adding a trace_array pointer to acquire the delta offsets. The tp_printk code does not provide a trace_array (tr) pointer, so when the offsets were examined, a NULL pointer dereference happened and the kernel crashed. If the trace_array does not exist, just default the delta offsets to zero, as that also means the trace event is not being read from a previous boot. Link: https://lore.kernel.org/all/Zv3z5UsG_jsO9_Tb@aschofie-mobl2.lan/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241003104925.4e1b1fd9@gandalf.local.home Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions") Reported-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03gpiolib: Fix potential NULL pointer dereference in gpiod_get_label()Lad Prabhakar
In `gpiod_get_label()`, it is possible that `srcu_dereference_check()` may return a NULL pointer, leading to a scenario where `label->str` is accessed without verifying if `label` itself is NULL. This patch adds a proper NULL check for `label` before accessing `label->str`. The check for `label->str != NULL` is removed because `label->str` can never be NULL if `label` is not NULL. This fixes the issue where the label name was being printed as `(efault)` when dumping the sysfs GPIO file when `label == NULL`. Fixes: 5a646e03e956 ("gpiolib: Return label, if set, for IRQ only line") Fixes: a86d27693066 ("gpiolib: fix the speed of descriptor label setting with SRCU") Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20241003131351.472015-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-10-03KVM: arm64: Fix kvm_has_feat*() handling of negative featuresMarc Zyngier
Oliver reports that the kvm_has_feat() helper is not behaviing as expected for negative feature. On investigation, the main issue seems to be caused by the following construct: #define get_idreg_field(kvm, id, fld) \ (id##_##fld##_SIGNED ? \ get_idreg_field_signed(kvm, id, fld) : \ get_idreg_field_unsigned(kvm, id, fld)) where one side of the expression evaluates as something signed, and the other as something unsigned. In retrospect, this is totally braindead, as the compiler converts this into an unsigned expression. When compared to something that is 0, the test is simply elided. Epic fail. Similar issue exists in the expand_field_sign() macro. The correct way to handle this is to chose between signed and unsigned comparisons, so that both sides of the ternary expression are of the same type (bool). In order to keep the code readable (sort of), we introduce new comparison primitives taking an operator as a parameter, and rewrite the kvm_has_feat*() helpers in terms of these primitives. Fixes: c62d7a23b947 ("KVM: arm64: Add feature checking helpers") Reported-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Oliver Upton <oliver.upton@linux.dev> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241002204239.2051637-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>