summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-17drm/xe/migrate: Fix alignment checkLucas De Marchi
The check would fail if the address is unaligned, but not when accounting the offset. Instead of `buf | offset` it should have been `buf + offset`. To make it more readable and also drop the uintptr_t, just use the IS_ALIGNED() macro. Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 81e139db6900503a2e68009764054fad128fbf95) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17drm/xe: Move page fault init after topology initMatthew Brost
We need the topology to determine GT page fault queue size, move page fault init after topology init. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com (cherry picked from commit beb72acb5b38dbe670d8eb752d1ad7a32f9c4119) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17drm/xe/mocs: Initialize MOCS index earlyBalasubramani Vivekanandan
MOCS uc_index is used even before it is initialized in the following callstack guc_prepare_xfer() __xe_guc_upload() xe_guc_min_load_for_hwconfig() xe_uc_init_hwconfig() xe_gt_init_hwconfig() Do MOCS index initialization earlier in the device probe. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 241cc827c0987d7173714fc5a95a7c8fc9bf15c0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17Merge tag 'nf-25-07-17' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Three patches to enhance conntrack selftests for resize and clash resolution, from Florian Westphal. 2) Expand nft_concat_range.sh selftest to improve coverage from error path, from Florian Westphal. 3) Hide clash bit to userspace from netlink dumps until there is a good reason to expose, from Florian Westphal. 4) Revert notification for device registration/unregistration for nftables basechains and flowtables, we decided to go for a better way to handle this through the nfnetlink_hook infrastructure which will come via nf-next, patch from Phil Sutter. 5) Fix crash in conntrack due to race related to SLAB_TYPESAFE_BY_RCU that results in removing a recycled object that is not yet in the hashes. Move IPS_CONFIRM setting after the object is in the hashes. From Florian Westphal. netfilter pull request 25-07-17 * tag 'nf-25-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Revert "netfilter: nf_tables: Add notifications for hook changes" netfilter: nf_tables: hide clash bit from userspace selftests: netfilter: nft_concat_range.sh: send packets to empty set selftests: netfilter: conntrack_resize.sh: also use udpclash tool selftests: netfilter: add conntrack clash resolution test case selftests: netfilter: conntrack_resize.sh: extend resize test ==================== Link: https://patch.msgid.link/20250717095808.41725-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17drm/xe/migrate: fix copy direction in access_memoryMatthew Auld
After we do the modification on the host side, ensure we write the result back to VRAM and not the other way around, otherwise the modification will be lost if treated like a read. Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com (cherry picked from commit c12fe703cab93f9d8bfe0ff32b58e7b1fd52be1f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17drm/xe: Dont skip TLB invalidations on VFTejas Upadhyay
Skipping TLB invalidations on VF causing unrecoverable faults. Probable reason for skipping TLB invalidations on SRIOV could be lack of support for instruction MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some additional handling. Helps in resolving, [ 704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] ASID: 0 VFID: 0 PDATA: 0x0d92 Faulted Address: 0x0000000002fa0000 FaultType: 0 AccessType: 1 FaultLevel: 0 EngineClass: 3 bcs EngineInstance: 8 [ 704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22 V2: - Use Xmas tree (MichalW) Suggested-by: Matthew Brost <matthew.brost@intel.com> Fixes: 97515d0b3ed92 ("drm/xe/vf: Don't emit access to Global HWSP if VF") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit b528e896fa570844d654b5a4617a97fa770a1030) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17Merge tag 'nvme-6.16-2025-07-17' of git://git.infradead.org/nvme into block-6.16Jens Axboe
Pull NVMe fixes from Christoph: "- revert the cross-controller atomic write size validation that caused regressions (Christoph Hellwig) - fix endianness of command word prints in nvme_log_err_passthru() (John Garry) - fix callback lock for TLS handshake (Maurizio Lombardi) - fix misaccounting of nvme-mpath inflight I/O (Yu Kuai) - fix inconsistent RCU list manipulation in nvme_ns_add_to_ctrl_list() (Zheng Qixing)" * tag 'nvme-6.16-2025-07-17' of git://git.infradead.org/nvme: nvmet-tcp: fix callback lock for TLS handshake nvme: fix misaccounting of nvme-mpath inflight I/O nvme: revert the cross-controller atomic write size validation nvme: fix endianness of command word prints in nvme_log_err_passthru() nvme: fix inconsistent RCU list manipulation in nvme_ns_add_to_ctrl_list()
2025-07-17netfilter: nf_conntrack: fix crash due to removal of uninitialised entryFlorian Westphal
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack] [..] The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state: ct hlist pointer is garbage; looks like the ct hash value (hence crash). ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected ct->timeout is 30000 (=30s), which is unexpected. Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend its 0, the entry matches those that are newly allocated but not yet inserted into the hash: - ct hlist pointers are overloaded and store/cache the raw tuple hash - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value. If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry. Theory is that we did hit following race: cpu x cpu y cpu z found entry E found entry E E is expired <preemption> nf_ct_delete() return E to rcu slab init_conntrack E is re-inited, ct->status set to 0 reply tuplehash hnnode.pprev stores hash value. cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit. ->refcnt set to 1 E now owned by skb ->timeout set to 30000 If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit. nf_conntrack_confirm gets called sets: ct->status |= CONFIRMED This is wrong: E is not yet added to hashtable. cpu y resumes, it observes E as expired but CONFIRMED: <resumes> nf_ct_expired() -> yes (ct->timeout is 30s) confirmed bit set. cpu y will try to delete E from the hashtable: nf_ct_delete() -> set DYING bit __nf_ct_delete_from_lists Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks: wait for spinlock held by z CONFIRMED is set but there is no guarantee ct will be added to hash: "chaintoolong" or "clash resolution" logic both skip the insert step. reply hnnode.pprev still stores the hash value. unlocks spinlock return NF_DROP <unblocks, then crashes on hlist_nulls_del_rcu pprev> In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs. Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy. To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock. Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this. It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped. Also change nf_ct_should_gc() to first check the confirmed bit. The gc sequence is: 1. Check if entry has expired, if not skip to next entry 2. Obtain a reference to the expired entry. 3. Call nf_ct_should_gc() to double-check step 1. nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry. Without this change to nf_ct_should_gc() we could still get this sequence: 1. Check if entry has expired. 2. Obtain a reference. 3. Call nf_ct_should_gc() to double-check step 1: 4 - entry is still observed as expired 5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set 6 - confirm bit is seen 7 - valid entry is removed again First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects. This change cannot be backported to releases before 5.19. Without commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes. Cc: Razvan Cojocaru <rzvncj@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/ Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/ Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-17net: fix segmentation after TCP/UDP fraglist GROFelix Fietkau
Since "net: gro: use cb instead of skb->network_header", the skb network header is no longer set in the GRO path. This breaks fraglist segmentation, which relies on ip_hdr()/tcp_hdr() to check for address/port changes. Fix this regression by selectively setting the network header for merged segment skbs. Fixes: 186b1ea73ad8 ("net: gro: use cb instead of skb->network_header") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250705150622.10699-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17gpiolib: devres: release GPIOs in devm_gpiod_put_array()André Draszik
devm_gpiod_put_array() is meant to undo the effects of devm_gpiod_get_array() - in particular, it should release the GPIOs contained in the array acquired with the latter. It is meant to be the resource-managed version of gpiod_put_array(), and it should behave similar to the non-array version devm_gpiod_put(). Since commit d1d52c6622a6 ("gpiolib: devres: Finish the conversion to use devm_add_action()") it doesn't do that anymore, it just removes the devres action and frees associated memory, but it doesn't actually release the GPIOs. Fix by switching from devm_remove_action() to devm_release_action(), which will in addition invoke the action to release the GPIOs. Fixes: d1d52c6622a6 ("gpiolib: devres: Finish the conversion to use devm_add_action()") Signed-off-by: André Draszik <andre.draszik@linaro.org> Reported-by: Wattson CI <wattson-external@google.com> Reported-by: Samuel Wu <wusamuel@google.com> Link: https://lore.kernel.org/r/20250715-gpiolib-devres-put-array-fix-v1-1-970d82a8c887@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-07-16ipv6: mcast: Delay put pmc->idev in mld_del_delrec()Yue Haibing
pmc->idev is still used in ip6_mc_clear_src(), so as mld_clear_delrec() does, the reference should be put after ip6_mc_clear_src() return. Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250714141957.3301871-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Merge branch 's390-bpf-fix-bpf_arch_text_poke-with-new_addr-null-again'Alexei Starovoitov
Ilya Leoshkevich says: ==================== s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL again This series fixes a regression causing perf on s390 to trigger a kernel panic. Patch 1 fixes the issue, patch 2 adds a test to make sure this doesn't happen again. ==================== Link: https://patch.msgid.link/20250716194524.48109-1-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16selftests/bpf: Stress test attaching a BPF prog to another BPF progIlya Leoshkevich
Add a test that invokes a BPF prog in a loop, while concurrently attaching and detaching another BPF prog to and from it. This helps identifying race conditions in bpf_arch_text_poke(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250716194524.48109-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL againIlya Leoshkevich
Commit 7ded842b356d ("s390/bpf: Fix bpf_plt pointer arithmetic") has accidentally removed the critical piece of commit c730fce7c70c ("s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL"), causing intermittent kernel panics in e.g. perf's on_switch() prog to reappear. Restore the fix and add a comment. Fixes: 7ded842b356d ("s390/bpf: Fix bpf_plt pointer arithmetic") Cc: stable@vger.kernel.org Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250716194524.48109-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16sched/ext: Prevent update_locked_rq() calls with NULL rqBreno Leitao
Avoid invoking update_locked_rq() when the runqueue (rq) pointer is NULL in the SCX_CALL_OP and SCX_CALL_OP_RET macros. Previously, calling update_locked_rq(NULL) with preemption enabled could trigger the following warning: BUG: using __this_cpu_write() in preemptible [00000000] This happens because __this_cpu_write() is unsafe to use in preemptible context. rq is NULL when an ops invoked from an unlocked context. In such cases, we don't need to store any rq, since the value should already be NULL (unlocked). Ensure that update_locked_rq() is only called when rq is non-NULL, preventing calling __this_cpu_write() on preemptible context. Suggested-by: Peter Zijlstra <peterz@infradead.org> Fixes: 18853ba782bef ("sched_ext: Track currently locked rq") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v6.15
2025-07-16Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-07-15 (ixgbe, fm10k, i40e, ice) Arnd Bergmann resolves compile issues with large NR_CPUS for ixgbe, fm10k, and i40e. For ice: Dave adds a NULL check for LAG netdev. Michal corrects a pointer check in debugfs initialization. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: check correct pointer in fwlog debugfs ice: add NULL check in eswitch lag check ethernet: intel: fix building with large NR_CPUS ==================== Link: https://patch.msgid.link/20250715202948.3841437-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net: airoha: fix potential use-after-free in airoha_npu_get()Alok Tiwari
np->name was being used after calling of_node_put(np), which releases the node and can lead to a use-after-free bug. Previously, of_node_put(np) was called unconditionally after of_find_device_by_node(np), which could result in a use-after-free if pdev is NULL. This patch moves of_node_put(np) after the error check to ensure the node is only released after both the error and success cases are handled appropriately, preventing potential resource issues. Fixes: 23290c7bc190 ("net: airoha: Introduce Airoha NPU support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250715143102.3458286-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16net/mlx5: Correctly set gso_size when LRO is usedChristoph Paasch
gso_size is expected by the networking stack to be the size of the payload (thus, not including ethernet/IP/TCP-headers). However, cqe_bcnt is the full sized frame (including the headers). Dividing cqe_bcnt by lro_num_seg will then give incorrect results. For example, running a bpftrace higher up in the TCP-stack (tcp_event_data_recv), we commonly have gso_size set to 1450 or 1451 even though in reality the payload was only 1448 bytes. This can have unintended consequences: - In tcp_measure_rcv_mss() len will be for example 1450, but. rcv_mss will be 1448 (because tp->advmss is 1448). Thus, we will always recompute scaling_ratio each time an LRO-packet is received. - In tcp_gro_receive(), it will interfere with the decision whether or not to flush and thus potentially result in less gro'ed packets. So, we need to discount the protocol headers from cqe_bcnt so we can actually divide the payload by lro_num_seg to get the real gso_size. v2: - Use "(unsigned char *)tcp + tcp->doff * 4 - skb->data)" to compute header-len (Tariq Toukan <tariqt@nvidia.com>) - Improve commit-message (Gal Pressman <gal@nvidia.com>) Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250715-cpaasch-pf-925-investigate-incorrect-gso_size-on-cx-7-nic-v2-1-e06c3475f3ac@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16bcachefs: Fix bch2_maybe_casefold() when CONFIG_UTF8=nKent Overstreet
maybe_casefold() shouldn't have been nooped, just bch2_casefold(). Fixes: 94426e4201fb ("bcachefs: opts.casefold_disabled") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-16bcachefs: Fix build when CONFIG_UNICODE=nKent Overstreet
94426e4201fb, which added the killswitch for casefolding, accidentally removed some of the ifdefs we need to avoid build errors. It appears we need better build testing for different configurations, it took two weeks for the robots to catch this one. Fixes: 94426e4201fb ("bcachefs: opts.casefold_disabled") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-16bcachefs: Fix reference to invalid bucket in copygcKent Overstreet
Use bch2_dev_bucket_tryget() instead of bch2_dev_tryget() before checking the bucket bitmap. Reported-by: syzbot+3168625f36f4a539237e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-16bcachefs: Don't build aux search tree when still repairing nodeKent Overstreet
bch2_btree_node_drop_keys_outside_node() will (re)build aux search trees, because it's also called by topology repair. bch2_btree_node_read_done() was calling it before validating individual keys; invalid ones have to be dropped. If we call drop_keys_outside_node() first, then bch2_bset_build_aux_tree() doesn't run because the node already has an aux search tree - which was invalidated by the repair. Reported-by: syzbot+c5e7a66b3b23ae65d44f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-16bcachefs: Tweak threshold for allocator triggering discardsKent Overstreet
The allocator path has a "if we're really low on free buckets, check if we should issue discards" - tweak this to also trigger discards if more than 1/128th of the device is in need_discard state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-16bcachefs: Fix triggering of discard by the journal pathKent Overstreet
It becomes possible to do discards after a journal flush, which naturally the journal code is reponsible for. A prior refactoring seems to have broken this - which went unnoticed because the foreground allocator path can also trigger discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-07-16drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resumeEeli Haapalainen
Commit 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: 42cdf6f687da ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2becafc319db3d96205320f31cc0de4ee5a93747) Cc: stable@vger.kernel.org
2025-07-16drm/amdgpu: Increase reset counter only on successLijo Lazar
Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25c314aa3ec3d30e4ee282540e2096b5c66a2437) Cc: stable@vger.kernel.org
2025-07-16drm/radeon: Do not hold console lock during resumeThomas Zimmermann
The function radeon_resume_kms() acquires the console lock. It is inconsistent, as it depends on the notify_client argument. That lock then covers a number of suspend operations that are unrelated to the console. Remove the calls to console_lock() and console_unlock() from the radeon function. The console lock is only required by DRM's fbdev emulation, which acquires it as necessary. Also fixes a possible circular dependency between the console lock and the client-list mutex, where the mutex is supposed to be taken first. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fff8e0504499a929f26e2fb7cf7e2c9854e37b91)
2025-07-16drm/radeon: Do not hold console lock while suspending clientsThomas Zimmermann
The radeon driver holds the console lock while suspending in-kernel DRM clients. This creates a circular dependency with the client-list mutex, which is supposed to be acquired first. Reported when combining radeon with another DRM driver. Therefore, do not take the console lock in radeon, but let the fbdev DRM client acquire the lock when needed. This is what all other DRM drivers so. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reported-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Closes: https://lore.kernel.org/dri-devel/0a087cfd-bd4c-48f1-aa2f-4a3b12593935@oss.qualcomm.com/ Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 612ec7c69d04cb58beb1332c2806da9f2f47a3ae)
2025-07-16drm/amd/display: Disable CRTC degamma LUT for DCN401Melissa Wen
In DCN401 pre-blending degamma LUT isn't affecting cursor as in previous DCN version. As this is not the behavior close to what is expected for CRTC degamma LUT, disable CRTC degamma LUT property in this HW. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4176 --- When enabling HDR on KDE, it takes the first CRTC 1D LUT available and apply a color transformation (Gamma 2.2 -> PQ). AMD driver usually advertises a CRTC degamma LUT as the first CRTC 1D LUT, but it's actually applied pre-blending. In previous HW version, it seems to work fine because the 1D LUT was applied to cursor too, but DCN401 presents a different behavior and the 1D LUT isn't affecting the hardware cursor. To address the wrong gamma on cursor with HDR (see the link), I came up with this patch that disables CRTC degamma LUT in this hw, since it presents a different behavior than others. With this KDE sees CRTC regamma LUT as the first post-blending 1D LUT available. This is actually more consistent with AMD color pipeline. It was tested by the reporter, since I don't have the HW available for local testing and debugging. Melissa --- Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 340231cdceec2c45995d773a358ca3c341f151aa) Cc: stable@vger.kernel.org
2025-07-16drm/amd/display: Free memory allocationClayton King
[WHY] Free memory to avoid memory leak Reviewed-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Clayton King <clayton.king@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fa699acb8e9be2341ee318077fa119acc7d5f329) Cc: stable@vger.kernel.org
2025-07-16Merge tag 'probes-fixes-v6.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - fprobe-event: The @params variable was being used in an error path without being initialized. The fix to return an error code. * tag 'probes-fixes-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Avoid using params uninitialized in parse_btf_arg()
2025-07-16Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant ↵Zijun Hu
without board ID For GF variant of WCN6855 without board ID programmed btusb_generate_qca_nvm_name() will chose wrong NVM 'qca/nvm_usb_00130201.bin' to download. Fix by choosing right NVM 'qca/nvm_usb_00130201_gf.bin'. Also simplify NVM choice logic of btusb_generate_qca_nvm_name(). Fixes: d6cba4e6d0e2 ("Bluetooth: btusb: Add support using different nvm for variant WCN6855 controller") Signed-off-by: Zijun Hu <zijun.hu@oss.qualcomm.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmapChristian Eggers
The 'quirks' member already ran out of bits on some platforms some time ago. Replace the integer member by a bitmap in order to have enough bits in future. Replace raw bit operations by accessor macros. Fixes: ff26b2dd6568 ("Bluetooth: Add quirk for broken READ_VOICE_SETTING") Fixes: 127881334eaa ("Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE") Suggested-by: Pauli Virtanen <pav@iki.fi> Tested-by: Ivan Pravdin <ipravdin.official@gmail.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: add missing braces when using macro parametersChristian Eggers
Macro parameters should always be put into braces when accessing it. Fixes: 4fc9857ab8c6 ("Bluetooth: hci_sync: Add check simultaneous roles support") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: fix typos in macrosChristian Eggers
The provided macro parameter is named 'dev' (rather than 'hdev', which may be a variable on the stack where the macro is used). Fixes: a9a830a676a9 ("Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE") Fixes: 6126ffabba6b ("Bluetooth: Introduce HCI_CONN_FLAG_DEVICE_PRIVACY device flag") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeoutLuiz Augusto von Dentz
This replaces the usage of HCI_ERROR_REMOTE_USER_TERM, which as the name suggest is to indicate a regular disconnection initiated by an user, with HCI_ERROR_AUTH_FAILURE to indicate the session has timeout thus any pairing shall be considered as failed. Fixes: 1e91c29eb60c ("Bluetooth: Use hci_disconnect for immediate disconnection from SMP") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: SMP: If an unallowed command is received consider it a failureLuiz Augusto von Dentz
If a command is received while a bonding is ongoing consider it a pairing failure so the session is cleanup properly and the device is disconnected immediately instead of continuing with other commands that may result in the session to get stuck without ever completing such as the case bellow: > ACL Data RX: Handle 2048 flags 0x02 dlen 21 SMP: Identity Information (0x08) len 16 Identity resolving key[16]: d7e08edef97d3e62cd2331f82d8073b0 > ACL Data RX: Handle 2048 flags 0x02 dlen 21 SMP: Signing Information (0x0a) len 16 Signature key[16]: 1716c536f94e843a9aea8b13ffde477d Bluetooth: hci0: unexpected SMP command 0x0a from XX:XX:XX:XX:XX:XX > ACL Data RX: Handle 2048 flags 0x02 dlen 12 SMP: Identity Address Information (0x09) len 7 Address: XX:XX:XX:XX:XX:XX (Intel Corporate) While accourding to core spec 6.1 the expected order is always BD_ADDR first first then CSRK: When using LE legacy pairing, the keys shall be distributed in the following order: LTK by the Peripheral EDIV and Rand by the Peripheral IRK by the Peripheral BD_ADDR by the Peripheral CSRK by the Peripheral LTK by the Central EDIV and Rand by the Central IRK by the Central BD_ADDR by the Central CSRK by the Central When using LE Secure Connections, the keys shall be distributed in the following order: IRK by the Peripheral BD_ADDR by the Peripheral CSRK by the Peripheral IRK by the Central BD_ADDR by the Central CSRK by the Central According to the Core 6.1 for commands used for key distribution "Key Rejected" can be used: '3.6.1. Key distribution and generation A device may reject a distributed key by sending the Pairing Failed command with the reason set to "Key Rejected". Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: btintel: Check if controller is ISO capable on ↵Luiz Augusto von Dentz
btintel_classify_pkt_type Due to what seem to be a bug with variant version returned by some firmwares the code may set hdev->classify_pkt_type with btintel_classify_pkt_type when in fact the controller doesn't even support ISO channels feature but may use the handle range expected from a controllers that does causing the packets to be reclassified as ISO causing several bugs. To fix the above btintel_classify_pkt_type will attempt to check if the controller really supports ISO channels and in case it doesn't don't reclassify even if the handle range is considered to be ISO, this is considered safer than trying to fix the specific controller/firmware version as that could change over time and causing similar problems in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219553 Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2100565 Link: https://github.com/StarLabsLtd/firmware/issues/180 Fixes: f25b7fd36cc3 ("Bluetooth: Add vendor-specific packet classification for ISO data") Cc: stable@vger.kernel.org Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Sean Rhodes <sean@starlabs.systems>
2025-07-16Bluetooth: hci_sync: fix connectable extended advertising when using static ↵Alessandro Gasbarroni
random address Currently, the connectable flag used by the setup of an extended advertising instance drives whether we require privacy when trying to pass a random address to the advertising parameters (Own Address). If privacy is not required, then it automatically falls back to using the controller's public address. This can cause problems when using controllers that do not have a public address set, but instead use a static random address. e.g. Assume a BLE controller that does not have a public address set. The controller upon powering is set with a random static address by default by the kernel. < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 Address: E4:AF:26:D8:3E:3A (Static) > HCI Event: Command Complete (0x0e) plen 4 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) Setting non-connectable extended advertisement parameters in bluetoothctl mgmt add-ext-adv-params -r 0x801 -x 0x802 -P 2M -g 1 correctly sets Own address type as Random < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 ... Own address type: Random (0x01) Setting connectable extended advertisement parameters in bluetoothctl mgmt add-ext-adv-params -r 0x801 -x 0x802 -P 2M -g -c 1 mistakenly sets Own address type to Public (which causes to use Public Address 00:00:00:00:00:00) < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 ... Own address type: Public (0x00) This causes either the controller to emit an Invalid Parameters error or to mishandle the advertising. This patch makes sure that we use the already set static random address when requesting a connectable extended advertising when we don't require privacy and our public address is not set (00:00:00:00:00:00). Fixes: 3fe318ee72c5 ("Bluetooth: move hci_get_random_address() to hci_sync") Signed-off-by: Alessandro Gasbarroni <alex.gasbarroni@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()Kuniyuki Iwashima
syzbot reported null-ptr-deref in l2cap_sock_resume_cb(). [0] l2cap_sock_resume_cb() has a similar problem that was fixed by commit 1bff51ea59a9 ("Bluetooth: fix use-after-free error in lock_sock_nested()"). Since both l2cap_sock_kill() and l2cap_sock_resume_cb() are executed under l2cap_sock_resume_cb(), we can avoid the issue simply by checking if chan->data is NULL. Let's not access to the killed socket in l2cap_sock_resume_cb(). [0]: BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:82 [inline] BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] BUG: KASAN: null-ptr-deref in l2cap_sock_resume_cb+0xb4/0x17c net/bluetooth/l2cap_sock.c:1711 Write of size 8 at addr 0000000000000570 by task kworker/u9:0/52 CPU: 1 UID: 0 PID: 52 Comm: kworker/u9:0 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Workqueue: hci0 hci_rx_work Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:501 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 print_report+0x58/0x84 mm/kasan/report.c:524 kasan_report+0xb0/0x110 mm/kasan/report.c:634 check_region_inline mm/kasan/generic.c:-1 [inline] kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189 __kasan_check_write+0x20/0x30 mm/kasan/shadow.c:37 instrument_atomic_write include/linux/instrumented.h:82 [inline] clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] l2cap_sock_resume_cb+0xb4/0x17c net/bluetooth/l2cap_sock.c:1711 l2cap_security_cfm+0x524/0xea0 net/bluetooth/l2cap_core.c:7357 hci_auth_cfm include/net/bluetooth/hci_core.h:2092 [inline] hci_auth_complete_evt+0x2e8/0xa4c net/bluetooth/hci_event.c:3514 hci_event_func net/bluetooth/hci_event.c:7511 [inline] hci_event_packet+0x650/0xe9c net/bluetooth/hci_event.c:7565 hci_rx_work+0x320/0xb18 net/bluetooth/hci_core.c:4070 process_one_work+0x7e8/0x155c kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3321 [inline] worker_thread+0x958/0xed8 kernel/workqueue.c:3402 kthread+0x5fc/0x75c kernel/kthread.c:464 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:847 Fixes: d97c899bde33 ("Bluetooth: Introduce L2CAP channel callback for resuming") Reported-by: syzbot+e4d73b165c3892852d22@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686c12bd.a70a0220.29fe6c.0b13.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16riscv: uaccess: Fix -Wuninitialized and -Wshadow in __put_user_nocheckNathan Chancellor
After a recent change in clang to strengthen uninitialized warnings [1], there is a warning from val being uninitialized in __put_user_nocheck when called from futex_put_value(): kernel/futex/futex.h:326:18: warning: variable 'val' is uninitialized when used within its own initialization [-Wuninitialized] 326 | unsafe_put_user(val, to, Efault); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ arch/riscv/include/asm/uaccess.h:464:21: note: expanded from macro 'unsafe_put_user' 464 | __put_user_nocheck(x, (ptr), label) | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ arch/riscv/include/asm/uaccess.h:314:36: note: expanded from macro '__put_user_nocheck' 314 | __inttype(x) val = (__inttype(x))x; \ | ~~~ ^ While not on by default, -Wshadow flags the same mistake: kernel/futex/futex.h:326:2: warning: declaration shadows a local variable [-Wshadow] 326 | unsafe_put_user(val, to, Efault); | ^ arch/riscv/include/asm/uaccess.h:464:2: note: expanded from macro 'unsafe_put_user' 464 | __put_user_nocheck(x, (ptr), label) | ^ arch/riscv/include/asm/uaccess.h:314:16: note: expanded from macro '__put_user_nocheck' 314 | __inttype(x) val = (__inttype(x))x; \ | ^ kernel/futex/futex.h:320:48: note: previous declaration is here 320 | static __always_inline int futex_put_value(u32 val, u32 __user *to) | ^ Use a three underscore prefix for the val variable in __put_user_nocheck to avoid clashing with either val or __val, which are both used within the put_user macros, clearing up all warnings. Closes: https://github.com/ClangBuiltLinux/linux/issues/2109 Fixes: ca1a66cdd685 ("riscv: uaccess: do not do misaligned accesses in get/put_user()") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250715-riscv-uaccess-fix-self-init-val-v1-1-82b8e911f120@kernel.org Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16io_uring/poll: fix POLLERR handlingPavel Begunkov
8c8492ca64e7 ("io_uring/net: don't retry connect operation on EPOLLERR") is a little dirty hack that 1) wrongfully assumes that POLLERR equals to a failed request, which breaks all POLLERR users, e.g. all error queue recv interfaces. 2) deviates the connection request behaviour from connect(2), and 3) racy and solved at a wrong level. Nothing can be done with 2) now, and 3) is beyond the scope of the patch. At least solve 1) by moving the hack out of generic poll handling into io_connect(). Cc: stable@vger.kernel.org Fixes: 8c8492ca64e79 ("io_uring/net: don't retry connect operation on EPOLLERR") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3dc89036388d602ebd84c28e5042e457bdfc952b.1752682444.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-16riscv: Stop supporting static ftraceAlexandre Ghiti
Now that DYNAMIC_FTRACE was introduced, there is no need to support static ftrace as it is way less performant. This simplifies the code and prevents build failures as reported by kernel test robot when !DYNAMIC_FTRACE. Also make sure that FUNCTION_TRACER can only be selected if DYNAMIC_FTRACE is supported (we have a dependency on the toolchain). Co-developed-by: chenmiao <chenmiao.ku@gmail.com> Signed-off-by: chenmiao <chenmiao.ku@gmail.com> Fixes: b2137c3b6d7a ("riscv: ftrace: prepare ftrace for atomic code patching") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506191949.o3SMu8Zn-lkp@intel.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250716-dev-alex-static_ftrace-v1-1-ba5d2b6fc9c0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16riscv: traps_misaligned: properly sign extend value in misaligned load handlerAndreas Schwab
Add missing cast to signed long. Signed-off-by: Andreas Schwab <schwab@suse.de> Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Tested-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/mvmikk0goil.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16riscv: Enable interrupt during exception handlingNam Cao
force_sig_fault() takes a spinlock, which is a sleeping lock with CONFIG_PREEMPT_RT=y. However, exception handling calls force_sig_fault() with interrupt disabled, causing a sleeping in atomic context warning. This can be reproduced using userspace programs such as: int main() { asm ("ebreak"); } or int main() { asm ("unimp"); } There is no reason that interrupt must be disabled while handling exceptions from userspace. Enable interrupt while handling user exceptions. This also has the added benefit of avoiding unnecessary delays in interrupt handling. Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry") Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250625085630.3649485-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16riscv: ftrace: Properly acquire text_mutex to fix a race conditionAlexandre Ghiti
As reported by lockdep, some patching was done without acquiring text_mutex, so there could be a race when mapping the page to patch since we use the same fixmap entry. Reported-by: Han Gao <rabenda.cn@gmail.com> Reported-by: Vivian Wang <wangruikang@iscas.ac.cn> Reported-by: Yao Zi <ziyao@disroot.org> Closes: https://lore.kernel.org/linux-riscv/aGODMpq7TGINddzM@pie.lan/ Tested-by: Yao Zi <ziyao@disroot.org> Tested-by: Han Gao <rabenda.cn@gmail.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250711-alex-fixes-v2-1-d85a5438da6c@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16ACPI: RISC-V: Remove unnecessary CPPC debug messageSunil V L
The presence or absence of the CPPC SBI extension is currently logged on every boot. This message is not particularly useful and can clutter the boot log. Remove this debug message to reduce noise during boot. This change has no functional impact. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Drew Fustini <fustini@kernel.org> Link: https://lore.kernel.org/r/20250711140013.3043463-1-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16riscv: Stop considering R_RISCV_NONE as bad relocationsAlexandre Ghiti
Even though those relocations should not be present in the final vmlinux, there are a lot of them. And since those relocations are considered "bad", they flood the compilation output which may hide some legitimate bad relocations. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Ron Economos <re@w6rz.net> Link: https://lore.kernel.org/r/20250710-dev-alex-riscv_none_bad_relocs_v1-v1-1-758f2fcc6e75@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-07-16nvmem: layouts: u-boot-env: remove crc32 endianness conversionMichael C. Pratt
On 11 Oct 2022, it was reported that the crc32 verification of the u-boot environment failed only on big-endian systems for the u-boot-env nvmem layout driver with the following error. Invalid calculated CRC32: 0x88cd6f09 (expected: 0x096fcd88) This problem has been present since the driver was introduced, and before it was made into a layout driver. The suggested fix at the time was to use further endianness conversion macros in order to have both the stored and calculated crc32 values to compare always represented in the system's endianness. This was not accepted due to sparse warnings and some disagreement on how to handle the situation. Later on in a newer revision of the patch, it was proposed to use cpu_to_le32() for both values to compare instead of le32_to_cpu() and store the values as __le32 type to remove compilation errors. The necessity of this is based on the assumption that the use of crc32() requires endianness conversion because the algorithm uses little-endian, however, this does not prove to be the case and the issue is unrelated. Upon inspecting the current kernel code, there already is an existing use of le32_to_cpu() in this driver, which suggests there already is special handling for big-endian systems, however, it is big-endian systems that have the problem. This, being the only functional difference between architectures in the driver combined with the fact that the suggested fix was to use the exact same endianness conversion for the values brings up the possibility that it was not necessary to begin with, as the same endianness conversion for two values expected to be the same is expected to be equivalent to no conversion at all. After inspecting the u-boot environment of devices of both endianness and trying to remove the existing endianness conversion, the problem is resolved in an equivalent way as the other suggested fixes. Ultimately, it seems that u-boot is agnostic to endianness at least for the purpose of environment variables. In other words, u-boot reads and writes the stored crc32 value with the same endianness that the crc32 value is calculated with in whichever endianness a certain architecture runs on. Therefore, the u-boot-env driver does not need to convert endianness. Remove the usage of endianness macros in the u-boot-env driver, and change the type of local variables to maintain the same return type. If there is a special situation in the case of endianness, it would be a corner case and should be handled by a unique "compatible". Even though it is not necessary to use endianness conversion macros here, it may be useful to use them in the future for consistent error printing. Fixes: d5542923f200 ("nvmem: add driver handling U-Boot environment variables") Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com> Link: https://lore.kernel.org/all/20221011024928.1807-1-musashino.open@gmail.com Cc: stable@vger.kernel.org Signed-off-by: "Michael C. Pratt" <mcpratt@pm.me> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250716144210.4804-1-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16misc: amd-sbi: Explicitly clear in/out arg "mb_in_out"Akshay Gupta
- New AMD processor will support different input/output for same command. - In some scenarios the input value is not cleared, which will be added to output before reporting the data. - Clearing input explicitly will be a cleaner and safer approach. Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Akshay Gupta <akshay.gupta@amd.com> Link: https://lore.kernel.org/r/20250716110729.2193725-3-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>