Age | Commit message (Collapse) | Author |
|
/proc/vmstat contains events and stats, events can only grow, but stats
can grow and shrink.
vmstat has the following:
-------------------------
NR_VM_ZONE_STAT_ITEMS: per-zone stats
NR_VM_NUMA_EVENT_ITEMS: per-numa events
NR_VM_NODE_STAT_ITEMS: per-numa stats
NR_VM_WRITEBACK_STAT_ITEMS: system-wide background-writeback and
dirty-throttling tresholds.
NR_VM_EVENT_ITEMS: system-wide events
-------------------------
Rename NR_VM_WRITEBACK_STAT_ITEMS to NR_VM_STAT_ITEMS, to track the
system-wide stats, we are going to add per-page metadata stats to this
category in the next patch.
Also delete unused writeback_stat_name().
Link: https://lkml.kernel.org/r/20240809191020.1142142-2-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240808213437.682006-3-pasha.tatashin@soleen.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fixes for memmap accounting", v4.
Memmap accounting provides us with observability of how much memory is
used for per-page metadata: i.e. "struct page"'s and "struct page_ext".
It also provides with information of how much was allocated using
boot allocator (i.e. not part of MemTotal), and how much was allocated
using buddy allocated (i.e. part of MemTotal).
This small series fixes a few problems that were discovered with the
original patch.
This patch (of 3):
When we fail to allocate the mmemmap in alloc_vmemmap_page_list(), do not
account any already-allocated pages: we're going to free all them before
we return from the function.
Link: https://lkml.kernel.org/r/20240809191020.1142142-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240808213437.682006-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240808213437.682006-2-pasha.tatashin@soleen.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We recently made GUP's common page table walking code to also walk hugetlb
VMAs without most hugetlb special-casing, preparing for the future of
having less hugetlb-specific page table walking code in the codebase.
Turns out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables
we can have with CONFIG_HIGHPTE.
Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such
that when e.g., CONFIG_PGTABLE_LEVELS==2 with
PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document
why that works.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of both
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
is_madv_discard did its check wrong. MADV_ flags are not bitwise,
they're normal sequential numbers. So, for instance:
behavior & (/* ... */ | MADV_REMOVE)
tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard
operations.
As a result the kernel could erroneously block certain madvises (e.g
MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs due to them sharing bits
with blocked MADV operations (e.g REMOVE or WIPEONFORK).
This is obviously incorrect, so use a switch statement instead.
Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com
Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com
Fixes: 8be7258aad44 ("mseal: add mseal syscall")
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Tested-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes
Mediatek DRM Fixes - 20240805
1. Set sensible cursor width/height values to fix crash
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240810084605.3435-1-chunkuang.hu@kernel.org
|
|
tools/testing/selftests/drivers/net/ is not listed under
networking entries. Add it to NETWORKING DRIVERS.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240814142832.3473685-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver currently blindly deletes its cache of RSS cotexts and
ntuple filters when the ethtool channel count is changing. It also
deletes the ntuple filters cache when the default indirection table
is changing.
The core will not allow ethtool channels to drop below any that
have been configured as ntuple destinations since this commit from 2022:
47f3ecf4763d ("ethtool: Fail number of channels change when it conflicts with rxnfc")
So there is absolutely no need to delete the ntuple filters and
RSS contexts when changing ethtool channels.
It is also unnecessary to delete ntuple filters when the default
RSS indirection table is changing.
Remove bnxt_clear_usr_fltrs() and bnxt_clear_rss_ctxis() from the
ethtool ops and change them to static functions.
This bug will cause confusion to the end user and causes failure when
running the rss_ctx.py selftest.
Fixes: 1018319f949c ("bnxt_en: Invalidate user filters when needed")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240725111912.7bc17cf6@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240814225429.199280-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During suspend/resume the following BUG was hit:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Internal error: Oops - BUG: 0 [#1] SMP ARM
Modules linked in: bluetooth ecdh_generic ecc libaes
CPU: 1 PID: 1282 Comm: rtcwake Not tainted
6.10.0-rc3-00732-gc8bd1f7f3e61 #15240
Hardware name: Generic DT based system
PC is at dql_completed+0x270/0x2cc
LR is at __free_old_xmit+0x120/0x198
pc : [<c07ffa54>] lr : [<c0c42bf4>] psr: 80000013
...
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 43a4406a DAC: 00000051
...
Process rtcwake (pid: 1282, stack limit = 0xfbc21278)
Stack: (0xe0805e80 to 0xe0806000)
...
Call trace:
dql_completed from __free_old_xmit+0x120/0x198
__free_old_xmit from free_old_xmit+0x44/0xe4
free_old_xmit from virtnet_poll_tx+0x88/0x1b4
virtnet_poll_tx from __napi_poll+0x2c/0x1d4
__napi_poll from net_rx_action+0x140/0x2b4
net_rx_action from handle_softirqs+0x11c/0x350
handle_softirqs from call_with_stack+0x18/0x20
call_with_stack from do_softirq+0x48/0x50
do_softirq from __local_bh_enable_ip+0xa0/0xa4
__local_bh_enable_ip from virtnet_open+0xd4/0x21c
virtnet_open from virtnet_restore+0x94/0x120
virtnet_restore from virtio_device_restore+0x110/0x1f4
virtio_device_restore from dpm_run_callback+0x3c/0x100
dpm_run_callback from device_resume+0x12c/0x2a8
device_resume from dpm_resume+0x12c/0x1e0
dpm_resume from dpm_resume_end+0xc/0x18
dpm_resume_end from suspend_devices_and_enter+0x1f0/0x72c
suspend_devices_and_enter from pm_suspend+0x270/0x2a0
pm_suspend from state_store+0x68/0xc8
state_store from kernfs_fop_write_iter+0x10c/0x1cc
kernfs_fop_write_iter from vfs_write+0x2b0/0x3dc
vfs_write from ksys_write+0x5c/0xd4
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe8bf1fa8 to 0xe8bf1ff0)
...
---[ end trace 0000000000000000 ]---
After virtnet_napi_enable() is called, the following path is hit:
__napi_poll()
-> virtnet_poll()
-> virtnet_poll_cleantx()
-> netif_tx_wake_queue()
That wakes the TX queue and allows skbs to be submitted and accounted by
BQL counters.
Then netdev_tx_reset_queue() is called that resets BQL counters and
eventually leads to the BUG in dql_completed().
Move virtnet_napi_tx_enable() what does BQL counters reset before RX
napi enable to avoid the issue.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/netdev/e632e378-d019-4de7-8f13-07c572ab37a9@samsung.com/
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20240814122500.1710279-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Validate user fence during creation (Brost)
- Fix use after free when client stats are captured (Umesh)
- SRIOV fixes (Michal)
- Runtime PM fixes (Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zr4KWF5nM1YvnT8H@intel.com
|
|
Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup, when the iommu is enabled:
kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
Fix this by using the non-coherent allocator instead, I think there
might be a better answer to this, but it involve ripping up some of
APIs using sg lists.
Cc: stable@vger.kernel.org
Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240815201923.632803-1-airlied@gmail.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
panel:
- dt-bindings style fixes
panel-orientation:
- add quirk for Any Loki Max
- add quirk for Any Loki Zero
rockchip:
- inno-hdmi: fix infoframe upload
v3d:
- fix OOB access in v3d_csd_job_run()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240815131751.GA151031@linux.fritz.box
|
|
Lockdep reported a warning in Linux version 6.6:
[ 414.344659] ================================
[ 414.345155] WARNING: inconsistent lock state
[ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[ 414.346221] --------------------------------
[ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[ 414.351204] {IN-SOFTIRQ-W} state was registered at:
[ 414.351751] lock_acquire+0x18d/0x460
[ 414.352218] _raw_spin_lock_irqsave+0x39/0x60
[ 414.352769] __wake_up_common_lock+0x22/0x60
[ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0
[ 414.353829] sbitmap_queue_clear+0xdd/0x270
[ 414.354338] blk_mq_put_tag+0xdf/0x170
[ 414.354807] __blk_mq_free_request+0x381/0x4d0
[ 414.355335] blk_mq_free_request+0x28b/0x3e0
[ 414.355847] __blk_mq_end_request+0x242/0xc30
[ 414.356367] scsi_end_request+0x2c1/0x830
[ 414.345155] WARNING: inconsistent lock state
[ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[ 414.346221] --------------------------------
[ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[ 414.351204] {IN-SOFTIRQ-W} state was registered at:
[ 414.351751] lock_acquire+0x18d/0x460
[ 414.352218] _raw_spin_lock_irqsave+0x39/0x60
[ 414.352769] __wake_up_common_lock+0x22/0x60
[ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0
[ 414.353829] sbitmap_queue_clear+0xdd/0x270
[ 414.354338] blk_mq_put_tag+0xdf/0x170
[ 414.354807] __blk_mq_free_request+0x381/0x4d0
[ 414.355335] blk_mq_free_request+0x28b/0x3e0
[ 414.355847] __blk_mq_end_request+0x242/0xc30
[ 414.356367] scsi_end_request+0x2c1/0x830
[ 414.356863] scsi_io_completion+0x177/0x1610
[ 414.357379] scsi_complete+0x12f/0x260
[ 414.357856] blk_complete_reqs+0xba/0xf0
[ 414.358338] __do_softirq+0x1b0/0x7a2
[ 414.358796] irq_exit_rcu+0x14b/0x1a0
[ 414.359262] sysvec_call_function_single+0xaf/0xc0
[ 414.359828] asm_sysvec_call_function_single+0x1a/0x20
[ 414.360426] default_idle+0x1e/0x30
[ 414.360873] default_idle_call+0x9b/0x1f0
[ 414.361390] do_idle+0x2d2/0x3e0
[ 414.361819] cpu_startup_entry+0x55/0x60
[ 414.362314] start_secondary+0x235/0x2b0
[ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b
[ 414.363413] irq event stamp: 428794
[ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
[ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
[ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
[ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
[ 414.367425]
other info that might help us debug this:
[ 414.368194] Possible unsafe locking scenario:
[ 414.368900] CPU0
[ 414.369225] ----
[ 414.369548] lock(&sbq->ws[i].wait);
[ 414.370000] <Interrupt>
[ 414.370342] lock(&sbq->ws[i].wait);
[ 414.370802]
*** DEADLOCK ***
[ 414.371569] 5 locks held by kworker/u10:3/1152:
[ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
[ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
[ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
[ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
[ 414.378607]
stack backtrace:
[ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
[ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 414.381177] Workqueue: writeback wb_workfn (flush-253:0)
[ 414.381805] Call Trace:
[ 414.382136] <TASK>
[ 414.382429] dump_stack_lvl+0x91/0xf0
[ 414.382884] mark_lock_irq+0xb3b/0x1260
[ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10
[ 414.383889] ? stack_trace_save+0x8e/0xc0
[ 414.384373] ? __pfx_stack_trace_save+0x10/0x10
[ 414.384903] ? graph_lock+0xcf/0x410
[ 414.385350] ? save_trace+0x3d/0xc70
[ 414.385808] mark_lock.part.20+0x56d/0xa90
[ 414.386317] mark_held_locks+0xb0/0x110
[ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0
[ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50
[ 414.388422] trace_hardirqs_on+0x58/0x100
[ 414.388917] _raw_spin_unlock_irq+0x28/0x50
[ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0
[ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0
[ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0
[ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[ 414.392070] ? sbitmap_get+0x2b8/0x450
[ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0
[ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690
[ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420
[ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[ 414.394970] ? lock_acquire+0x18d/0x460
[ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00
[ 414.395986] ? __pfx_lock_acquire+0x10/0x10
[ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190
[ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00
[ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030
[ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
[ 414.398897] ? writeback_sb_inodes+0x241/0xcc0
[ 414.399429] blk_mq_flush_plug_list+0x65/0x80
[ 414.399957] __blk_flush_plug+0x2f1/0x530
[ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10
[ 414.400999] blk_finish_plug+0x59/0xa0
[ 414.401467] wb_writeback+0x7cc/0x920
[ 414.401935] ? __pfx_wb_writeback+0x10/0x10
[ 414.402442] ? mark_held_locks+0xb0/0x110
[ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[ 414.404062] wb_workfn+0x2b3/0xcf0
[ 414.404500] ? __pfx_wb_workfn+0x10/0x10
[ 414.404989] process_scheduled_works+0x432/0x13f0
[ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10
[ 414.406139] ? do_raw_spin_lock+0x101/0x2a0
[ 414.406641] ? assign_work+0x19b/0x240
[ 414.407106] ? lock_is_held_type+0x9d/0x110
[ 414.407604] worker_thread+0x6f2/0x1160
[ 414.408075] ? __kthread_parkme+0x62/0x210
[ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[ 414.409168] ? __kthread_parkme+0x13c/0x210
[ 414.409678] ? __pfx_worker_thread+0x10/0x10
[ 414.410191] kthread+0x33c/0x440
[ 414.410602] ? __pfx_kthread+0x10/0x10
[ 414.411068] ret_from_fork+0x4d/0x80
[ 414.411526] ? __pfx_kthread+0x10/0x10
[ 414.411993] ret_from_fork_asm+0x1b/0x30
[ 414.412489] </TASK>
When interrupt is turned on while a lock holding by spin_lock_irq it
throws a warning because of potential deadlock.
blk_mq_prep_dispatch_rq
blk_mq_get_driver_tag
__blk_mq_get_driver_tag
__blk_mq_alloc_driver_tag
blk_mq_tag_busy -> tag is already busy
// failed to get driver tag
blk_mq_mark_tag_wait
spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
__add_wait_queue(wq, wait) -> wait queue active
blk_mq_get_driver_tag
__blk_mq_tag_busy
-> 1) tag must be idle, which means there can't be inflight IO
spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
-> 2) context must be preempt by IO interrupt to trigger deadlock.
As shown above, the deadlock is not possible in theory, but the warning
still need to be fixed.
Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.11-2024-08-14:
amdgpu:
- Fix MES ring buffer overflow
- DCN 3.5 fix
- DCN 3.2.1 fix
- DP MST fix
- Cursor fixes
- JPEG fixes
- Context ops validation
- MES 12 fixes
- VCN 5.0 fix
- HDP fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240814213846.1331827-1-alexander.deucher@amd.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"The usual header file sync-ups and one more build fix:
- Add README file to explain why we copy the headers
- Sync UAPI and other header files with kernel source
- Fix build on MIPS 32-bit"
* tag 'perf-tools-fixes-for-v6.11-2024-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf daemon: Fix the build on 32-bit architectures
tools/include: Sync arm64 headers with the kernel sources
tools/include: Sync x86 headers with the kernel sources
tools/include: Sync filesystem headers with the kernel sources
tools/include: Sync network socket headers with the kernel sources
tools/include: Sync uapi/asm-generic/unistd.h with the kernel sources
tools/include: Sync uapi/sound/asound.h with the kernel sources
tools/include: Sync uapi/linux/perf.h with the kernel sources
tools/include: Sync uapi/linux/kvm.h with the kernel sources
tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
perf tools: Add tools/include/uapi/README
|
|
Commit 9f9bef9bc5c6 ("smb: smb2pdu.h: Avoid -Wflex-array-member-not-at-end
warnings") introduced tagged `struct create_context_hdr`. We want to
ensure that when new members need to be added to the flexible structure,
they are always included within this tagged struct.
So, we use `static_assert()` to ensure that the memory layout for
both the flexible structure and the tagged struct is the same after
any changes.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable@vger.kernel.org
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Reported-by: abartlet@samba.org
Reported-by: Kevin Ottens <kevin.ottens@enioka.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.11
Pull MD fix from Song:
"This patch fixes a potential data corruption in degraded raid0 array
with slow (WriteMostly) drives. This issue was introduced in upstream
6.9 kernel."
* tag 'md-6.11-20240815' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid1: Fix data corruption for degraded array with slow disk
|
|
read_balance() will avoid reading from slow disks as much as possible,
however, if valid data only lands in slow disks, and a new normal disk
is still in recovery, unrecovered data can be read:
raid1_read_request
read_balance
raid1_should_read_first
-> return false
choose_best_rdev
-> normal disk is not recovered, return -1
choose_bb_rdev
-> missing the checking of recovery, return the normal disk
-> read unrecovered data
Root cause is that the checking of recovery is missing in
choose_bb_rdev(). Hence add such checking to fix the problem.
Also fix similar problem in choose_slow_rdev().
Cc: stable@vger.kernel.org
Fixes: 9f3ced792203 ("md/raid1: factor out choose_bb_rdev() from read_balance()")
Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
Reported-and-tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Closes: https://lore.kernel.org/all/9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240803091137.3197008-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
|
|
Clang static checker (scan-build) warning:
cifsglob.h:line 890, column 3
Access to field 'ops' results in a dereference of a null pointer.
Commit 519be989717c ("cifs: Add a tracepoint to track credits involved in
R/W requests") adds a check for 'rdata->server', and let clang throw this
warning about NULL dereference.
When 'rdata->credits.value != 0 && rdata->server == NULL' happens,
add_credits_and_wake_if() will call rdata->server->ops->add_credits().
This will cause NULL dereference problem. Add a check for 'rdata->server'
to avoid NULL dereference.
Cc: stable@vger.kernel.org
Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Evan Green <evan@rivosinc.com> says:
The CPUPERF0 hwprobe key was documented and identified in code as
a bitmask value, but its contents were an enum. This produced
incorrect behavior in conjunction with the WHICH_CPUS hwprobe flag.
The first patch in this series fixes the bitmask/enum problem by
creating a new hwprobe key that returns the same data, but is
properly described as a value instead of a bitmask. The second patch
renames the value definitions in preparation for adding vector misaligned
access info. As of this version, the old defines are kept in place to
maintain source compatibility with older userspace programs.
* b4-shazam-merge:
RISC-V: hwprobe: Add SCALAR to misaligned perf defines
RISC-V: hwprobe: Add MISALIGNED_PERF key
Link: https://lore.kernel.org/r/20240809214444.3257596-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The out-of-bounds access is reported by UBSAN:
[ 0.000000] UBSAN: array-index-out-of-bounds in ../arch/riscv/kernel/vendor_extensions.c:41:66
[ 0.000000] index -1 is out of range for type 'riscv_isavendorinfo [32]'
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.11.0-rc2ubuntu-defconfig #2
[ 0.000000] Hardware name: riscv-virtio,qemu (DT)
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff94e078ba>] dump_backtrace+0x32/0x40
[ 0.000000] [<ffffffff95c83c1a>] show_stack+0x38/0x44
[ 0.000000] [<ffffffff95c94614>] dump_stack_lvl+0x70/0x9c
[ 0.000000] [<ffffffff95c94658>] dump_stack+0x18/0x20
[ 0.000000] [<ffffffff95c8bbb2>] ubsan_epilogue+0x10/0x46
[ 0.000000] [<ffffffff95485a82>] __ubsan_handle_out_of_bounds+0x94/0x9c
[ 0.000000] [<ffffffff94e09442>] __riscv_isa_vendor_extension_available+0x90/0x92
[ 0.000000] [<ffffffff94e043b6>] riscv_cpufeature_patch_func+0xc4/0x148
[ 0.000000] [<ffffffff94e035f8>] _apply_alternatives+0x42/0x50
[ 0.000000] [<ffffffff95e04196>] apply_boot_alternatives+0x3c/0x100
[ 0.000000] [<ffffffff95e05b52>] setup_arch+0x85a/0x8bc
[ 0.000000] [<ffffffff95e00ca0>] start_kernel+0xa4/0xfb6
The dereferencing using cpu should actually not happen, so remove it.
Fixes: 23c996fc2bc1 ("riscv: Extend cpufeature.c to detect vendor extensions")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240814192619.276794-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Trusted keys unseal the key blob on load, but keep the sealed payload in
the blob field so that every subsequent read (export) will simply
convert this field to hex and send it to userspace.
With DCP-based trusted keys, we decrypt the blob encryption key (BEK)
in the Kernel due hardware limitations and then decrypt the blob payload.
BEK decryption is done in-place which means that the trusted key blob
field is modified and it consequently holds the BEK in plain text.
Every subsequent read of that key thus send the plain text BEK instead
of the encrypted BEK to userspace.
This issue only occurs when importing a trusted DCP-based key and
then exporting it again. This should rarely happen as the common use cases
are to either create a new trusted key and export it, or import a key
blob and then just use it without exporting it again.
Fix this by performing BEK decryption and encryption in a dedicated
buffer. Further always wipe the plain text BEK buffer to prevent leaking
the key via uninitialized memory.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
The DCP trusted key type uses the wrong helper function to store
the blob's payload length which can lead to the wrong byte order
being used in case this would ever run on big endian architectures.
Fix by using correct helper function.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Suggested-by: Richard Weinberger <richard@nod.at>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement
(Thorsten Blum)
- kallsyms: Clean up interaction with LTO suffixes (Song Liu)
- refcount: Report UAF for refcount_sub_and_test(0) when counter==0
(Petr Pavlu)
- kunit/overflow: Avoid misallocation of driver name (Ivan Orlov)
* tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols
kunit/overflow: Fix UB in overflow_allocation_test
gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement
refcount: Report UAF for refcount_sub_and_test(0) when counter==0
|
|
__btrfs_add_free_space_zoned() references and modifies bg's alloc_offset,
ro, and zone_unusable, but without taking the lock. It is mostly safe
because they monotonically increase (at least for now) and this function is
mostly called by a transaction commit, which is serialized by itself.
Still, taking the lock is a safer and correct option and I'm going to add a
change to reset zone_unusable while a block group is still alive. So, add
locking around the operations.
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[REPORT]
There is a corruption report that btrfs refused to mount a fs that has
overlapping dev extents:
BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272
BTRFS error (device sdc): failed to verify dev extents against chunks: -117
BTRFS error (device sdc): open_ctree failed
[CAUSE]
The direct cause is very obvious, there is a bad dev extent item with
incorrect length.
With btrfs check reporting two overlapping extents, the second one shows
some clue on the cause:
ERROR: dev extent devid 4 offset 14263979671552 len 6488064 overlap with previous dev extent end 14263980982272
ERROR: dev extent devid 13 offset 2257707008000 len 6488064 overlap with previous dev extent end 2257707270144
ERROR: errors found in extent allocation tree or chunk allocation
The second one looks like a bitflip happened during new chunk
allocation:
hex(2257707008000) = 0x20da9d30000
hex(2257707270144) = 0x20da9d70000
diff = 0x00000040000
So it looks like a bitflip happened during new dev extent allocation,
resulting the second overlap.
Currently we only do the dev-extent verification at mount time, but if the
corruption is caused by memory bitflip, we really want to catch it before
writing the corruption to the storage.
Furthermore the dev extent items has the following key definition:
(<device id> DEV_EXTENT <physical offset>)
Thus we can not just rely on the generic key order check to make sure
there is no overlapping.
[ENHANCEMENT]
Introduce dedicated dev extent checks, including:
- Fixed member checks
* chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
* chunk_objectid should always be
BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
- Alignment checks
* chunk_offset should be aligned to sectorsize
* length should be aligned to sectorsize
* key.offset should be aligned to sectorsize
- Overlap checks
If the previous key is also a dev-extent item, with the same
device id, make sure we do not overlap with the previous dev extent.
Reported: Stefan N <stefannnau@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Unlink changes the link count on the target inode. POSIX mandates that
the ctime must also change when this occurs.
According to https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html:
"Upon successful completion, unlink() shall mark for update the last data
modification and last file status change timestamps of the parent
directory. Also, if the file's link count is not 0, the last file status
change timestamp of the file shall be marked for update."
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add link to the opengroup docs ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add the __counted_by compiler attribute to the flexible array member
name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter
Current release - regressions:
- udp: fall back to software USO if IPv6 extension headers are
present
- wifi: iwlwifi: correctly lookup DMA address in SG table
Current release - new code bugs:
- eth: mlx5e: fix queue stats access to non-existing channels splat
Previous releases - regressions:
- eth: mlx5e: take state lock during tx timeout reporter
- eth: mlxbf_gige: disable RX filters until RX path initialized
- eth: igc: fix reset adapter logics when tx mode change
Previous releases - always broken:
- tcp: update window clamping condition
- netfilter:
- nf_queue: drop packets with cloned unconfirmed conntracks
- nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
- vsock: fix recursive ->recvmsg calls
- dsa: vsc73xx: fix MDIO bus access and PHY opera
- eth: gtp: pull network headers in gtp_dev_xmit()
- eth: igc: fix packet still tx after gate close by reducing i226 MAC
retry buffer
- eth: mana: fix RX buf alloc_size alignment and atomic op panic
- eth: hns3: fix a deadlock problem when config TC during resetting"
* tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: hns3: use correct release function during uninitialization
net: hns3: void array out of bound when loop tnl_num
net: hns3: fix a deadlock problem when config TC during resetting
net: hns3: use the user's cfg after reset
net: hns3: fix wrong use of semaphore up
selftests: net: lib: kill PIDs before del netns
pse-core: Conditionally set current limit during PI regulator registration
net: thunder_bgx: Fix netdev structure allocation
net: ethtool: Allow write mechanism of LPL and both LPL and EPL
vsock: fix recursive ->recvmsg calls
selftest: af_unix: Fix kselftest compilation warnings
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
netfilter: nf_tables: Introduce nf_tables_getobj_single
netfilter: nf_tables: Audit log dump reset after the fact
selftests: netfilter: add test for br_netfilter+conntrack+queue combination
netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
netfilter: flowtable: initialise extack before use
netfilter: nfnetlink: Initialise extack before use in ACKs
netfilter: allow ipv6 fragments to arrive on different devices
tcp: Update window clamping condition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Two regression fixes:
- fix atomisp support for ISP2400
- fix dvb-usb regression for TeVii s480 dual DVB-S2 S660 board"
* tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
|
|
-ENODEV is used to signify that there is no zap shader for the platform,
and the CPU can directly take the GPU out of secure mode. We want to
use this return code when there is no zap-shader node. But not when
there is, but without a firmware-name property. This case we want to
treat as-if the needed fw is not found.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/604564/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Revert a recent change to sense data generation.
Sense data can be in either fixed format or descriptor format.
The D_SENSE bit in the Control mode page controls which format to
generate. All places but one respected the D_SENSE bit.
The recent change fixed the one place that didn't respect the D_SENSE
bit. However, it turns out that hdparm, hddtemp and udisks
(incorrectly) assumes sense data in descriptor format.
Therefore, even while the change was technically correct, revert it,
since even if these user space programs are fixed to (correctly) look
at the format type before parsing the data, older versions of these
tools will be around roughly forever.
* tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
|
|
hci_conn_params_add() never checks for a NULL value and could lead to a NULL
pointer dereference causing a crash.
Fixed by adding error handling in the function.
Cc: Stable <stable@kernel.org>
Fixes: 5157b8a503fa ("Bluetooth: Fix initializing conn_params in scan phase")
Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
Reported-by: Yiwei Zhang <zhan4630@purdue.edu>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
SMP initiator role shall be considered the one that initiates the
pairing procedure with SMP_CMD_PAIRING_REQ:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part H
page 1557:
Figure 2.1: LE pairing phases
Note that by sending SMP_CMD_SECURITY_REQ it doesn't change the role to
be Initiator.
Link: https://github.com/bluez/bluez/issues/567
Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Function hci_sched_le needs to update the respective counter variable
inplace other the likes of hci_quote_sent would attempt to use the
possible outdated value of conn->{le_cnt,acl_cnt}.
Link: https://github.com/bluez/bluez/issues/915
Fixes: 73d80deb7bdf ("Bluetooth: prioritizing data over HCI")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This inverts the LE State quirk so by default we assume the controllers
would report valid states rather than invalid which is how quirks
normally behave, also this would result in HCI command failing it the LE
States are really broken thus exposing the controllers that are really
broken in this respect.
Link: https://github.com/bluez/bluez/issues/584
Fixes: 220915857e29 ("Bluetooth: Adding driver and quirk defs for multi-role LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
With CONFIG_LTO_CLANG=y, the compiler may add .llvm.<hash> suffix to
function names to avoid duplication. APIs like kallsyms_lookup_name()
and kallsyms_on_each_match_symbol() tries to match these symbol names
without the .llvm.<hash> suffix, e.g., match "c_stop" with symbol
c_stop.llvm.17132674095431275852. This turned out to be problematic
for use cases that require exact match, for example, livepatch.
Fix this by making the APIs to match symbols exactly.
Also cleanup kallsyms_selftests accordingly.
Signed-off-by: Song Liu <song@kernel.org>
Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20240807220513.3100483-3-song@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Cleaning up the symbols causes various issues afterwards. Let's sort
the list based on original name.
Signed-off-by: Song Liu <song@kernel.org>
Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20240807220513.3100483-2-song@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The 'device_name' array doesn't exist out of the
'overflow_allocation_test' function scope. However, it is being used as
a driver name when calling 'kunit_driver_create' from
'kunit_device_register'. It produces the kernel panic with KASAN
enabled.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
This reverts commit bab2f5e8fd5d2f759db26b78d9db57412888f187.
Joel reported that this commit breaks userspace and stops sensors in
SDM845 from working. Also breaks other qcom SoC devices running postmarketOS.
Cc: stable <stable@kernel.org>
Cc: Ekansh Gupta <quic_ekangupt@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reported-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
Link: https://lore.kernel.org/r/9a9f5646-a554-4b65-8122-d212bb665c81@umsystem.edu
Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: bab2f5e8fd5d ("misc: fastrpc: Restrict untrusted app to attach to privileged PD")
Link: https://lore.kernel.org/r/20240815094920.8242-1-griffin@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
re-enumerating full-speed devices after a failed address device command
can trigger a NULL pointer dereference.
Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size
value during enumeration. Usb core calls usb_ep0_reinit() in this case,
which ends up calling xhci_configure_endpoint().
On Panther point xHC the xhci_configure_endpoint() function will
additionally check and reserve bandwidth in software. Other hosts do
this in hardware
If xHC address device command fails then a new xhci_virt_device structure
is allocated as part of re-enabling the slot, but the bandwidth table
pointers are not set up properly here.
This triggers the NULL pointer dereference the next time usb_ep0_reinit()
is called and xhci_configure_endpoint() tries to check and reserve
bandwidth
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
[46710.713699] usb 3-1: Device not responding to setup address.
[46710.917684] usb 3-1: Device not responding to setup address.
[46711.125536] usb 3-1: device not accepting address 5, error -71
[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
[46711.125600] #PF: supervisor read access in kernel mode
[46711.125603] #PF: error_code(0x0000) - not-present page
[46711.125606] PGD 0 P4D 0
[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
[46711.125620] Hardware name: Gigabyte Technology Co., Ltd.
[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
Fix this by making sure bandwidth table pointers are set up correctly
after a failed address device command, and additionally by avoiding
checking for bandwidth in cases like this where no actual endpoints are
added or removed, i.e. only context for default control endpoint 0 is
evaluated.
Reported-by: Karel Balej <balejk@matfyz.cz>
Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/
Cc: stable@vger.kernel.org
Fixes: 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240815141117.2702314-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Avoid GT TLB invalidation timeouts by holding a PM ref when
invalidations are inflight.
v2:
- Drop PM ref before signaling fence (CI)
v3:
- Move invalidation_fence_signal helper in tlb timeout to previous
patch (Matthew Auld)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-4-matthew.brost@intel.com
(cherry picked from commit 0a382f9bc5dc4744a33970a5ed4df8f9c702ee9e)
Requires: 46209ce5287b ("drm/xe: Add xe_gt_tlb_invalidation_fence_init
helper")
Requires: 0e414ab036e0 ("drm/xe: Drop xe_gt_tlb_invalidation_wait")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Having two methods to wait on GT TLB invalidations is not ideal. Remove
xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences.
In addition to two methods being less than ideal, once GT TLB
invalidations are coalesced the seqno cannot be assigned during
xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait
would not have a seqno to wait one. A fence however can be armed and
later signaled.
v3:
- Add explaination about coalescing to commit message
v4:
- Don't put dma fence if defined on stack (CI)
v5:
- Initialize ret to zero (CI)
v6:
- Use invalidation_fence_signal helper in tlb timeout (Matthew Auld)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-3-matthew.brost@intel.com
(cherry picked from commit 61ac035361ae555ee5a17a7667fe96afdde3d59a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Other layers should not be touching struct xe_gt_tlb_invalidation_fence
directly, add helper for initialization.
v2:
- Add dma_fence_get and list init to xe_gt_tlb_invalidation_fence_init
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-2-matthew.brost@intel.com
(cherry picked from commit a522b285c6b4b611406d59612a8d7241714d2e31)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When validating VF config on the media GT, we may wrongly report
that VF is already partially configured on it, as we consider GGTT
and LMEM provisioning done on the primary GT (since both GGTT and
LMEM are tile-level resources, not a GT-level).
This will cause skipping a VF auto-provisioning on the media-GT and
in result will block a VF from successfully initialize that GT.
Fix that by considering GGTT and LMEM configurations only when
checking if a VF provisioning is complete, and omit GGTT and LMEM
when reporting empty/partial provisioning.
Fixes: 234670cea9a2 ("drm/xe/pf: Skip fair VFs provisioning if already provisioned")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240806180516.618-1-michal.wajdeczko@intel.com
(cherry picked from commit 5bdacb0907c1f531995b6ba47b832ac3a0182ae9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Take PM ref when any G2H are outstanding, drop when none are
outstanding.
To safely ensure we have PM ref when in the GuC CT layer, a PM ref needs
to be held when scheduler messages are pending too.
v2:
- Add outer PM protections to xe_file_close (CI)
v3:
- Only take PM ref 0->1 and drop on 1->0 (Matthew Auld)
v4:
- Add assert to G2H increment function
v5:
- Rebase
v6:
- Declare xe as local variable in xe_file_close (CI)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-5-matthew.brost@intel.com
(cherry picked from commit d930c19fdff3109e97b610fa10943b7602efcabd)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
We should use the number of actual entries stored in the runtime
register buffer, not the maximum number of entries that this buffer
can hold, otherwise bsearch() may fail and we may miss the data and
wrongly report unexpected access to some registers.
Fixes: 4edadc41a3a4 ("drm/xe/vf: Use register values obtained from the PF")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240718203155.486-1-michal.wajdeczko@intel.com
(cherry picked from commit ad16682db18f4414e53bba1ce0db75b08bdc4dff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.
While at it, revert an earlier change that contained a partial work
around for this issue.
v2:
- Take a ref to xef even for the VM bind queue (Matt)
- Squash patches relevant to that fix and work around (Lucas)
v3: Fix typo (Lucas)
Fixes: ce62827bc294 ("drm/xe: Do not access xe file when updating exec queue run_ticks")
Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-5-umesh.nerlige.ramappa@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 2149ded63079449b8dddf9da38392632f155e6b5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Take a reference to xef when user creates the VM and put the reference
when user destroys the VM.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-4-umesh.nerlige.ramappa@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a2387e69493df3de706f14e4573ee123d23d5d34)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add ref counting for xe_file.
v2:
- Add kernel doc for exported functions (Matt)
- Instead of xe_file_destroy, export the get/put helpers (Lucas)
v3: Fixup the kernel-doc format and description (Matt, Lucas)
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-3-umesh.nerlige.ramappa@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit ce8c161cbad43f4056451e541f7ae3471d0cca12)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|