Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The important core fix is another tweak to our discard discovery
issues. The off by 512 in logical block count seems bad, but in fact
the inline was only ever used in debug prints, which is why no-one
noticed"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Do not attempt to configure discard unless LBPME is set
scsi: MAINTAINERS: Add header files to SCSI SUBSYSTEM
scsi: ufs: qcom: Add UFSHCD_QUIRK_BROKEN_LSDBS_CAP for SM8550 SoC
scsi: ufs: core: Add a quirk for handling broken LSDBS field in controller capabilities register
scsi: core: Fix the return value of scsi_logical_block_count()
scsi: MAINTAINERS: Update HiSilicon SAS controller driver maintainer
|
|
Pull NFS client fixes from Anna Schumaker:
- Fix rpcrdma refcounting in xa_alloc
- Fix rpcrdma usage of XA_FLAGS_ALLOC
- Fix requesting FATTR4_WORD2_OPEN_ARGUMENTS
- Fix attribute bitmap decoder to handle a 3rd word
- Add reschedule points when returning delegations to avoid soft lockups
- Fix clearing layout segments in layoutreturn
- Avoid unnecessary rescanning of the per-server delegation list
* tag 'nfs-for-6.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Avoid unnecessary rescanning of the per-server delegation list
NFSv4: Fix clearing of layout segments in layoutreturn
NFSv4: Add missing rescheduling points in nfs_client_return_marked_delegations
nfs: fix bitmap decoder to handle a 3rd word
nfs: fix the fetch of FATTR4_OPEN_ARGUMENTS
rpcrdma: Trace connection registration and unregistration
rpcrdma: Use XA_FLAGS_ALLOC instead of XA_FLAGS_ALLOC1
rpcrdma: Device kref is over-incremented on error from xa_alloc
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes. xe and msm are the major groups, with
amdgpu/i915/nouveau having smaller bits. xe has a bunch of hw
workaround fixes that were found to be missing, so that is why there
are a bunch of scattered fixes, and one larger one. But overall size
doesn't look too out of the ordinary.
msm:
- virtual plane fixes:
- drop yuv on hw where not supported
- csc vs yuv format fix
- rotation fix
- fix fb cleanup on close
- reset phy before link training
- fix visual corruption at 4K
- fix NULL ptr crash on hotplug
- simplify debug macros
- sc7180 fix
- adreno firmware name error path fix
amdgpu:
- GFX10 firmware loading fix
- SDMA 5.2 fix
- Debugfs parameter validation fix
- eGPU hotplug fix
i915:
- fix HDCP timeouts
nouveau:
- fix SG_DEBUG crash
xe:
- Fix OA format masks which were breaking build with gcc-5
- Fix opregion leak (Lucas)
- Fix OA sysfs entry (Ashutosh)
- Fix VM dma-resv lock (Brost)
- Fix tile fini sequence (Brost)
- Prevent UAF around preempt fence (Auld)
- Fix DGFX display suspend/resume (Maarten)
- Many Xe/Xe2 critical workarounds (Auld, Ngai-Mint, Bommu, Tejas, Daniele)
- Fix devm/drmm issues (Daniele)
- Fix missing workqueue destroy in xe_gt_pagefault (Stuart)
- Drop HW fence pointer to HW fence ctx (Brost)
- Free job before xe_exec_queue_put (Brost)"
* tag 'drm-fixes-2024-08-24' of https://gitlab.freedesktop.org/drm/kernel: (35 commits)
drm/xe: Free job before xe_exec_queue_put
drm/xe: Drop HW fence pointer to HW fence ctx
drm/xe: Fix missing workqueue destroy in xe_gt_pagefault
drm/amdgpu: fix eGPU hotplug regression
drm/amdgpu: Validate TA binary size
drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
drm/amdgpu: fixing rlc firmware loading failure issue
drm/xe/uc: Use devm to register cleanup that includes exec_queues
drm/xe: use devm instead of drmm for managed bo
drm/xe/xe2hpg: Add Wa_14021821874
drm/xe: fix WA 14018094691
drm/xe/xe2: Add Wa_15015404425
drm/xe/xe2: Make subsequent L2 flush sequential
drm/xe/xe2lpg: Extend workaround 14021402888
drm/xe/xe2lpm: Extend Wa_16021639441
drm/xe/bmg: implement Wa_16023588340
drm/xe/oa/uapi: Make bit masks unsigned
drm/xe/display: Make display suspend/resume work on discrete
drm/xe: prevent UAF around preempt fence
drm/xe: Fix tile fini sequence
...
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith
- Remove unused struct field (Nilay)
- Fix fabrics keep-alive teardown order (Ming)
- Write zeroes fixes (John)
* tag 'block-6.11-20240823' of git://git.kernel.dk/linux:
nvme: Remove unused field
nvme: move stopping keep-alive into nvme_uninit_ctrl()
block: Drop NULL check in bdev_write_zeroes_sectors()
block: Read max write zeroes once for __blkdev_issue_write_zeroes()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix backlight control on a Dell All In One system where a backlight
controller board is attached to a UART port and the dell-uart
backlight driver binds to it, but the backlight is actually controlled
by other means (Hans de Goede)"
* tag 'acpi-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add backlight=native quirk for Dell OptiPlex 7760 AIO
platform/x86: dell-uart-backlight: Use acpi_video_get_backlight_type()
ACPI: video: Add Dell UART backlight controller detection
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter.
Current release - regressions:
- virtio_net: avoid crash on resume - move netdev_tx_reset_queue()
call before RX napi enable
Current release - new code bugs:
- net/mlx5e: fix page leak and incorrect header release w/ HW GRO
Previous releases - regressions:
- udp: fix receiving fraglist GSO packets
- tcp: prevent refcount underflow due to concurrent execution of
tcp_sk_exit_batch()
Previous releases - always broken:
- ipv6: fix possible UAF when incrementing error counters on output
- ip6: tunnel: prevent merging of packets with different L2
- mptcp: pm: fix IDs not being reusable
- bonding: fix potential crashes in IPsec offload handling
- Bluetooth: HCI:
- MGMT: add error handling to pair_device() to avoid a crash
- invert LE State quirk to be opt-out rather then opt-in
- fix LE quote calculation
- drv: dsa: VLAN fixes for Ocelot driver
- drv: igb: cope with large MAX_SKB_FRAGS Kconfig settings
- drv: ice: fi Rx data path on architectures with PAGE_SIZE >= 8192
Misc:
- netpoll: do not export netpoll_poll_[disable|enable]()
- MAINTAINERS: update the list of networking headers"
* tag 'net-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
s390/iucv: Fix vargs handling in iucv_alloc_device()
net: ovs: fix ovs_drop_reasons error
net: xilinx: axienet: Fix dangling multicast addresses
net: xilinx: axienet: Always disable promiscuous mode
MAINTAINERS: Mark JME Network Driver as Odd Fixes
MAINTAINERS: Add header files to NETWORKING sections
MAINTAINERS: Add limited globs for Networking headers
MAINTAINERS: Add net_tstamp.h to SOCKET TIMESTAMPING section
MAINTAINERS: Add sonet.h to ATM section of MAINTAINERS
octeontx2-af: Fix CPT AF register offset calculation
net: phy: realtek: Fix setting of PHY LEDs Mode B bit on RTL8211F
net: ngbe: Fix phy mode set to external phy
netfilter: flowtable: validate vlan header
bnxt_en: Fix double DMA unmapping for XDP_REDIRECT
ipv6: prevent possible UAF in ip6_xmit()
ipv6: fix possible UAF in ip6_finish_output2()
ipv6: prevent UAF in ip6_send_skb()
netpoll: do not export netpoll_poll_[disable|enable]()
selftests: mlxsw: ethtool_lanes: Source ethtool lib from correct path
udp: fix receiving fraglist GSO packets
...
|
|
syzkaller reported UAF in kcm_release(). [0]
The scenario is
1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb.
2. Thread A resumes building skb from kcm->seq_skb but is blocked
by sk_stream_wait_memory()
3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb
and puts the skb to the write queue
4. Thread A faces an error and finally frees skb that is already in the
write queue
5. kcm_release() does double-free the skb in the write queue
When a thread is building a MSG_MORE skb, another thread must not touch it.
Let's add a per-sk mutex and serialise kcm_sendmsg().
[0]:
BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline]
BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline]
BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline]
BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167
CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G B 6.8.0-rc5-syzkaller-g9abbc24128bc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x178/0x518 mm/kasan/report.c:488
kasan_report+0xd8/0x138 mm/kasan/report.c:601
__asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
__skb_unlink include/linux/skbuff.h:2366 [inline]
__skb_dequeue include/linux/skbuff.h:2385 [inline]
__skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
__skb_queue_purge include/linux/skbuff.h:3181 [inline]
kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
__sock_release net/socket.c:659 [inline]
sock_close+0xa4/0x1e8 net/socket.c:1421
__fput+0x30c/0x738 fs/file_table.c:376
____fput+0x20/0x30 fs/file_table.c:404
task_work_run+0x230/0x2e0 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x618/0x1f64 kernel/exit.c:871
do_group_exit+0x194/0x22c kernel/exit.c:1020
get_signal+0x1500/0x15ec kernel/signal.c:2893
do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Allocated by task 6166:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626
unpoison_slab_object mm/kasan/common.c:314 [inline]
__kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903
__alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641
alloc_skb include/linux/skbuff.h:1296 [inline]
kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
sock_sendmsg+0x220/0x2c0 net/socket.c:768
splice_to_socket+0x7cc/0xd58 fs/splice.c:889
do_splice_from fs/splice.c:941 [inline]
direct_splice_actor+0xec/0x1d8 fs/splice.c:1164
splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108
do_splice_direct_actor fs/splice.c:1207 [inline]
do_splice_direct+0x1e4/0x304 fs/splice.c:1233
do_sendfile+0x460/0xb3c fs/read_write.c:1295
__do_sys_sendfile64 fs/read_write.c:1362 [inline]
__se_sys_sendfile64 fs/read_write.c:1348 [inline]
__arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Freed by task 6167:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640
poison_slab_object+0x124/0x18c mm/kasan/common.c:241
__kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363
kfree_skbmem+0x10c/0x19c
__kfree_skb net/core/skbuff.c:1109 [inline]
kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144
kfree_skb include/linux/skbuff.h:1244 [inline]
kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685
__sock_release net/socket.c:659 [inline]
sock_close+0xa4/0x1e8 net/socket.c:1421
__fput+0x30c/0x738 fs/file_table.c:376
____fput+0x20/0x30 fs/file_table.c:404
task_work_run+0x230/0x2e0 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x618/0x1f64 kernel/exit.c:871
do_group_exit+0x194/0x22c kernel/exit.c:1020
get_signal+0x1500/0x15ec kernel/signal.c:2893
do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
The buggy address belongs to the object at ffff0000ced0fc80
which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 0 bytes inside of
freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70)
The buggy address belongs to the physical page:
page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f
flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60
Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fix from Petr Mladek:
- Do not block printk on non-panic CPUs when they are dumping
backtraces
* tag 'printk-for-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk/panic: Allow cpu backtraces to be written into ringbuffer during panic
|
|
These new trace points record xarray indices and the time of
endpoint registration and unregistration, to co-ordinate with
device removal events.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Function bdev_get_queue() must not return NULL, so drop the check in
bdev_write_zeroes_sectors().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Link: https://lore.kernel.org/r/20240815163228.216051-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When building with gcc-5:
In function ‘decode_oa_format.isra.26’,
inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6:
././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant
[...]
./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’
__BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \
^
drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’
u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
^
Fixes: b6fd51c62119 ("drm/xe/oa/uapi: Define and parse OA stream properties")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240729092634.2227611-1-geert+renesas@glider.be
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit f2881dfdaaa9ec873dbd383ef5512fc31e576cbb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Dell All In One (AIO) models released after 2017 use a backlight
controller board connected to an UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501")
Commit 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
has added support for this, but I neglected to tie this into
acpi_video_get_backlight_type().
Now the first AIO has turned up which has not only the DSDT bits for this,
but also an actual controller attached to the UART, yet it is not using
this controller for backlight control.
Add support to acpi_video_get_backlight_type() for a new dell_uart
backlight type. So that the existing infra to override the backlight
control method on the commandline or with DMI quirks can be used.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://patch.msgid.link/20240814190159.15650-2-hdegoede@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc fixes from Greg KH:
"Here are some small char/misc fixes for 6.11-rc4 to resolve reported
problems. Included in here are:
- fastrpc revert of a change that broke userspace
- xillybus fixes for reported issues
Half of these have been in linux-next this week with no reported
problems, I don't know if the last bit of xillybus driver changes made
it in, but they are 'obviously correct' so will be safe :)"
* tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
char: xillybus: Check USB endpoints when probing device
char: xillybus: Refine workqueue handling
Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
char: xillybus: Don't destroy workqueue from work item running on it
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes. All except one are for MM. 10 of these are cc:stable and
the others pertain to post-6.10 issues.
As usual with these merges, singletons and doubletons all over the
place, no identifiable-by-me theme. Please see the lovingly curated
changelogs to get the skinny"
* tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/migrate: fix deadlock in migrate_pages_batch() on large folios
alloc_tag: mark pages reserved during CMA activation as not tagged
alloc_tag: introduce clear_page_tag_ref() helper function
crash: fix riscv64 crash memory reserve dead loop
selftests: memfd_secret: don't build memfd_secret test on unsupported arches
mm: fix endless reclaim on machines with unaccepted memory
selftests/mm: compaction_test: fix off by one in check_compaction()
mm/numa: no task_numa_fault() call if PMD is changed
mm/numa: no task_numa_fault() call if PTE is changed
mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
mm: don't account memmap per-node
mm: add system wide stats items category
mm: don't account memmap on failure
mm/hugetlb: fix hugetlb vs. core-mm PT locking
mseal: fix is_madv_discard()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C core fix replacing IS_ENABLED() with IS_REACHABLE()
For host drivers, there are two fixes:
- Tegra I2C Controller: Addresses a potential double-locking issue
during probe. ACPI devices are not IRQ-safe when invoking runtime
suspend and resume functions, so the irq_safe flag should not be
set.
- Qualcomm GENI I2C Controller: Fixes an oversight in the exit path
of the runtime_resume() function, which was missed in the previous
release"
* tag 'i2c-for-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: tegra: Do not mark ACPI devices as irq safe
i2c: Use IS_REACHABLE() for substituting empty ACPI functions
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- MGMT: Add error handling to pair_device()
- HCI: Invert LE State quirk to be opt-out rather then opt-in
- hci_core: Fix LE quote calculation
- SMP: Fix assumption of Central always being Initiator
* tag 'for-net-2024-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Add error handling to pair_device()
Bluetooth: SMP: Fix assumption of Central always being Initiator
Bluetooth: hci_core: Fix LE quote calculation
Bluetooth: HCI: Invert LE State quirk to be opt-out rather then opt-in
====================
Link: https://patch.msgid.link/20240815171950.1082068-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
capabilities register
'Legacy Queue & Single Doorbell Support (LSDBS)' field in the controller
capabilities register is supposed to report whether the legacy single
doorbell mode is supported in the controller or not. But some controllers
report '1' in this field which corresponds to 'LSDB not supported', but
they indeed support LSDB. So let's add a quirk to handle those controllers.
If the quirk is enabled by the controller driver, then LSDBS register field
will be ignored and legacy single doorbell mode is assumed to be enabled
always.
Tested-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240816-ufs-bug-fix-v3-1-e6fe0e18e2a3@linaro.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
scsi_logical_block_count() should return the block count of a given SCSI
command. The original implementation ended up shifting twice, leading to an
incorrect count being returned. Fix the conversion between bytes and
logical blocks.
Cc: stable@vger.kernel.org
Fixes: 6a20e21ae1e2 ("scsi: core: Add helper to return number of logical blocks in a request")
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Link: https://lore.kernel.org/r/20240813053534.7720-1-chaotian.jing@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull io_uring fixes from Jens Axboe:
- Fix a comment in the uapi header using the wrong member name (Caleb)
- Fix KCSAN warning for a debug check in sqpoll (me)
- Two more NAPI tweaks (Olivier)
* tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux:
io_uring: fix user_data field name in comment
io_uring/sqpoll: annotate debug task == current with data_race()
io_uring/napi: remove duplicate io_napi_entry timeout assignation
io_uring/napi: check napi_enabled in io_napi_add() before proceeding
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix a Bang-bang thermal governor issue causing it to fail to reset the
state of cooling devices if they are 'on' to start with, but the
thermal zone temperature is always below the corresponding trip point
(Rafael Wysocki)"
* tag 'thermal-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: gov_bang_bang: Use governor_data to reduce overhead
thermal: gov_bang_bang: Add .manage() callback
thermal: gov_bang_bang: Split bang_bang_control()
thermal: gov_bang_bang: Call __thermal_cdev_update() directly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an issue related to the ACPI EC device handling that causes the
_REG control method to be evaluated for EC operation regions that are
not expected to be used.
This confuses the platform firmware and provokes various types of
misbehavior on some systems (Rafael Wysocki)"
* tag 'acpi-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: Evaluate _REG outside the EC scope more carefully
ACPICA: Add a depth argument to acpi_execute_reg_methods()
Revert "ACPI: EC: Evaluate orphan _REG under EC device"
|
|
io_uring_cqe's user_data field refers to `sqe->data`, but io_uring_sqe
does not have a data field. Fix the comment to say `sqe->user_data`.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://github.com/axboe/liburing/pull/1206
Link: https://lore.kernel.org/r/20240816181526.3642732-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Bring back a lost return statement in io-page-fault code
- Remove an unused function declaration
* tag 'iommu-fixes-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu: Remove unused declaration iommu_sva_unbind_gpasid()
iommu: Restore lost return in iommu_report_device_fault()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small fixes, mostly for usual suspects, HD-audio and USB-audio
device-specific fixes / quirks. The Cirrus codec support took the
update of SPI header as well. Other than that, there is a regression
fix in the sanity check of ALSA timer code"
* tag 'sound-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/tas2781: Use correct endian conversion
ALSA: usb-audio: Support Yamaha P-125 quirk entry
ALSA: hda: cs35l41: Remove redundant call to hda_cs_dsp_control_remove()
ALSA: hda: cs35l56: Remove redundant call to hda_cs_dsp_control_remove()
ALSA: hda/tas2781: fix wrong calibrated data order
ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET
ALSA: hda/realtek: Add support for new HP G12 laptops
ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7
ALSA: timer: Relax start tick time check for slave timer elements
spi: Add empty versions of ACPI functions
|
|
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.
However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
|
|
I was revisiting the topic of 802.1ad treatment in the Ocelot switch [0]
and realized that not only is its basic VLAN classification pipeline
improper for offloading vlan_protocol 802.1ad bridges, but also improper
for offloading regular 802.1Q bridges already.
Namely, 802.1ad-tagged traffic should be treated as VLAN-untagged by
bridged ports, but this switch treats it as if it was 802.1Q-tagged with
the same VID as in the 802.1ad header. This is markedly different to
what the Linux bridge expects; see the "other_tpid()" function in
tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh.
An idea came to me that the VCAP IS1 TCAM is more powerful than I'm
giving it credit for, and that it actually overwrites the classified VID
before the VLAN Table lookup takes place. In other words, it can be
used even to save a packet from being dropped on ingress due to VLAN
membership.
Add a sophisticated TCAM rule hardcoded into the driver to force the
switch to behave like a Linux bridge with vlan_filtering 1 vlan_protocol
802.1Q.
Regarding the lifetime of the filter: eventually the bridge will
disappear, and vlan_filtering on the port will be restored to 0 for
standalone mode. Then the filter will be deleted.
[0]: https://lore.kernel.org/netdev/20201009122947.nvhye4hvcha3tljh@skbuf/
Fixes: 7142529f1688 ("net: mscc: ocelot: add VLAN filtering")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Through code analysis, I realized that the ds->untag_bridge_pvid logic
is contradictory - see the newly added FIXME above the kernel-doc for
dsa_software_untag_vlan_unaware_bridge().
Moreover, for the Felix driver, I need something very similar, but which
is actually _not_ contradictory: untag the bridge PVID on RX, but for
VLAN-aware bridges. The existing logic does it for VLAN-unaware bridges.
Since I don't want to change the functionality of drivers which were
supposedly properly tested with the ds->untag_bridge_pvid flag, I have
introduced a new one: ds->untag_vlan_aware_bridge_pvid, and I have
refactored the DSA reception code into a common path for both flags.
TODO: both flags should be unified under a single ds->software_vlan_untag,
which users of both current flags should set. This is not something that
can be carried out right away. It needs very careful examination of all
drivers which make use of this functionality, since some of them
actually get this wrong in the first place.
For example, commit 9130c2d30c17 ("net: dsa: microchip: ksz8795: Use
software untagging on CPU port") uses this in a driver which has
ds->configure_vlan_while_not_filtering = true. The latter mechanism has
been known for many years to be broken by design:
https://lore.kernel.org/netdev/CABumfLzJmXDN_W-8Z=p9KyKUVi_HhS7o_poBkeKHS2BkAiyYpw@mail.gmail.com/
and we have the situation of 2 bugs canceling each other. There is no
private VLAN, and the port follows the PVID of the VLAN-unaware bridge.
So, it's kinda ok for that driver to use the ds->untag_bridge_pvid
mechanism, in a broken way.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As explained by Horatiu Vultur in commit 603ead96582d ("net: sparx5: Add
spinlock for frame transmission from CPU") which is for a similar
hardware design, multiple CPUs can simultaneously perform injection
or extraction. There are only 2 register groups for injection and 2
for extraction, and the driver only uses one of each. So we'd better
serialize access using spin locks, otherwise frame corruption is
possible.
Note that unlike in sparx5, FDMA in ocelot does not have this issue
because struct ocelot_fdma_tx_ring already contains an xmit_lock.
I guess this is mostly a problem for NXP LS1028A, as that is dual core.
I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka
the felix DSA driver) started using register-based packet injection and
extraction.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
register injection
Problem description
-------------------
On an NXP LS1028A (felix DSA driver) with the following configuration:
- ocelot-8021q tagging protocol
- VLAN-aware bridge (with STP) spanning at least swp0 and swp1
- 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700
- ptp4l on swp0.700 and swp1.700
we see that the ptp4l instances do not see each other's traffic,
and they all go to the grand master state due to the
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.
Jumping to the conclusion for the impatient
-------------------------------------------
There is a zero-day bug in the ocelot switchdev driver in the way it
handles VLAN-tagged packet injection. The correct logic already exists in
the source code, in function ocelot_xmit_get_vlan_info() added by commit
5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
But it is used only for normal NPI-based injection with the DSA "ocelot"
tagging protocol. The other injection code paths (register-based and
FDMA-based) roll their own wrong logic. This affects and was noticed on
the DSA "ocelot-8021q" protocol because it uses register-based injection.
By moving ocelot_xmit_get_vlan_info() to a place that's common for both
the DSA tagger and the ocelot switch library, it can also be called from
ocelot_port_inject_frame() in ocelot.c.
We need to touch the lines with ocelot_ifh_port_set()'s prototype
anyway, so let's rename it to something clearer regarding what it does,
and add a kernel-doc. ocelot_ifh_set_basic() should do.
Investigation notes
-------------------
Debugging reveals that PTP event (aka those carrying timestamps, like
Sync) frames injected into swp0.700 (but also swp1.700) hit the wire
with two VLAN tags:
00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc
~~~~~~~~~~~
00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00
~~~~~~~~~~~
00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03
00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00
00000040: 00 00
The second (unexpected) VLAN tag makes felix_check_xtr_pkt() ->
ptp_classify_raw() fail to see these as PTP packets at the link
partner's receiving end, and return PTP_CLASS_NONE (because the BPF
classifier is not written to expect 2 VLAN tags).
The reason why packets have 2 VLAN tags is because the transmission
code treats VLAN incorrectly.
Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX
feature. Therefore, at xmit time, all VLANs should be in the skb head,
and none should be in the hwaccel area. This is done by:
static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
netdev_features_t features)
{
if (skb_vlan_tag_present(skb) &&
!vlan_hw_offload_capable(features, skb->vlan_proto))
skb = __vlan_hwaccel_push_inside(skb);
return skb;
}
But ocelot_port_inject_frame() handles things incorrectly:
ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op)
{
(...)
if (vlan_tag)
ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
(...)
}
The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head
is by calling:
static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
skb->vlan_present = 0;
}
which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get().
This means that ocelot, when it calls skb_vlan_tag_get(), sees
(and uses) a residual skb->vlan_tci, while the same VLAN tag is
_already_ in the skb head.
The trivial fix for double VLAN headers is to replace the content of
ocelot_ifh_port_set() with:
if (skb_vlan_tag_present(skb))
ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
but this would not be correct either, because, as mentioned,
vlan_hw_offload_capable() is false for us, so we'd be inserting dead
code and we'd always transmit packets with VID=0 in the injection frame
header.
I can't actually test the ocelot switchdev driver and rely exclusively
on code inspection, but I don't think traffic from 8021q uppers has ever
been injected properly, and not double-tagged. Thus I'm blaming the
introduction of VLAN fields in the injection header - early driver code.
As hinted at in the early conclusion, what we _want_ to happen for
VLAN transmission was already described once in commit 5ca721c54d86
("net: dsa: tag_ocelot: set the classified VLAN during xmit").
ocelot_xmit_get_vlan_info() intends to ensure that if the port through
which we're transmitting is under a VLAN-aware bridge, the outer VLAN
tag from the skb head is stripped from there and inserted into the
injection frame header (so that the packet is processed in hardware
through that actual VLAN). And in all other cases, the packet is sent
with VID=0 in the injection frame header, since the port is VLAN-unaware
and has logic to strip this VID on egress (making it invisible to the
wire).
Fixes: 08d02364b12f ("net: mscc: fix the injection header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In several cases we are freeing pages which were not allocated using
common page allocators. For such cases, in order to keep allocation
accounting correct, we should clear the page tag to indicate that the page
being freed is expected to not have a valid allocation tag. Introduce
clear_page_tag_ref() helper function to be used for this.
Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [6.10]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix invalid access to pgdat during hot-remove operation:
ndctl users reported a GPF when trying to destroy a namespace:
$ ndctl destroy-namespace all -r all -f
Segmentation fault
dmesg:
Oops: general protection fault, probably for
non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN
PTI
KASAN: probably user-memory-access in range
[0x000000000002b280-0x000000000002b287]
CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
2.20.1 09/13/2023
RIP: 0010:mod_node_page_state+0x2a/0x110
cxl-test users report a GPF when trying to unload the test module:
$ modrpobe -r cxl-test
dmesg
BUG: unable to handle page fault for address: 0000000000004200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197
Tainted: [O]=OOT_MODULE, [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15
RIP: 0010:mod_node_page_state+0x6/0x90
Currently, when memory is hot-plugged or hot-removed the accounting is
done based on the assumption that memmap is allocated from the same node
as the hot-plugged/hot-removed memory, which is not always the case.
In addition, there are challenges with keeping the node id of the memory
that is being remove to the time when memmap accounting is actually
performed: since this is done after remove_pfn_range_from_zone(), and
also after remove_memory_block_devices(). Meaning that we cannot use
pgdat nor walking though memblocks to get the nid.
Given all of that, account the memmap overhead system wide instead.
For this we are going to be using global atomic counters, but given that
memmap size is rarely modified, and normally is only modified either
during early boot when there is only one CPU, or under a hotplug global
mutex lock, therefore there is no need for per-cpu optimizations.
Also, while we are here rename nr_memmap to nr_memmap_pages, and
nr_memmap_boot to nr_memmap_boot_pages to be self explanatory that the
units are in page count.
[pasha.tatashin@soleen.com: address a few nits from David Hildenbrand]
Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240808213437.682006-4-pasha.tatashin@soleen.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com
Reported-by: Alison Schofield <alison.schofield@intel.com>
Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Fan Ni <fan.ni@samsung.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
/proc/vmstat contains events and stats, events can only grow, but stats
can grow and shrink.
vmstat has the following:
-------------------------
NR_VM_ZONE_STAT_ITEMS: per-zone stats
NR_VM_NUMA_EVENT_ITEMS: per-numa events
NR_VM_NODE_STAT_ITEMS: per-numa stats
NR_VM_WRITEBACK_STAT_ITEMS: system-wide background-writeback and
dirty-throttling tresholds.
NR_VM_EVENT_ITEMS: system-wide events
-------------------------
Rename NR_VM_WRITEBACK_STAT_ITEMS to NR_VM_STAT_ITEMS, to track the
system-wide stats, we are going to add per-page metadata stats to this
category in the next patch.
Also delete unused writeback_stat_name().
Link: https://lkml.kernel.org/r/20240809191020.1142142-2-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240808213437.682006-3-pasha.tatashin@soleen.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We recently made GUP's common page table walking code to also walk hugetlb
VMAs without most hugetlb special-casing, preparing for the future of
having less hugetlb-specific page table walking code in the codebase.
Turns out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables
we can have with CONFIG_HIGHPTE.
Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such
that when e.g., CONFIG_PGTABLE_LEVELS==2 with
PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document
why that works.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of both
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement
(Thorsten Blum)
- kallsyms: Clean up interaction with LTO suffixes (Song Liu)
- refcount: Report UAF for refcount_sub_and_test(0) when counter==0
(Petr Pavlu)
- kunit/overflow: Avoid misallocation of driver name (Ivan Orlov)
* tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols
kunit/overflow: Fix UB in overflow_allocation_test
gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement
refcount: Report UAF for refcount_sub_and_test(0) when counter==0
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter
Current release - regressions:
- udp: fall back to software USO if IPv6 extension headers are
present
- wifi: iwlwifi: correctly lookup DMA address in SG table
Current release - new code bugs:
- eth: mlx5e: fix queue stats access to non-existing channels splat
Previous releases - regressions:
- eth: mlx5e: take state lock during tx timeout reporter
- eth: mlxbf_gige: disable RX filters until RX path initialized
- eth: igc: fix reset adapter logics when tx mode change
Previous releases - always broken:
- tcp: update window clamping condition
- netfilter:
- nf_queue: drop packets with cloned unconfirmed conntracks
- nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
- vsock: fix recursive ->recvmsg calls
- dsa: vsc73xx: fix MDIO bus access and PHY opera
- eth: gtp: pull network headers in gtp_dev_xmit()
- eth: igc: fix packet still tx after gate close by reducing i226 MAC
retry buffer
- eth: mana: fix RX buf alloc_size alignment and atomic op panic
- eth: hns3: fix a deadlock problem when config TC during resetting"
* tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: hns3: use correct release function during uninitialization
net: hns3: void array out of bound when loop tnl_num
net: hns3: fix a deadlock problem when config TC during resetting
net: hns3: use the user's cfg after reset
net: hns3: fix wrong use of semaphore up
selftests: net: lib: kill PIDs before del netns
pse-core: Conditionally set current limit during PI regulator registration
net: thunder_bgx: Fix netdev structure allocation
net: ethtool: Allow write mechanism of LPL and both LPL and EPL
vsock: fix recursive ->recvmsg calls
selftest: af_unix: Fix kselftest compilation warnings
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
netfilter: nf_tables: Introduce nf_tables_getobj_single
netfilter: nf_tables: Audit log dump reset after the fact
selftests: netfilter: add test for br_netfilter+conntrack+queue combination
netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
netfilter: flowtable: initialise extack before use
netfilter: nfnetlink: Initialise extack before use in ACKs
netfilter: allow ipv6 fragments to arrive on different devices
tcp: Update window clamping condition
...
|
|
This inverts the LE State quirk so by default we assume the controllers
would report valid states rather than invalid which is how quirks
normally behave, also this would result in HCI command failing it the LE
States are really broken thus exposing the controllers that are really
broken in this respect.
Link: https://github.com/bluez/bluez/issues/584
Fixes: 220915857e29 ("Bluetooth: Adding driver and quirk defs for multi-role LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This reverts commit bab2f5e8fd5d2f759db26b78d9db57412888f187.
Joel reported that this commit breaks userspace and stops sensors in
SDM845 from working. Also breaks other qcom SoC devices running postmarketOS.
Cc: stable <stable@kernel.org>
Cc: Ekansh Gupta <quic_ekangupt@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reported-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
Link: https://lore.kernel.org/r/9a9f5646-a554-4b65-8122-d212bb665c81@umsystem.edu
Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: bab2f5e8fd5d ("misc: fastrpc: Restrict untrusted app to attach to privileged PD")
Link: https://lore.kernel.org/r/20240815094920.8242-1-griffin@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
After a vsock socket has been added to a BPF sockmap, its prot->recvmsg
has been replaced with vsock_bpf_recvmsg(). Thus the following
recursiion could happen:
vsock_bpf_recvmsg()
-> __vsock_recvmsg()
-> vsock_connectible_recvmsg()
-> prot->recvmsg()
-> vsock_bpf_recvmsg() again
We need to fix it by calling the original ->recvmsg() without any BPF
sockmap logic in __vsock_recvmsg().
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Pull kvm fixes from Paolo Bonzini:
"s390:
- Fix failure to start guests with kvm.use_gisa=0
- Panic if (un)share fails to maintain security.
ARM:
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
- Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
- Fix smatch issues
- Small cleanups
- Make x2APIC ID 100% readonly
- Fix typo in uapi constant
Generic:
- Use synchronize_srcu_expedited() on irqfd shutdown"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
KVM: selftests: Add a testcase to verify x2APIC is fully readonly
KVM: x86: Make x2APIC ID 100% readonly
KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
KVM: SVM: Fix an error code in sev_gmem_post_populate()
KVM: SVM: Fix uninitialized variable bug
KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
KVM: arm64: Tidying up PAuth code in KVM
KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
s390/uv: Panic for set and remove shared access UVC errors
KVM: s390: fix validity interception issue when gisa is switched off
docs: KVM: Fix register ID of SPSR_FIQ
KVM: arm64: vgic: fix unexpected unlock sparse warnings
KVM: arm64: fix kdoc warnings in W=1 builds
KVM: arm64: fix override-init warnings in W=1 builds
...
|
|
Replace IS_ENABLED() with IS_REACHABLE() to substitute empty stubs for:
i2c_acpi_get_i2c_resource()
i2c_acpi_client_count()
i2c_acpi_find_bus_speed()
i2c_acpi_new_device_by_fwnode()
i2c_adapter *i2c_acpi_find_adapter_by_handle()
i2c_acpi_waive_d0_probe()
commit f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI
functions") partially fixed this conditional to depend on CONFIG_I2C,
but used IS_ENABLED(), which is wrong since CONFIG_I2C is tristate.
CONFIG_ACPI is boolean but let's also change it to use IS_REACHABLE()
to future-proof it against becoming tristate.
Somehow despite testing various combinations of CONFIG_I2C and CONFIG_ACPI
we missed the combination CONFIG_I2C=m, CONFIG_ACPI=y.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408141333.gYnaitcV-lkp@intel.com/
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of
the UAPI, keep the current definition and add a new one with the fix.
Fix-suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Amit Shah <amit.shah@amd.com>
Message-ID: <20240814083113.21622-1-amit@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
directly emulate instructions for ES/SNP, and instead the guest must
explicitly request emulation. Unless the guest explicitly requests
emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
because except for ES/SNP, doing so requires setting reserved bits in the
SPTE, i.e. the SPTE can't be readable while also generating a #VC on
writes. Because KVM never creates MMIO SPTEs and jumps directly to
emulation, the guest never gets a #VC. And since KVM simply resumes the
guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
into an infinite #NPF loop if the vCPU attempts to write read-only memory.
Disallow read-only memory for all VMs with protected state, i.e. for
upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible
to support read-only memory, as TDX uses EPT Violation #VE to reflect the
fault into the guest, e.g. KVM could configure read-only SPTEs with RX
protections and SUPPRESS_VE=0. But there is no strong use case for
supporting read-only memslots on TDX, e.g. the main historical usage is
to emulate option ROMs, but TDX disallows executing from shared memory.
And if someone comes along with a legitimate, strong use case, the
restriction can always be lifted for TDX.
Don't bother trying to retroactively apply the restriction to SEV-ES
VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't
possibly work for SEV-ES, i.e. disallowing such memslots is really just
means reporting an error to userspace instead of silently hanging vCPUs.
Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
isn't worth the marginal benefit it would provide userspace.
Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES")
Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support")
Cc: Peter Gonda <pgonda@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerly Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240809190319.1710470-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"VFS:
- Fix the name of file lease slab cache. When file leases were split
out of file locks the name of the file lock slab cache was used for
the file leases slab cache as well.
- Fix a type in take_fd() helper.
- Fix infinite directory iteration for stable offsets in tmpfs.
- When the icache is pruned all reclaimable inodes are marked with
I_FREEING and other processes that try to lookup such inodes will
block.
But some filesystems like ext4 can trigger lookups in their inode
evict callback causing deadlocks. Ext4 does such lookups if the
ea_inode feature is used whereby a separate inode may be used to
store xattrs.
Introduce I_LRU_ISOLATING which pins the inode while its pages are
reclaimed. This avoids inode deletion during inode_lru_isolate()
avoiding the deadlock and evict is made to wait until
I_LRU_ISOLATING is done.
netfs:
- Fault in smaller chunks for non-large folio mappings for
filesystems that haven't been converted to large folios yet.
- Fix the CONFIG_NETFS_DEBUG config option. The config option was
renamed a short while ago and that introduced two minor issues.
First, it depended on CONFIG_NETFS whereas it wants to depend on
CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
does. Second, the documentation for the config option wasn't fixed
up.
- Revert the removal of the PG_private_2 writeback flag as ceph is
using it and fix how that flag is handled in netfs.
- Fix DIO reads on 9p. A program watching a file on a 9p mount
wouldn't see any changes in the size of the file being exported by
the server if the file was changed directly in the source
filesystem. Fix this by attempting to read the full size specified
when a DIO read is requested.
- Fix a NULL pointer dereference bug due to a data race where a
cachefiles cookies was retired even though it was still in use.
Check the cookie's n_accesses counter before discarding it.
nsfs:
- Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
the kernel is writing to userspace.
pidfs:
- Prevent the creation of pidfds for kthreads until we have a
use-case for it and we know the semantics we want. It also confuses
userspace why they can get pidfds for kthreads.
squashfs:
- Fix an unitialized value bug reported by KMSAN caused by a
corrupted symbolic link size read from disk. Check that the
symbolic link size is not larger than expected"
* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Squashfs: sanity check symbolic link size
9p: Fix DIO read through netfs
vfs: Don't evict inode under the inode lru traversing context
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
file: fix typo in take_fd() comment
pidfd: prevent creation of pidfds for kthreads
netfs: clean up after renaming FSCACHE_DEBUG config
libfs: fix infinite directory reads for offset dir
nsfs: fix ioctl declaration
fs/netfs/fscache_cookie: add missing "n_accesses" check
filelock: fix name of file_lease slab cache
netfs: Fault in smaller chunks for non-large folio mappings
|
|
commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
to ringbuffer") disabled non-panic CPUs to further write messages to
ringbuffer after panicked.
Since the commit, non-panicked CPU's are not allowed to write to
ring buffer after panicked and CPU backtrace which is triggered
after panicked to sample non-panicked CPUs' backtrace no longer
serves its function as it has nothing to print.
Fix the issue by allowing non-panicked CPUs to write into ringbuffer
while CPU backtrace is in flight.
Fixes: 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240812072703.339690-1-takakura@valinux.co.jp
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
The inode reclaiming process(See function prune_icache_sb) collects all
reclaimable inodes and mark them with I_FREEING flag at first, at that
time, other processes will be stuck if they try getting these inodes
(See function find_inode_fast), then the reclaiming process destroy the
inodes by function dispose_list(). Some filesystems(eg. ext4 with
ea_inode feature, ubifs with xattr) may do inode lookup in the inode
evicting callback function, if the inode lookup is operated under the
inode lru traversing context, deadlock problems may happen.
Case 1: In function ext4_evict_inode(), the ea inode lookup could happen
if ea_inode feature is enabled, the lookup process will be stuck
under the evicting context like this:
1. File A has inode i_reg and an ea inode i_ea
2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
3. Then, following three processes running like this:
PA PB
echo 2 > /proc/sys/vm/drop_caches
shrink_slab
prune_dcache_sb
// i_reg is added into lru, lru->i_ea->i_reg
prune_icache_sb
list_lru_walk_one
inode_lru_isolate
i_ea->i_state |= I_FREEING // set inode state
inode_lru_isolate
__iget(i_reg)
spin_unlock(&i_reg->i_lock)
spin_unlock(lru_lock)
rm file A
i_reg->nlink = 0
iput(i_reg) // i_reg->nlink is 0, do evict
ext4_evict_inode
ext4_xattr_delete_inode
ext4_xattr_inode_dec_ref_all
ext4_xattr_inode_iget
ext4_iget(i_ea->i_ino)
iget_locked
find_inode_fast
__wait_on_freeing_inode(i_ea) ----→ AA deadlock
dispose_list // cannot be executed by prune_icache_sb
wake_up_bit(&i_ea->i_state)
Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file
deleting process holds BASEHD's wbuf->io_mutex while getting the
xattr inode, which could race with inode reclaiming process(The
reclaiming process could try locking BASEHD's wbuf->io_mutex in
inode evicting function), then an ABBA deadlock problem would
happen as following:
1. File A has inode ia and a xattr(with inode ixa), regular file B has
inode ib and a xattr.
2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
3. Then, following three processes running like this:
PA PB PC
echo 2 > /proc/sys/vm/drop_caches
shrink_slab
prune_dcache_sb
// ib and ia are added into lru, lru->ixa->ib->ia
prune_icache_sb
list_lru_walk_one
inode_lru_isolate
ixa->i_state |= I_FREEING // set inode state
inode_lru_isolate
__iget(ib)
spin_unlock(&ib->i_lock)
spin_unlock(lru_lock)
rm file B
ib->nlink = 0
rm file A
iput(ia)
ubifs_evict_inode(ia)
ubifs_jnl_delete_inode(ia)
ubifs_jnl_write_inode(ia)
make_reservation(BASEHD) // Lock wbuf->io_mutex
ubifs_iget(ixa->i_ino)
iget_locked
find_inode_fast
__wait_on_freeing_inode(ixa)
| iput(ib) // ib->nlink is 0, do evict
| ubifs_evict_inode
| ubifs_jnl_delete_inode(ib)
↓ ubifs_jnl_write_inode
ABBA deadlock ←-----make_reservation(BASEHD)
dispose_list // cannot be executed by prune_icache_sb
wake_up_bit(&ixa->i_state)
Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING
to pin the inode in memory while inode_lru_isolate() reclaims its pages
instead of using ordinary inode reference. This way inode deletion
cannot be triggered from inode_lru_isolate() thus avoiding the deadlock.
evict() is made to wait for I_LRU_ISOLATING to be cleared before
proceeding with inode cleanup.
Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022
Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files")
Cc: stable@vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.com
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
A subsequent change will need to pass a depth argument to
acpi_execute_reg_methods(), so prepare that function for it.
No intentional functional changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/8451567.NyiUUSuA9g@rjwysocki.net
|
|
This reverts commit 0e6b6dedf168 ("Revert "ACPI: EC: Evaluate orphan
_REG under EC device") because the problem addressed by it will be
addressed differently in what follows.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/3236716.5fSG56mABF@rjwysocki.net
|
|
doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable@vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit 0c9f17877891 ("iommu: Remove guest pasid related interfaces and definitions")
removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240808140619.2498535-1-yuehaibing@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
fixed a NULL pointer dereference panic, but didn't fix the issue that
fails to update attached freplace prog to prog_array map.
Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"),
freplace prog and its target prog are able to tail call each other.
And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach")
sets prog->aux->dst_prog as NULL after attaching freplace prog to its
target prog.
After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS.
Then, after attaching freplace its prog->aux->dst_prog is NULL.
Then, while updating freplace in prog_array the bpf_prog_map_compatible()
incorrectly returns false because resolve_prog_type() returns
BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS.
After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS
and update to prog_array can succeed.
Fixes: f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20240728114612.48486-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|