Age | Commit message (Collapse) | Author |
|
A low attr::freq value cannot be set via IOC_PERIOD on some platforms.
The perf_event_check_period() introduced in:
81ec3f3c4c4d ("perf/x86: Add check_period PMU callback")
was intended to check the period, rather than the frequency.
A low frequency may be mistakenly rejected by limit_period().
Fix it.
Fixes: 81ec3f3c4c4d ("perf/x86: Add check_period PMU callback")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250117151913.3043942-2-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
|
|
Perf doesn't work at low frequencies:
$ perf record -e cpu_core/instructions/ppp -F 120
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cpu_core/instructions/ppp).
"dmesg | grep -i perf" may provide additional information.
The limit_period() check avoids a low sampling period on a counter. It
doesn't intend to limit the frequency.
The check in the x86_pmu_hw_config() should be limited to non-freq mode.
The attr.sample_period and attr.sample_freq are union. The
attr.sample_period should not be used to indicate the frequency mode.
Fixes: c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250117151913.3043942-1-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
|
|
Typically, SoundWire MIC and PCH DMIC will not coexist. However, we may
want to use both of them in some special cases. Add a warning to let
users know that SoundWire MIC and PCH DMIC are both present and they
could overwrite it with kernel params.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20250225093716.67240-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Currently, we assume that the PCH DMIC pins are pin-muxed with SoundWire
links. However, we do see a HW design that use PCH DMIC along with 3
SoundWire links. Remove the check now.
With this change the PCM DMIC will be presented if it is reported by the
BIOS irrespective of whether there are SDW links present or not.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20250225093716.67240-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Jason has been helping as reviewer for this area already, and has
contributed various features directly, notably BPF timestamping.
Also extend coverage to all timestamping tests, including those new
with BPF timestamping.
Link: https://lore.kernel.org/netdev/20250220072940.99994-1-kerneljasonxing@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250222172839.642079-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We found an issue when using bpf_redirect with ipvs NAT mode after
commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
the same name space"). Particularly, we use bpf_redirect to return
the skb directly back to the netif it comes from, i.e., xnet is
false in skb_scrub_packet(), and then ipvs_property is preserved
and SNAT is skipped in the rx path.
ipvs_property has been already cleared when netns is changed in
commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
SKB net namespace changed"). This patch just clears it in spite of
netns.
Fixes: 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed")
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Link: https://patch.msgid.link/20250222033518.126087-1-lulie@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
xfs_buf_stale already set b_lru_ref to 0, and thus prevents the buffer
from moving to the LRU. Remove the duplicate check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
The buffer cache keeps a bt_io_count per-CPU counter to track all
in-flight I/O, which is used to ensure no I/O is in flight when
unmounting the file system.
For most I/O we already keep track of inflight I/O at higher levels:
- for synchronous I/O (xfs_buf_read/xfs_bwrite/xfs_buf_delwri_submit),
the caller has a reference and waits for I/O completions using
xfs_buf_iowait
- for xfs_buf_delwri_submit_nowait the only caller (AIL writeback)
tracks the log items that the buffer attached to
This only leaves only xfs_buf_readahead_map as a submitter of
asynchronous I/O that is not tracked by anything else. Replace the
bt_io_count per-cpu counter with a more specific bt_readahead_count
counter only tracking readahead I/O. This allows to simply increment
it when submitting readahead I/O and decrementing it when it completed,
and thus simplify xfs_buf_rele and remove the needed for the
XBF_NO_IOACCT flags and the XFS_BSTATE_IN_FLIGHT buffer state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
xfs_buf_readahead_map is the only caller of xfs_buf_read_map and thus
_xfs_buf_read that is not synchronous. Split it from xfs_buf_read_map
so that the asynchronous path is self-contained and the now purely
synchronous xfs_buf_read_map / _xfs_buf_read implementation can be
simplified.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Currently all metadata I/O completions happen in the m_buf_workqueue
workqueue. But for synchronous I/O (i.e. all buffer reads) there is no
need for that, as there always is a called in process context that is
waiting for the I/O. Factor out the guts of xfs_buf_ioend into a
separate helper and call it from xfs_buf_iowait to avoid a double
an extra context switch to the workqueue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
params->total_weight is not initialized during bind and not updated when
the bound cdev changes. The cooling device weight will not be used due
to the uninitialized total_weight, until an update via sysfs is
triggered.
The bound cdevs are updated during thermal zone registration, where each
cooling device will be bound to the thermal zone one by one, but
power_allocator_bind() can be called without an additional cdev update
when manually changing the policy of a thermal zone via sysfs.
Add a new function to handle weight update logic, including updating
total_weight, and call it when bind, weight changes, and cdev updates to
ensure total_weight is always correct.
Fixes: a3cd6db4cc2e ("thermal: gov_power_allocator: Support new update callback of weights")
Signed-off-by: Yu-Che Cheng <giver@chromium.org>
Link: https://patch.msgid.link/20250222-fix-power-allocator-weight-v2-1-a94de86b685a@chromium.org
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Since thermal_of_should_bind() terminates the loop after processing
the first child found in cooling-maps, it will never match more than
one cdev to a given trip point which is incorrect, as there may be
cooling-maps associating one trip point with multiple cooling devices.
Address this by letting the loop continue until either all
children have been processed or a matching one has been found.
To avoid adding conditionals or goto statements, put the loop in
question into a separate function and make that function return
right away after finding a matching cooling-maps entry.
Fixes: 94c6110b0b13 ("thermal/of: Use the .should_bind() thermal zone callback")
Link: https://lore.kernel.org/linux-pm/20250219-fix-thermal-of-v1-1-de36e7a590c4@chromium.org/
Reported-by: Yu-Che Cheng <giver@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Yu-Che Cheng <giver@chromium.org>
Tested-by: Yu-Che Cheng <giver@chromium.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/2788228.mvXUDI8C0e@rjwysocki.net
|
|
Combine 'else' and 'if' conditional statements onto a single line and drop
unrequired braces, as is standard coding style.
The code had been like this since commit c3b0e880bbfa ("iomap: support
REQ_OP_ZONE_APPEND").
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250224154538.548028-1-john.g.garry@oracle.com
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As per dt-bindings the property is called vddio-supply, so use the
correct name in the driver instead of iovdd. The datasheet also calls
the supply 'VDDIO'.
Fixes: 44362279bdd4 ("Input: add core support for Goodix Berlin Touchscreen IC")
Cc: stable@vger.kernel.org
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250103-goodix-berlin-fixes-v1-2-b014737b08b2@fairphone.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
In the statement above AVDD gets enabled, and not IOVDD, so fix this
copy-paste mistake.
Fixes: 44362279bdd4 ("Input: add core support for Goodix Berlin Touchscreen IC")
Reported-by: Jens Reidel <adrian@travitia.xyz>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250103-goodix-berlin-fixes-v1-1-b014737b08b2@fairphone.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Add support for imagis IST3038H, which seems mostly compatible with
IST3038C except that it reports a different chip ID value.
Tested on samsung,j5y17lte.
Signed-off-by: Andras Sebok <sebokandris2009@gmail.com>
Link: https://lore.kernel.org/r/20250224090354.102903-2-sebokandris2009@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
When an Icelake or Sapphire Rapids CPU isn't providing the maximum and
critical thresholds for particular DIMM the driver should return an
error to the userspace instead of giving it stale (best case) or wrong
(the structure contains all zeros after kzalloc() call) data.
The issue can be reproduced by binding the peci driver while the host is
fully booted and idle, this makes PECI interaction unreliable enough.
Fixes: 73bc1b885dae ("hwmon: peci: Add dimmtemp driver")
Fixes: 621995b6d795 ("hwmon: (peci/dimmtemp) Add Sapphire Rapids support")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Reviewed-by: Iwona Winiarska <iwona.winiarska@intel.com>
Link: https://lore.kernel.org/r/20250123122003.6010-1-fercerpav@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
IST3038H is a touchscreen IC which seems mostly compatible with IST3038C
except that it reports a different chip ID value.
Signed-off-by: Andras Sebok <sebokandris2009@gmail.com>
Link: https://lore.kernel.org/r/20250224090354.102903-4-sebokandris2009@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for cacheinfo DT probing to avoid reading non-boolean
properties as booleans.
- A fix for cpufeature to use bitmap_equal() instead of memcmp(), so
unused bits are ignored.
- Fixes for cmpxchg and futex cmpxchg that properly encode the sign
extension requirements on inline asm, which results in spurious
successes. This manifests in at least inode_set_ctime_current, but is
likely just a disaster waiting to happen.
- A fix for the rseq selftests, which was using an invalid constraint.
- A pair of fixes for signal frame size handling:
- We were reserving space for an extra empty extension context
header on systems with extended signal context, thus resulting in
unnecessarily large allocations.
- We weren't properly checking for available extensions before
calculating the signal stack size, which resulted in undersized
stack allocations on some systems (at least those with T-Head
custom vectors).
Also, we've added Alex as a reviewer. He's been helping out a ton
lately, thanks!
* tag 'riscv-for-linus-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Add myself as a riscv reviewer
riscv: signal: fix signal_minsigstksz
riscv: signal: fix signal frame size
rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm
riscv/futex: sign extend compare value in atomic cmpxchg
riscv/atomic: Do proper sign extension also for unsigned in arch_cmpxchg
riscv: cpufeature: use bitmap_equal() instead of memcmp()
riscv: cacheinfo: Use of_property_present() for non-boolean properties
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- dm-vdo: add missing spin_lock_init
- dm-integrity: divide-by-zero fix
- dm-integrity: do not report unused entries in the table line
* tag 'for-6.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm vdo: add missing spin_lock_init
dm-integrity: Do not emit journal configuration in DM table for Inline mode
dm-integrity: Avoid divide by zero in table status in Inline mode
|
|
The PCI P2PDMA code will register the CMB block to the memory
hot-plugging subsystem, which have an alignment requirement. Memory
blocks that do not satisfy this alignment requirement (usually 2MB) will
lead to a WARNING from memory hotplugging.
Verify the CMB block's address and size against the alignment and only
try to send CMB blocks compatible with it to prevent this warning.
Tested on Intel DC D4502 SSD, which has a 512K CMB block that is too
small for memory hotplugging (thus PCI P2PDMA).
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
CMB decoding should get disabled when the CMB block isn't successfully
registered to P2P DMA subsystem.
Clean up the CMBMSC register in this error handling codepath to disable
CMB decoding (and CMBLOC/CMBSZ registers).
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
nvme_tcp_poll() may race with the send path error handler because
it may complete the request while it is actively being polled for
completion, resulting in a UAF panic [1]:
We should make sure to stop polling when we see an error when
trying to read from the socket. Hence make sure to propagate the
error so that the block layer breaks the polling cycle.
[1]:
--
[35665.692310] nvme nvme2: failed to send request -13
[35665.702265] nvme nvme2: unsupported pdu type (3)
[35665.702272] BUG: kernel NULL pointer dereference, address: 0000000000000000
[35665.702542] nvme nvme2: queue 1 receive failed: -22
[35665.703209] #PF: supervisor write access in kernel mode
[35665.703213] #PF: error_code(0x0002) - not-present page
[35665.703214] PGD 8000003801cce067 P4D 8000003801cce067 PUD 37e6f79067 PMD 0
[35665.703220] Oops: 0002 [#1] SMP PTI
[35665.703658] nvme nvme2: starting error recovery
[35665.705809] Hardware name: Inspur aaabbb/YZMB-00882-104, BIOS 4.1.26 09/22/2022
[35665.705812] Workqueue: kblockd blk_mq_requeue_work
[35665.709172] RIP: 0010:_raw_spin_lock+0xc/0x30
[35665.715788] Call Trace:
[35665.716201] <TASK>
[35665.716613] ? show_trace_log_lvl+0x1c1/0x2d9
[35665.717049] ? show_trace_log_lvl+0x1c1/0x2d9
[35665.717457] ? blk_mq_request_bypass_insert+0x2c/0xb0
[35665.717950] ? __die_body.cold+0x8/0xd
[35665.718361] ? page_fault_oops+0xac/0x140
[35665.718749] ? blk_mq_start_request+0x30/0xf0
[35665.719144] ? nvme_tcp_queue_rq+0xc7/0x170 [nvme_tcp]
[35665.719547] ? exc_page_fault+0x62/0x130
[35665.719938] ? asm_exc_page_fault+0x22/0x30
[35665.720333] ? _raw_spin_lock+0xc/0x30
[35665.720723] blk_mq_request_bypass_insert+0x2c/0xb0
[35665.721101] blk_mq_requeue_work+0xa5/0x180
[35665.721451] process_one_work+0x1e8/0x390
[35665.721809] worker_thread+0x53/0x3d0
[35665.722159] ? process_one_work+0x390/0x390
[35665.722501] kthread+0x124/0x150
[35665.722849] ? set_kthread_struct+0x50/0x50
[35665.723182] ret_from_fork+0x1f/0x30
Reported-by: Zhang Guanghui <zhang.guanghui@cestc.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Test XDP and HDS interaction. While at it add a test for using the IOCTL,
as that turned out to be the real culprit.
Testing bnxt:
# NETIF=eth0 ./ksft-net-drv/drivers/net/hds.py
KTAP version 1
1..12
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by the device
ok 4 hds.set_hds_enable
ok 5 hds.set_hds_thresh_zero
ok 6 hds.set_hds_thresh_max
ok 7 hds.set_hds_thresh_gt
ok 8 hds.set_xdp
ok 9 hds.enabled_set_xdp
ok 10 hds.ioctl
ok 11 hds.ioctl_set_xdp
ok 12 hds.ioctl_enabled_set_xdp
# Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0
and netdevsim:
# ./ksft-net-drv/drivers/net/hds.py
KTAP version 1
1..12
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable
ok 4 hds.set_hds_enable
ok 5 hds.set_hds_thresh_zero
ok 6 hds.set_hds_thresh_max
ok 7 hds.set_hds_thresh_gt
ok 8 hds.set_xdp
ok 9 hds.enabled_set_xdp
ok 10 hds.ioctl
ok 11 hds.ioctl_set_xdp
ok 12 hds.ioctl_enabled_set_xdp
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
Netdevsim needs a sane default for tx/rx ring size.
ethtool 6.11 is needed for the --disable-netlink option.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250221025141.1132944-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The legacy ioctl path does not have support for extended attributes.
So we issue a GET to fetch the current settings from the driver,
in an attempt to keep them unchanged. HDS is a bit "special" as
the GET only returns on/off while the SET takes a "ternary" argument
(on/off/default). If the driver was in the "default" setting -
executing the ioctl path binds it to on or off, even tho the user
did not intend to change HDS config.
Factor the relevant logic out of the netlink code and reuse it.
Fixes: 87c8f8496a05 ("bnxt_en: add support for tcp-data-split ethtool command")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Daniel Xu <dxu@dxuuu.xyz>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250221025141.1132944-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
when called from xe_vm_userptr_pin() with the idea that we want to avoid
killing the entire vm and chucking an error, under the assumption that
the user just did an unmap or something, and has no intention of
actually touching that memory from the GPU. At this point we have
already zapped the PTEs so any access should generate a page fault, and
if the pin fails there also it will then become fatal.
However it looks like it's possible for the userptr vma to still be on
the rebind list in preempt_rebind_work_func(), if we had to retry the
pin again due to something happening in the caller before we did the
rebind step, but in the meantime needing to re-validate the userptr and
this time hitting the EFAULT.
This explains an internal user report of hitting:
[ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
[ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738690] Call Trace:
[ 191.738692] <TASK>
[ 191.738694] ? show_regs+0x69/0x80
[ 191.738698] ? __warn+0x93/0x1a0
[ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738759] ? report_bug+0x18f/0x1a0
[ 191.738764] ? handle_bug+0x63/0xa0
[ 191.738767] ? exc_invalid_op+0x19/0x70
[ 191.738770] ? asm_exc_invalid_op+0x1b/0x20
[ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738834] ? ret_from_fork_asm+0x1a/0x30
[ 191.738849] bind_op_prepare+0x105/0x7b0 [xe]
[ 191.738906] ? dma_resv_reserve_fences+0x301/0x380
[ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
[ 191.738966] ? kmemleak_alloc+0x4b/0x80
[ 191.738973] ops_execute+0x188/0x9d0 [xe]
[ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe]
[ 191.739098] ? trace_hardirqs_on+0x4d/0x60
[ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe]
Followed by NPD, when running some workload, since the sg was never
actually populated but the vma is still marked for rebind when it should
be skipped for this special EFAULT case. This is confirmed to fix the
user report.
v2 (MattB):
- Move earlier.
v3 (MattB):
- Update the commit message to make it clear that this indeed fixes the
issue.
Fixes: 521db22a1d70 ("drm/xe: Invalidate userptr VMA on page pin fault")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-5-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6b93cb98910c826c2e2004942f8b060311e43618)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On error restore anything still on the pin_list back to the invalidation
list on error. For the actual pin, so long as the vma is tracked on
either list it should get picked up on the next pin, however it looks
possible for the vma to get nuked but still be present on this per vm
pin_list leading to corruption. An alternative might be then to instead
just remove the link when destroying the vma.
v2:
- Also add some asserts.
- Keep the overzealous locking so that we are consistent with the docs;
updating the docs and related bits will be done as a follow up.
Fixes: ed2bdf3b264d ("drm/xe/vm: Subclass userptr vmas")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221143840.167150-4-matthew.auld@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4e37e928928b730de9aa9a2f5dc853feeebc1742)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Marek has graciously offered to maintain the dma-mapping tree.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Joel will go back to maintain configfs alone on a time permitting basis.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We triggered the following crash in syzkaller tests:
BUG: Bad page state in process syz.7.38 pfn:1eff3
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1eff3
flags: 0x3fffff00004004(referenced|reserved|node=0|zone=1|lastcpupid=0x1fffff)
raw: 003fffff00004004 ffffe6c6c07bfcc8 ffffe6c6c07bfcc8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000fffffffe 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_unref_page_prepare+0x401/0x500
free_unref_page+0x6d/0x1b0
uprobe_write_opcode+0x460/0x8e0
install_breakpoint.part.0+0x51/0x80
register_for_each_vma+0x1d9/0x2b0
__uprobe_register+0x245/0x300
bpf_uprobe_multi_link_attach+0x29b/0x4f0
link_create+0x1e2/0x280
__sys_bpf+0x75f/0xac0
__x64_sys_bpf+0x1a/0x30
do_syscall_64+0x56/0x100
entry_SYSCALL_64_after_hwframe+0x78/0xe2
BUG: Bad rss-counter state mm:00000000452453e0 type:MM_FILEPAGES val:-1
The following syzkaller test case can be used to reproduce:
r2 = creat(&(0x7f0000000000)='./file0\x00', 0x8)
write$nbd(r2, &(0x7f0000000580)=ANY=[], 0x10)
r4 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x42, 0x0)
mmap$IORING_OFF_SQ_RING(&(0x7f0000ffd000/0x3000)=nil, 0x3000, 0x0, 0x12, r4, 0x0)
r5 = userfaultfd(0x80801)
ioctl$UFFDIO_API(r5, 0xc018aa3f, &(0x7f0000000040)={0xaa, 0x20})
r6 = userfaultfd(0x80801)
ioctl$UFFDIO_API(r6, 0xc018aa3f, &(0x7f0000000140))
ioctl$UFFDIO_REGISTER(r6, 0xc020aa00, &(0x7f0000000100)={{&(0x7f0000ffc000/0x4000)=nil, 0x4000}, 0x2})
ioctl$UFFDIO_ZEROPAGE(r5, 0xc020aa04, &(0x7f0000000000)={{&(0x7f0000ffd000/0x1000)=nil, 0x1000}})
r7 = bpf$PROG_LOAD(0x5, &(0x7f0000000140)={0x2, 0x3, &(0x7f0000000200)=ANY=[@ANYBLOB="1800000000120000000000000000000095"], &(0x7f0000000000)='GPL\x00', 0x7, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, @fallback=0x30, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x94)
bpf$BPF_LINK_CREATE_XDP(0x1c, &(0x7f0000000040)={r7, 0x0, 0x30, 0x1e, @val=@uprobe_multi={&(0x7f0000000080)='./file0\x00', &(0x7f0000000100)=[0x2], 0x0, 0x0, 0x1}}, 0x40)
The cause is that zero pfn is set to the PTE without increasing the RSS
count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
not increase accordingly. Then, the operation on the same pfn is performed
in uprobe_write_opcode()->__replace_page() to unconditional decrease the
RSS count and old_folio's refcount.
Therefore, two bugs are introduced:
1. The RSS count is incorrect, when process exit, the check_mm() report
error "Bad rss-count".
2. The reserved folio (zero folio) is freed when folio->refcount is zero,
then free_pages_prepare->free_page_is_bad() report error
"Bad page state".
There is more, the following warning could also theoretically be triggered:
__replace_page()
-> ...
-> folio_remove_rmap_pte()
-> VM_WARN_ON_FOLIO(is_zero_folio(folio), folio)
Considering that uprobe hit on the zero folio is a very rare case, just
reject zero old folio immediately after get_user_page_vma_remote().
[ mingo: Cleaned up the changelog ]
Fixes: 7396fa818d62 ("uprobes/core: Make background page replacement logic account for rss_stat counters")
Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250224031149.1598949-1-tongtiangen@huawei.com
|
|
The sorting of VMAs by size in commit 7d442a33bfe8 ("binfmt_elf: Dump
smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort
based on the setting of the new sysctl, core_sort_vma, which defaults
to 0, no sorting.
Reported-by: Michael Stapelberg <michael@stapelberg.ch>
Closes: https://lore.kernel.org/all/20250218085407.61126-1-michael@stapelberg.de/ [1]
Fixes: 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores")
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Some tools like KGraphViewer interpret the "ON" nodes not having an
explicitly set fill colour as them being entirely black, which obscures
the text on them and looks funny. In fact, I thought they were off for
the longest time. Comparing to the output of the `dot` tool, I assume
they are supposed to be white.
Instead of speclawyering over who's in the wrong and must immediately
atone for their wickedness at the altar of RFC2119, just be explicit
about it, set the fillcolor to white, and nobody gets confused.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20250221-dapm-graph-node-colour-v1-1-514ed0aa7069@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
If stream names of DAI driver are duplicated there'll be warnings when
machine driver tries to add widgets on a route:
[ 8.831335] fsl-asoc-card sound-wm8960: ASoC: sink widget CPU-Playback overwritten
[ 8.839917] fsl-asoc-card sound-wm8960: ASoC: source widget CPU-Capture overwritten
Use different stream names to avoid such warnings.
DAI names in AUDMIX are also updated accordingly.
Fixes: 15c958390460 ("ASoC: fsl_sai: Add separate DAI for transmitter and receiver")
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Link: https://patch.msgid.link/20250217010437.258621-1-chancel.liu@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Syskaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same
perf_event_pmu_context, but not in the same order.
The problem is that the order of pmu_ctx_list for the parent is impacted by
the time when an event/PMU is added. While the order for a child is
impacted by the event order in the pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may be different.
To fix this problem, insert the perf_event_pmu_context to its proper place
after iteration of the pmu_ctx_list.
The follow testcase can trigger above warning:
# perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
# perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
test.c
void main() {
int count = 0;
pid_t pid;
printf("%d running\n", getpid());
sleep(30);
printf("running\n");
pid = fork();
if (pid == -1) {
printf("fork error\n");
return;
}
if (pid == 0) {
while (1) {
count++;
}
} else {
while (1) {
count++;
}
}
}
The testcase first opens an LBR event, so it will allocate task_ctx_data,
and then open tracepoint and software events, so the parent context will
have 3 different perf_event_pmu_contexts. On inheritance, child ctx will
insert the perf_event_pmu_context in another order and the warning will
trigger.
[ mingo: Tidied up the changelog. ]
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250122073356.1824736-1-luogengkun@huaweicloud.com
|
|
into HEAD
KVM/riscv fixes for 6.14, take #1
- Fix hart status check in SBI HSM extension
- Fix hart suspend_type usage in SBI HSM extension
- Fix error returned by SBI IPI and TIME extensions for
unsupported function IDs
- Fix suspend_type usage in SBI SUSP extension
- Remove unnecessary vcpu kick after injecting interrupt
via IMSIC guest file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.14, take #3
- Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2
and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout.
- Bring back the VMID allocation to the vcpu_load phase, ensuring
that we only setup VTTBR_EL2 once on VHE. This cures an ugly
race that would lead to running with an unallocated VMID.
|
|
The perf_iterate_ctx() function performs RCU list traversal but
currently lacks RCU read lock protection. This causes lockdep warnings
when running perf probe with unshare(1) under CONFIG_PROVE_RCU_LIST=y:
WARNING: suspicious RCU usage
kernel/events/core.c:8168 RCU-list traversed in non-reader section!!
Call Trace:
lockdep_rcu_suspicious
? perf_event_addr_filters_apply
perf_iterate_ctx
perf_event_exec
begin_new_exec
? load_elf_phdrs
load_elf_binary
? lock_acquire
? find_held_lock
? bprm_execve
bprm_execve
do_execveat_common.isra.0
__x64_sys_execve
do_syscall_64
entry_SYSCALL_64_after_hwframe
This protection was previously present but was removed in commit
bd2756811766 ("perf: Rewrite core context handling"). Add back the
necessary rcu_read_lock()/rcu_read_unlock() pair around
perf_iterate_ctx() call in perf_event_exec().
[ mingo: Use scoped_guard() as suggested by Peter ]
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250117-fix_perf_rcu-v1-1-13cb9210fc6a@debian.org
|
|
Add a rudimentary test for validating KVM's handling of L1 hypervisor
intercepts during instruction emulation on behalf of L2. To minimize
complexity and avoid overlap with other tests, only validate KVM's
handling of instructions that L1 wants to intercept, i.e. that generate a
nested VM-Exit. Full testing of emulation on behalf of L2 is better
achieved by running existing (forced) emulation tests in a VM, (although
on VMX, getting L0 to emulate on #UD requires modifying either L1 KVM to
not intercept #UD, or modifying L0 KVM to prioritize L0's exception
intercepts over L1's intercepts, as is done by KVM for SVM).
Since emulation should never be successful, i.e. L2 always exits to L1,
dynamically generate the L2 code stream instead of adding a helper for
each instruction. Doing so requires hand coding instruction opcodes, but
makes it significantly easier for the test to compute the expected "next
RIP" and instruction length.
Link: https://lore.kernel.org/r/20250201015518.689704-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When emulating an instruction on behalf of L2 that L1 wants to intercept,
generate a nested VM-Exit instead of injecting a #UD into L2. Now that
(most of) the necessary information is available, synthesizing a VM-Exit
isn't terribly difficult.
Punt on decoding the ModR/M for descriptor table exits for now. There is
no evidence that any hypervisor intercepts descriptor table accesses *and*
uses the EXIT_QUALIFICATION to expedite emulation, i.e. it's not worth
delaying basic support for.
To avoid doing more harm than good, e.g. by putting L2 into an infinite
or effectively corrupting its code stream, inject #UD if the instruction
length is nonsensical.
Link: https://lore.kernel.org/r/20250201015518.689704-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rework the nested VM-Exit helper to take the instruction length as a
parameter, and convert nested_vmx_vmexit() into a "default" wrapper that
grabs the length from vmcs02 as appropriate. This will allow KVM to set
the correct instruction length when synthesizing a nested VM-Exit when
emulating an instruction that L1 wants to intercept.
No functional change intended, as the path to prepare_vmcs12()'s reading
of vmcs02.VM_EXIT_INSTRUCTION_LEN is gated on the same set of conditions
as the VMREAD in the new nested_vmx_vmexit().
Link: https://lore.kernel.org/r/20250201015518.689704-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a #define to capture x86's architecturally defined max instruction
length instead of open coding the literal in a variety of places.
No functional change intended.
Link: https://lore.kernel.org/r/20250201015518.689704-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When checking for intercept when emulating an instruction on behalf of L2,
pass the emulator's view of the RIP of the instruction being emulated to
vendor code. Unlike SVM, which communicates the next RIP on VM-Exit,
VMX communicates the length of the instruction that generated the VM-Exit,
i.e. requires the current and next RIPs.
Note, unless userspace modifies RIP during a userspace exit that requires
completion, kvm_rip_read() will contain the same information. Pass the
emulator's view largely out of a paranoia, and because there is no
meaningful cost in doing so.
Link: https://lore.kernel.org/r/20250201015518.689704-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When checking for intercept when emulating an instruction on behalf of L2,
forward the source and destination operand types to vendor code so that
VMX can synthesize the correct EXIT_QUALIFICATION for port I/O VM-Exits.
Link: https://lore.kernel.org/r/20250201015518.689704-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Refactor the handling of port I/O interception checks when emulating on
behalf of L2 in anticipation of synthesizing a nested VM-Exit to L1
instead of injecting a #UD into L2.
No functional change intended.
Link: https://lore.kernel.org/r/20250201015518.689704-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Extend VMX's nested intercept logic for emulated instructions to handle
HLT interception, primarily for testing purposes. Failure to allow
emulation of HLT isn't all that interesting, as emulating HLT while L2 is
active either requires forced emulation (and no #UD intercept in L1), TLB
games in the guest to coerce KVM into emulating the wrong instruction, or
a bug elsewhere in KVM.
E.g. without commit 47ef3ef843c0 ("KVM: VMX: Handle event vectoring
error in check_emulate_instruction()"), KVM can end up trying to emulate
HLT if RIP happens to point at a HLT when a vectored event arrives with
L2's IDT pointing at emulated MMIO.
Note, vmx_check_intercept() is still broken when L1 wants to intercept an
instruction, as KVM injects a #UD instead of synthesizing a nested VM-Exit.
That issue extends far beyond HLT, punt on it for now.
Link: https://lore.kernel.org/r/20250201015518.689704-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Return X86EMUL_CONTINUE instead X86EMUL_UNHANDLEABLE when emulating RDPID
on behalf of L2 and L1 _does_ expose RDPID/RDTSCP to L2. When RDPID
emulation was added by commit fb6d4d340e05 ("KVM: x86: emulate RDPID"),
KVM incorrectly allowed emulation by default. Commit 07721feee46b ("KVM:
nVMX: Don't emulate instructions in guest mode") fixed that flaw, but
missed that RDPID emulation was relying on the common return path to allow
emulation on behalf of L2.
Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode")
Link: https://lore.kernel.org/r/20250201015518.689704-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Set "next_rip" in the emulation interception info passed to vendor code
using the emulator context's "_eip", not "eip". "eip" holds RIP from the
start of emulation, i.e. the RIP of the instruction that's being emulated,
whereas _eip tracks the context's current position in decoding the code
stream, which at the time of the intercept checks is effectively the RIP
of the next instruction.
Passing the current RIP as next_rip causes SVM to stuff the wrong value
value into vmcb12->control.next_rip if a nested VM-Exit is generated, i.e.
if L1 wants to intercept the instruction, and could result in L1 putting
L2 into an infinite loop due to restarting L2 with the same RIP over and
over.
Fixes: 8a76d7f25f8f ("KVM: x86: Add x86 callback for intercept check")
Link: https://lore.kernel.org/r/20250201015518.689704-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When emulating PAUSE on behalf of L2, check for interception in vmcs12 by
looking at primary execution controls, not secondary execution controls.
Checking for PAUSE_EXITING in secondary execution controls effectively
results in KVM looking for BUS_LOCK_DETECTION, which KVM doesn't expose to
L1, i.e. is always off in vmcs12, and ultimately results in KVM failing to
"intercept" PAUSE.
Because KVM doesn't handle interception during emulation correctly on VMX,
i.e. the "fixed" code is still quite broken, and not intercepting PAUSE is
relatively benign, for all intents and purposes the bug means that L2 gets
to live when it would otherwise get an unexpected #UD.
Fixes: 4984563823f0 ("KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not intercepted")
Link: https://lore.kernel.org/r/20250201015518.689704-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The ES8328 codec driver, which is also used for the ES8388 chip that
appears to have an identical register map, claims that the output can
either take the route from DAC->Mixer->Output or through DAC->Output
directly. To the best of what I could find, this is not true, and
creates problems.
Without DACCONTROL17 bit index 7 set for the left channel, as well as
DACCONTROL20 bit index 7 set for the right channel, I cannot get any
analog audio out on Left Out 2 and Right Out 2 respectively, despite the
DAPM routes claiming that this should be possible. Furthermore, the same
is the case for Left Out 1 and Right Out 1, showing that those two don't
have a direct route from DAC to output bypassing the mixer either.
Those control bits toggle whether the DACs are fed (stale bread?) into
their respective mixers. If one "unmutes" the mixer controls in
alsamixer, then sure, the audio output works, but if it doesn't work
without the mixer being fed the DAC input then evidently it's not a
direct output from the DAC.
ES8328/ES8388 are seemingly not alone in this. ES8323, which uses a
separate driver for what appears to be a very similar register map,
simply flips those two bits on in its probe function, and then pretends
there is no power management whatsoever for the individual controls.
Fair enough.
My theory as to why nobody has noticed this up to this point is that
everyone just assumes it's their fault when they had to unmute an
additional control in ALSA.
Fix this in the es8328 driver by removing the erroneous direct route,
then get rid of the playback switch controls and have those bits tied to
the mixer's widget instead, which until now had no register to play
with.
Fixes: 567e4f98922c ("ASoC: add es8328 codec driver")
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Link: https://patch.msgid.link/20250222-es8328-route-bludgeoning-v1-1-99bfb7fb22d9@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
|