Age | Commit message (Collapse) | Author |
|
For TRANS2 QUERY_PATH_INFO request when the path does not exist, the
Windows NT SMB server returns error response STATUS_OBJECT_NAME_NOT_FOUND
or ERRDOS/ERRbadfile without the SMBFLG_RESPONSE flag set. Similarly it
returns STATUS_DELETE_PENDING when the file is being deleted. And looks
like that any error response from TRANS2 QUERY_PATH_INFO does not have
SMBFLG_RESPONSE flag set.
So relax check in check_smb_hdr() for detecting if the packet is response
for this special case.
This change fixes stat() operation against Windows NT SMB servers and also
all operations which depends on -ENOENT result from stat like creat() or
mkdir().
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Validate the SMB1 query reparse point response per [MS-CIFS] section
2.2.7.2 NT_TRANSACT_IOCTL.
NT_TRANSACT_IOCTL response contains one word long setup data after which is
ByteCount member. So check that SetupCount is 1 before trying to read and
use ByteCount member.
Output setup data contains ReturnedDataLen member which is the output
length of executed IOCTL command by remote system. So check that output was
not truncated before transferring over network.
Change MaxSetupCount of NT_TRANSACT_IOCTL request from 4 to 1 as io_rsp
structure already expects one word long output setup data. This should
prevent server sending incompatible structure (in case it would be extended
in future, which is unlikely).
Change MaxParameterCount of NT_TRANSACT_IOCTL request from 2 to 0 as
NT IOCTL does not have any documented output parameters and this function
does not parse any output parameters at all.
Fixes: ed3e0a149b58 ("smb: client: implement ->query_reparse_point() for SMB1")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
[MS-CIFS] specification in section 2.2.4.53.1 where is described
SMB_COM_SESSION_SETUP_ANDX Request, for SessionKey field says:
The client MUST set this field to be equal to the SessionKey field in
the SMB_COM_NEGOTIATE Response for this SMB connection.
Linux SMB client currently set this field to zero. This is working fine
against Windows NT SMB servers thanks to [MS-CIFS] product behavior <94>:
Windows NT Server ignores the client's SessionKey.
For compatibility with [MS-CIFS], set this SessionKey field in Session
Setup Request to value retrieved from Negotiate response.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SMB1 Session Setup NTLMSSP Request in non-UNICODE mode is similar to
UNICODE mode, just strings are encoded in ASCII and not in UTF-16.
With this change it is possible to setup SMB1 session with NTLM
authentication in non-UNICODE mode with Windows SMB server.
This change fixes mounting SMB1 servers with -o nounicode mount option
together with -o sec=ntlmssp mount option (which is the default sec=).
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The MT6357 PMIC contains the same RTC as MT6358 which allows to add
support for it trivially by just complementing the list of compatibles.
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20250428-rtc-mt6357-v1-1-31f673b0a723@baylibre.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
While the RTC framework intends to only handle dates after 1970 for
consumers, time conversion must also work for earlier dates to cover
e.g. storing dates beyond an RTC's range_max. This is most relevant for
the rtc-mt6397 driver that has
range_min = RTC_TIMESTAMP_BEGIN_1900;
range_max = mktime64(2027, 12, 31, 23, 59, 59);
and so needs working support for timestamps in 1900 starting in less than
three years.
So shift the tested interval of timestamps to also cover years 1900 to
1970.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20250428-enable-rtc-v4-5-2b2f7e3f9349@baylibre.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
To cover calculation of the time and wday in the rtc kunit test also check
tm_hour, tm_min, tm_sec and tm_wday of the rtc_time calculated by
rtc_time64_to_tm().
There are no surprises, the two tests making use of
rtc_time64_to_tm_test_date_range() continue to succeed.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20250428-enable-rtc-v4-4-2b2f7e3f9349@baylibre.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
This is easier to handle because you can just consult date(1) to convert
between a seconds-since-1970 value and a date string:
$ date --utc -d @3661
Thu Jan 1 01:01:01 AM UTC 1970
$ date -d "Jan 1 12:00:00 AM UTC 1900" +%s
-2208988800
The intended side effect is that this prepares the test for dates before
1970. The division of a negative value by 86400 doesn't result in the
desired days-since-1970 value as e.g. secs=-82739 should map to days=-1.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20250428-enable-rtc-v4-3-2b2f7e3f9349@baylibre.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The comparison
rtc->start_secs > rtc->range_max
has a signed left-hand side and an unsigned right-hand side.
So the comparison might become true for negative start_secs which is
interpreted as a (possibly very large) positive value.
As a negative value can never be bigger than an unsigned value
the correct representation of the (mathematical) comparison
rtc->start_secs > rtc->range_max
in C is:
rtc->start_secs >= 0 && rtc->start_secs > rtc->range_max
Use that to fix the offset calculation currently used in the
rtc-mt6397 driver.
Fixes: 989515647e783 ("rtc: Add one offset seconds to expand RTC range")
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20250428-enable-rtc-v4-2-2b2f7e3f9349@baylibre.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Conversion of dates before 1970 is still relevant today because these
dates are reused on some hardwares to store dates bigger than the
maximal date that is representable in the device's native format.
This prominently and very soon affects the hardware covered by the
rtc-mt6397 driver that can only natively store dates in the interval
1900-01-01 up to 2027-12-31. So to store the date 2028-01-01 00:00:00
to such a device, rtc_time64_to_tm() must do the right thing for
time=-2208988800.
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20250428-enable-rtc-v4-1-2b2f7e3f9349@baylibre.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
When the regmap framework was introduced to this driver,
the PCF8563_REG_AMN register within the set_alarm function was
incorrectly changed to PCF8563_REG_SC.
The PCF8563_REG_SC register is the seconds register.
This caused alarm values to be written to the seconds register
when an alarm was set. Which means the alarm would not trigger
as expected and the seconds register would be overwritten
with an incorrect value.
Signed-off-by: Troy Mitchell <troymitchell988@gmail.com>
Link: https://lore.kernel.org/r/20250531-pcf8563-fix-alarm-v2-1-cac4b1716167@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
When using the SCMP mode instead of SUBU, this RTC can also support
other input frequencies than 32768Hz. Also, upcoming SoCs will only
support SCMP.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20250526095801.35781-8-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Datasheet says that the controller must be disabled before setting up
either SUBU or SCMP. This did not matter so far because the driver only
supported SUBU which was the default, too. It is good practice to follow
datasheet recommendations, though. It will also be needed because SCMP
mode will be added in a later patch.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20250526095801.35781-7-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The external crystal can be a second clock input. It is needed for the
SCMP counting method which allows using crystals different than 32768Hz.
It is also needed for an upcoming SoC which only supports the SCMP
method.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20250526095801.35781-6-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
page is checked for null in __build_path_from_dentry_optional_prefix
when tcon->origin_fullpath is not set. However, the check is missing when
it is set.
Add a check to prevent a potential NULL pointer dereference.
Signed-off-by: Ruben Devos <devosruben6@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Ihor Solodrai reported selftest 'btf_tag/btf_type_tag_percpu_vmlinux_helper'
failure ([1]) during 6.16 merge window. The failure log:
...
7: (15) if r0 == 0x0 goto pc+1 ; R0=ptr_css_rstat_cpu()
; *(volatile int *)rstat; @ btf_type_tag_percpu.c:68
8: (61) r1 = *(u32 *)(r0 +0)
cannot access ptr member updated_children with moff 0 in struct css_rstat_cpu with off 0 size 4
Two changes are needed. First, 'struct cgroup_rstat_cpu' needs to be
replaced with 'struct css_rstat_cpu' to be consistent with new data
structure. Second, layout of 'css_rstat_cpu' is changed compared
to 'cgroup_rstat_cpu'. The first member becomes a pointer so
the bpf prog needs to do 8-byte load instead of 4-byte load.
[1] https://lore.kernel.org/bpf/6f688f2e-7d26-423a-9029-d1b1ef1c938a@linux.dev/
Cc: Ihor Solodrai <ihor.solodrai@linux.dev>
Cc: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20250529201151.1787575-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
On linux-next, build for bpf selftest displays an error due to
mismatch in the expected function signature of bpf_testmod_test_read
and bpf_testmod_test_write.
Commit 97d06802d10a ("sysfs: constify bin_attribute argument of bin_attribute::read/write()")
changed the required type for struct bin_attribute to const struct bin_attribute.
To resolve the error, update corresponding signature for the callback.
Fixes: 97d06802d10a ("sysfs: constify bin_attribute argument of bin_attribute::read/write()")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/e915da49-2b9a-4c4c-a34f-877f378129f6@linux.ibm.com/
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Link: https://lore.kernel.org/r/20250512091108.2015615-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- randstruct: gcc-plugin: Fix attribute addition with GCC 15
- ubsan: integer-overflow: depend on BROKEN to keep this out of CI
- overflow: Introduce __DEFINE_FLEX for having no initializer
- wifi: iwlwifi: mld: Work around Clang loop unrolling bug
[ Take two after a jump scare due to some repo rewriting by 'b4' - Linus ]
* tag 'hardening-v6.16-rc1-fix1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: gcc-plugin: Fix attribute addition
overflow: Introduce __DEFINE_FLEX for having no initializer
ubsan: integer-overflow: depend on BROKEN to keep this out of CI
wifi: iwlwifi: mld: Work around Clang loop unrolling bug
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- Add watchdog timer for the NXP S32 platform
- Add driver for Intel OC WDT
- Add exynos990-wdt
- Various other fixes and improvements
* tag 'linux-watchdog-6.16-rc1' of git://www.linux-watchdog.org/linux-watchdog: (22 commits)
watchdog: iTCO_wdt: Update the heartbeat value after clamping timeout
watchdog: Add driver for Intel OC WDT
watchdog: arm_smc_wdt: get wdt status through SMCWD_GET_TIMELEFT
watchdog: iTCO: Drop driver-internal locking
watchdog: apple: set max_hw_heartbeat_ms instead of max_timeout
watchdog: qcom: introduce the device data for IPQ5424 watchdog device
dt-bindings: watchdog: renesas,wdt: Document RZ/V2N (R9A09G056) support
watchdog: lenovo_se30_wdt: Fix possible devm_ioremap() NULL pointer dereference in lenovo_se30_wdt_probe()
watchdog: s3c2410_wdt: Add exynos990-wdt compatible data
dt-bindings: watchdog: samsung-wdt: Add exynos990-wdt compatible
dt-bindings: watchdog: Add rk3562 compatible
dt-bindings: watchdog: fsl,scu-wdt: Document imx8qm
watchdog: Add the Watchdog Timer for the NXP S32 platform
dt-bindings: watchdog: Add NXP Software Watchdog Timer
watchdog: Correct kerneldoc warnings
watchdog: stm32: Fix wakeup source leaks on device unbind
watchdog: Do not enable by default during compile testing
watchdog: cros-ec: Avoid -Wflex-array-member-not-at-end warning
watchdog: da9052_wdt: respect TWDMIN
watchdog: da9052_wdt: do not disable wdt during probe
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"There is not much this this, mostly fixes around interrupt and IBI
handling:
- mipi-i3c-hci: interrupt handling fixes
- svc: i.MX94 and i.MX95 support, IBI handling fixes"
* tag 'i3c/for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: controllers do not need to depend on I3C
i3c: master: svc: switch to bulk clk API for flexible clock support
dt-bindings: i3c: silvaco,i3c-master: add i.MX94 and i.MX95 I3C
i3c: master: svc: skip address resend on repeat START
i3c: master: svc: Emit STOP asap in the IBI transaction
i3c: master: svc: Receive IBI requests in interrupt context
i3c: mipi-i3c-hci: Move unexpected INTR_STATUS print before IO handler
i3c: mipi-i3c-hci: Change name of INTR_STATUS bit 11
i3c: mipi-i3c-hci: Clear INTR_STATUS unconditionally
i3c: mipi-i3c-hci: Fix handling status of i3c_hci_irq_handler()
i3c: mipi-i3c-hci: Allow only relevant INTR_STATUS bit updates
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"Limit a register write width in altera_edac to avoid hw errors"
* tag 'edac_urgent_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/altera: Use correct write width with the INTTEST register
|
|
Pull OpenRISC updates from Stafford Horne:
"Just a few documentation updates from the community:
- Device tree documentation conversion from txt to yaml
- Documentation addition to help users getting started with initramfs
on OpenRISC
* tag 'for-linus' of https://github.com/openrisc/linux:
dt-bindings: interrupt-controller: Convert openrisc,ompic to DT schema
dt-bindings: interrupt-controller: Convert opencores,or1k-pic to DT schema
Documentation:openrisc: Add build instructions with initramfs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"Fix building with gcc-15, formatting fix on unaligned warnings and
replace __ASSEMBLY__ with __ASSEMBLER__ in headers"
* tag 'parisc-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/unaligned: Fix hex output to show 8 hex chars
parisc: fix building with gcc-15
parisc: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
parisc: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
|
|
The CephFS kernel driver forgets to set the filesystem magic signature in
its superblock. As a result, IMA policy rules based on fsmagic matching do
not apply as intended. This causes a major performance regression in Talos
Linux [1] when mounting CephFS volumes, such as when deploying Rook Ceph
[2]. Talos Linux ships a hardened kernel with the following IMA policy
(irrelevant lines omitted):
[...]
dont_measure fsmagic=0xc36400 # CEPH_SUPER_MAGIC
[...]
measure func=FILE_CHECK mask=^MAY_READ euid=0
measure func=FILE_CHECK mask=^MAY_READ uid=0
[...]
Currently, IMA compares 0xc36400 == 0x0 for CephFS files, resulting in all
files opened with O_RDONLY or O_RDWR getting measured with SHA512 on every
open(2):
10 69990c87e8af323d47e2d6ae4... ima-ng sha512:<hash> /data/cephfs/test-file
Since O_WRONLY is rare, this results in an order of magnitude lower
performance than expected for practically all file operations. Properly
setting CEPH_SUPER_MAGIC in the CephFS superblock resolves the regression.
Tests performed on a 3x replicated Ceph v19.3.0 cluster across three
i5-7200U nodes each equipped with one Micron 7400 MAX M.2 disk (BlueStore)
and Gigabit ethernet, on Talos Linux v1.10.2:
FS-Mark 3.3
Test: 500 Files, Empty
Files/s > Higher Is Better
6.12.27-talos . 16.6 |====
+twelho patch . 208.4 |====================================================
FS-Mark 3.3
Test: 500 Files, 1KB Size
Files/s > Higher Is Better
6.12.27-talos . 15.6 |=======
+twelho patch . 118.6 |====================================================
FS-Mark 3.3
Test: 500 Files, 32 Sub Dirs, 1MB Size
Files/s > Higher Is Better
6.12.27-talos . 12.7 |===============
+twelho patch . 44.7 |=====================================================
IO500 [3] 2fcd6d6 results (benchmarks within variance omitted):
| IO500 benchmark | 6.12.27-talos | +twelho patch | Speedup |
|-------------------|----------------|----------------|-----------|
| mdtest-easy-write | 0.018524 kIOPS | 1.135027 kIOPS | 6027.33 % |
| mdtest-hard-write | 0.018498 kIOPS | 0.973312 kIOPS | 5161.71 % |
| ior-easy-read | 0.064727 GiB/s | 0.155324 GiB/s | 139.97 % |
| mdtest-hard-read | 0.018246 kIOPS | 0.780800 kIOPS | 4179.29 % |
This applies outside of synthetic benchmarks as well, for example, the time
to rsync a 55 MiB directory with ~12k of mostly small files drops from an
unusable 10m5s to a reasonable 26s (23x the throughput).
[1]: https://www.talos.dev/
[2]: https://www.talos.dev/v1.10/kubernetes-guides/configuration/ceph-with-rook/
[3]: https://github.com/IO500/io500
Cc: stable@vger.kernel.org
Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The ceph/export.c contains very confusing logic of file handle size
calculation based on hardcoded values. This patch makes the cleanup of
this logic by means of introduction the named constants.
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In 'ceph_zero_objects', promote 'object_size' to 'u64' to avoid possible
integer overflow.
Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Kandybka <d.kandybka@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The generic/397 test hits a BUG_ON for the case of encrypted inode with
unaligned file size (for example, 33K or 1K):
[ 877.737811] run fstests generic/397 at 2025-01-03 12:34:40
[ 877.875761] libceph: mon0 (2)127.0.0.1:40674 session established
[ 877.876130] libceph: client4614 fsid 19b90bca-f1ae-47a6-93dd-0b03ee637949
[ 877.991965] libceph: mon0 (2)127.0.0.1:40674 session established
[ 877.992334] libceph: client4617 fsid 19b90bca-f1ae-47a6-93dd-0b03ee637949
[ 878.017234] libceph: mon0 (2)127.0.0.1:40674 session established
[ 878.017594] libceph: client4620 fsid 19b90bca-f1ae-47a6-93dd-0b03ee637949
[ 878.031394] xfs_io (pid 18988) is setting deprecated v1 encryption policy; recommend upgrading to v2.
[ 878.054528] libceph: mon0 (2)127.0.0.1:40674 session established
[ 878.054892] libceph: client4623 fsid 19b90bca-f1ae-47a6-93dd-0b03ee637949
[ 878.070287] libceph: mon0 (2)127.0.0.1:40674 session established
[ 878.070704] libceph: client4626 fsid 19b90bca-f1ae-47a6-93dd-0b03ee637949
[ 878.264586] libceph: mon0 (2)127.0.0.1:40674 session established
[ 878.265258] libceph: client4629 fsid 19b90bca-f1ae-47a6-93dd-0b03ee637949
[ 878.374578] -----------[ cut here ]------------
[ 878.374586] kernel BUG at net/ceph/messenger.c:1070!
[ 878.375150] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 878.378145] CPU: 2 UID: 0 PID: 4759 Comm: kworker/2:9 Not tainted 6.13.0-rc5+ #1
[ 878.378969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 878.380167] Workqueue: ceph-msgr ceph_con_workfn
[ 878.381639] RIP: 0010:ceph_msg_data_cursor_init+0x42/0x50
[ 878.382152] Code: 89 17 48 8b 46 70 55 48 89 47 08 c7 47 18 00 00 00 00 48 89 e5 e8 de cc ff ff 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 0f 0b <0f> 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
[ 878.383928] RSP: 0018:ffffb4ffc7cbbd28 EFLAGS: 00010287
[ 878.384447] RAX: ffffffff82bb9ac0 RBX: ffff981390c2f1f8 RCX: 0000000000000000
[ 878.385129] RDX: 0000000000009000 RSI: ffff981288232b58 RDI: ffff981390c2f378
[ 878.385839] RBP: ffffb4ffc7cbbe18 R08: 0000000000000000 R09: 0000000000000000
[ 878.386539] R10: 0000000000000000 R11: 0000000000000000 R12: ffff981390c2f030
[ 878.387203] R13: ffff981288232b58 R14: 0000000000000029 R15: 0000000000000001
[ 878.387877] FS: 0000000000000000(0000) GS:ffff9814b7900000(0000) knlGS:0000000000000000
[ 878.388663] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 878.389212] CR2: 00005e106a0554e0 CR3: 0000000112bf0001 CR4: 0000000000772ef0
[ 878.389921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 878.390620] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 878.391307] PKRU: 55555554
[ 878.391567] Call Trace:
[ 878.391807] <TASK>
[ 878.392021] ? show_regs+0x71/0x90
[ 878.392391] ? die+0x38/0xa0
[ 878.392667] ? do_trap+0xdb/0x100
[ 878.392981] ? do_error_trap+0x75/0xb0
[ 878.393372] ? ceph_msg_data_cursor_init+0x42/0x50
[ 878.393842] ? exc_invalid_op+0x53/0x80
[ 878.394232] ? ceph_msg_data_cursor_init+0x42/0x50
[ 878.394694] ? asm_exc_invalid_op+0x1b/0x20
[ 878.395099] ? ceph_msg_data_cursor_init+0x42/0x50
[ 878.395583] ? ceph_con_v2_try_read+0xd16/0x2220
[ 878.396027] ? _raw_spin_unlock+0xe/0x40
[ 878.396428] ? raw_spin_rq_unlock+0x10/0x40
[ 878.396842] ? finish_task_switch.isra.0+0x97/0x310
[ 878.397338] ? __schedule+0x44b/0x16b0
[ 878.397738] ceph_con_workfn+0x326/0x750
[ 878.398121] process_one_work+0x188/0x3d0
[ 878.398522] ? __pfx_worker_thread+0x10/0x10
[ 878.398929] worker_thread+0x2b5/0x3c0
[ 878.399310] ? __pfx_worker_thread+0x10/0x10
[ 878.399727] kthread+0xe1/0x120
[ 878.400031] ? __pfx_kthread+0x10/0x10
[ 878.400431] ret_from_fork+0x43/0x70
[ 878.400771] ? __pfx_kthread+0x10/0x10
[ 878.401127] ret_from_fork_asm+0x1a/0x30
[ 878.401543] </TASK>
[ 878.401760] Modules linked in: hctr2 nhpoly1305_avx2 nhpoly1305_sse2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 essiv authenc mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit kvm_intel kvm crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel joydev crypto_simd cryptd rapl input_leds psmouse sch_fq_codel serio_raw bochs i2c_piix4 floppy qemu_fw_cfg i2c_smbus mac_hid pata_acpi msr parport_pc ppdev lp parport efi_pstore ip_tables x_tables
[ 878.407319] ---[ end trace 0000000000000000 ]---
[ 878.407775] RIP: 0010:ceph_msg_data_cursor_init+0x42/0x50
[ 878.408317] Code: 89 17 48 8b 46 70 55 48 89 47 08 c7 47 18 00 00 00 00 48 89 e5 e8 de cc ff ff 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 0f 0b <0f> 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
[ 878.410087] RSP: 0018:ffffb4ffc7cbbd28 EFLAGS: 00010287
[ 878.410609] RAX: ffffffff82bb9ac0 RBX: ffff981390c2f1f8 RCX: 0000000000000000
[ 878.411318] RDX: 0000000000009000 RSI: ffff981288232b58 RDI: ffff981390c2f378
[ 878.412014] RBP: ffffb4ffc7cbbe18 R08: 0000000000000000 R09: 0000000000000000
[ 878.412735] R10: 0000000000000000 R11: 0000000000000000 R12: ffff981390c2f030
[ 878.413438] R13: ffff981288232b58 R14: 0000000000000029 R15: 0000000000000001
[ 878.414121] FS: 0000000000000000(0000) GS:ffff9814b7900000(0000) knlGS:0000000000000000
[ 878.414935] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 878.415516] CR2: 00005e106a0554e0 CR3: 0000000112bf0001 CR4: 0000000000772ef0
[ 878.416211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 878.416907] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 878.417630] PKRU: 55555554
(gdb) l *ceph_msg_data_cursor_init+0x42
0xffffffff823b45a2 is in ceph_msg_data_cursor_init (net/ceph/messenger.c:1070).
1065
1066 void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor,
1067 struct ceph_msg *msg, size_t length)
1068 {
1069 BUG_ON(!length);
1070 BUG_ON(length > msg->data_length);
1071 BUG_ON(!msg->num_data_items);
1072
1073 cursor->total_resid = length;
1074 cursor->data = msg->data;
The issue takes place because of this:
[ 202.628853] libceph: net/ceph/messenger_v2.c:2034 prepare_sparse_read_data(): msg->data_length 33792, msg->sparse_read_total 36864
1070 BUG_ON(length > msg->data_length);
The generic/397 test (xfstests) executes such steps:
(1) create encrypted files and directories;
(2) access the created files and folders with encryption key;
(3) access the created files and folders without encryption key.
The issue takes place in this portion of code:
if (IS_ENCRYPTED(inode)) {
struct page **pages;
size_t page_off;
err = iov_iter_get_pages_alloc2(&subreq->io_iter, &pages, len,
&page_off);
if (err < 0) {
doutc(cl, "%llx.%llx failed to allocate pages, %d\n",
ceph_vinop(inode), err);
goto out;
}
/* should always give us a page-aligned read */
WARN_ON_ONCE(page_off);
len = err;
err = 0;
osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0, false,
false);
The reason of the issue is that subreq->io_iter.count keeps unaligned
value of length:
[ 347.751182] lib/iov_iter.c:1185 __iov_iter_get_pages_alloc(): maxsize 36864, maxpages 4294967295, start 18446659367320516064
[ 347.752808] lib/iov_iter.c:1196 __iov_iter_get_pages_alloc(): maxsize 33792, maxpages 4294967295, start 18446659367320516064
[ 347.754394] lib/iov_iter.c:1015 iter_folioq_get_pages(): maxsize 33792, maxpages 4294967295, extracted 0, _start_offset 18446659367320516064
This patch simply assigns the aligned value to subreq->io_iter.count
before calling iov_iter_get_pages_alloc2().
[ idryomov: tag the comment with FIXME to make it clear that it's only
a workaround for netfslib not coexisting with fscrypt nicely
(this is also noted in another pre-existing comment) ]
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Based on changes in the 2021 public version of the randstruct
out-of-tree GCC plugin[1], more carefully update the attributes on
resulting decls, to avoid tripping checks in GCC 15's
comptypes_check_enum_int() when it has been configured with
"--enable-checking=misc":
arch/arm64/kernel/kexec_image.c:132:14: internal compiler error: in comptypes_check_enum_int, at c/c-typeck.cc:1519
132 | const struct kexec_file_ops kexec_image_ops = {
| ^~~~~~~~~~~~~~
internal_error(char const*, ...), at gcc/gcc/diagnostic-global-context.cc:517
fancy_abort(char const*, int, char const*), at gcc/gcc/diagnostic.cc:1803
comptypes_check_enum_int(tree_node*, tree_node*, bool*), at gcc/gcc/c/c-typeck.cc:1519
...
Link: https://archive.org/download/grsecurity/grsecurity-3.1-5.10.41-202105280954.patch.gz [1]
Reported-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Closes: https://github.com/KSPP/linux/issues/367
Closes: https://lore.kernel.org/lkml/20250530000646.104457-1-thiago.bauermann@linaro.org/
Reported-by: Ingo Saitz <ingo@hannover.ccc.de>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104745
Fixes: 313dd1b62921 ("gcc-plugins: Add the randstruct plugin")
Tested-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Link: https://lore.kernel.org/r/20250530221824.work.623-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
While not yet in the tree, there is a proposed patch[1] that was
depending on the prior behavior of _DEFINE_FLEX, which did not have an
explicit initializer. Provide this via __DEFINE_FLEX now, which can also
have attributes applied (e.g. __uninitialized).
Examples of the resulting initializer behaviors can be seen here:
https://godbolt.org/z/P7Go8Tr33
Link: https://lore.kernel.org/netdev/20250520205920.2134829-9-anthony.l.nguyen@intel.com [1]
Fixes: 47e36ed78406 ("overflow: Fix direct struct member initialization in _DEFINE_FLEX()")
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
When executing "modprobe iTCO_wdt heartbeat=700", the user-specified
'heartbeat' parameter exceeds the valid range, the driver clamps the
timeout to default 30s but fails to update the logged 'heartbeat' value,
resulting in misleading log output:
iTCO_wdt iTCO_wdt: timeout value out of range, using 30
iTCO_wdt iTCO_wdt: initialized. heartbeat=700 sec (nowayout=0)
After validating the range, update the 'heartbeat' value with the clamped
timeout value to ensure that log messages accurately reflect the actual
runtime parameters.
Signed-off-by: Ziyan Fu <fuzy5@lenovo.com>
Reviewed-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Link: https://lore.kernel.org/r/20250429102533.11886-1-13281011316@163.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Add a driver for the Intel Over-Clocking Watchdog found in Intel
Platform Controller (PCH) chipsets. This watchdog is controlled
via a simple single-register interface and would otherwise be
standard except for the presence of a LOCK bit that can only be
set once per power cycle, needing extra handling around it.
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250317-ivo-intel_oc_wdt-v3-1-32c396f4eefd@siemens.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
The optional SMCWD_GET_TIMELEFT command can be used to detect if
the watchdog has already been started.
See the implementation in OP-TEE secure OS [1].
At probe time, check if the watchdog is already started and then
set WDOG_HW_RUNNING in the watchdog status. This will cause the
watchdog framework to ping the watchdog until a userspace watchdog
daemon takes over the control.
Link: https://github.com/OP-TEE/optee_os/commit/a7f2d4bd8632 [1]
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250520085952.210723-1-antonio.borneo@foss.st.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
The locking code in the iTCO watchdog driver has been carried along from
before the watchdog core existed. The watchdog core protects calls into
drivers since commit f4e9c82f64b5 ("watchdog: Add Locking support"),
making driver-internal locking unnecessary. Drop it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Link: https://lore.kernel.org/r/20250517160936.3231017-1-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
The hardware only supports timeouts slightly below 3mins, but by using
max_hw_heartbeat_ms we can let the kernel take care of supporting larger
timeouts than that requested from userspace.
Switching to max_hw_heartbeat_ms also means our set_timeout function now
needs to configure the hardware to the minimum of either the requested
timeout (in seconds) or the maximum supported by the user (in seconds).
Signed-off-by: Florian Klink <flokli@flokli.de>
Reviewed-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Link: https://lore.kernel.org/r/20250506142621.11428-2-flokli@flokli.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
To retrieve the restart reason from IMEM, certain device specific data
like IMEM compatible to lookup, location of IMEM to read, etc should be
defined. To achieve that, introduce the separate device data for IPQ5424
and add the required details subsequently.
Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250502-wdt_reset_reason-v3-3-b2dc7ace38ca@oss.qualcomm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Document support for the watchdog IP found on the Renesas RZ/V2N
(R9A09G056) SoC. The watchdog IP is identical to that on RZ/V2H(P),
so `renesas,r9a09g057-wdt` will be used as a fallback compatible,
enabling reuse of the existing driver without changes.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250502120054.47323-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
dereference in lenovo_se30_wdt_probe()
devm_ioremap() returns NULL on error. Currently, lenovo_se30_wdt_probe()
does not check for this case, which results in a NULL pointer
dereference.
Add NULL check after devm_ioremap() to prevent this issue.
Fixes: c284153a2c55 ("watchdog: lenovo_se30_wdt: Watchdog driver for Lenovo SE30 platform")
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250424071648.89016-1-bsdhenrymartin@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
The Exynos990 has two watchdog clusters - cl0 and cl2. Add new
driver data for these two clusters, making it possible to use the
watchdog timer on this SoC.
Signed-off-by: Igor Belwon <igor.belwon@mentallysanemainliners.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250420-wdt-resends-april-v1-2-f58639673959@mentallysanemainliners.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Add a dt-binding compatible for the Exynos990 Watchdog timer.
This watchdog is compatible with the GS101/Exynos850 design, as
such it requires the cluster-index and syscon-phandle properties
to be present. It also contains a cl2 cluster, as such the
cluster-index property has been expanded.
Signed-off-by: Igor Belwon <igor.belwon@mentallysanemainliners.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250420-wdt-resends-april-v1-1-f58639673959@mentallysanemainliners.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Use folio_expected_ref_count() instead of open-coded logic in
is_refcount_suitable(). This avoids code duplication and improves
clarity.
Drop is_refcount_suitable() as it is no longer needed.
Link: https://lkml.kernel.org/r/20250526182818.37978-2-shivankg@amd.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Fengwei Yin <fengwei.yin@intel.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The kselftest framework uses the string logged when a test result is
reported as the unique identifier for a test, using it to track test
results between runs. The gup_longterm test fails to follow this pattern,
it runs a single test function repeatedly with various parameters but each
result report is a string logging an error message which is fixed between
runs.
Since the code already logs each test uniquely before it starts refactor
to also print this to a buffer, then use that name as the test result.
This isn't especially pretty but is relatively straightforward and is a
great help to tooling.
Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-4-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The kselftest framework uses the string logged when a test result is
reported as the unique identifier for a test, using it to track test
results between runs. The cow test completely fails to follow this
pattern, it runs test functions repeatedly with various parameters with
each result report from those functions being a string logging an error
message which is fixed between runs.
Since the code already logs each test uniquely before it starts refactor
to also print this to a buffer, then use that name as the test result.
This isn't especially pretty but is relatively straightforward and is a
great help to tooling.
Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-3-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Several of the MM tests have a pattern of printing a description of the
test to be run then reporting the actual TAP result using a generic string
not connected to the specific test, often in a shared function used by
many tests. The name reported typically varies depending on the specific
result rather than the test too. This causes problems for tooling that
works with test results, the names reported with the results are used to
deduplicate tests and track them between runs so both duplicated names and
changing names cause trouble for things like UIs and automated bisection.
As a first step towards matching these tests better with the expectations
of kselftest provide helpers which record the test name as part of the
initial print and then use that as part of reporting a result.
This is not added as a generic kselftest helper partly because the use of
a variable to store the test name doesn't fit well with the header only
implementation of kselftest.h and partly because it's not really an
intended pattern. Ideally at some point the mm tests that use it will be
updated to not need it.
Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-2-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/mm: cow and gup_longterm cleanups", v2.
The bulk of these changes modify the cow and gup_longterm tests to report
unique and stable names for each test, bringing them into line with the
expectations of tooling that works with kselftest. The string reported as
a test result is used by tooling to both deduplicate tests and track tests
between test runs, using the same string for multiple tests or changing
the string depending on test result causes problems for user interfaces
and automation such as bisection.
It was suggested that converting to use kselftest_harness.h would be a
good way of addressing this, however that really wants the set of tests to
run to be known at compile time but both test programs dynamically
enumarate the set of huge page sizes the system supports and test each.
Refactoring to handle this would be even more invasive than these changes
which are large but straightforward and repetitive.
A version of the main gup_longterm cleanup was previously sent separately,
this version factors out the helpers for logging the start of the test
since the cow test looks very similar.
This patch (of 4):
The cow and gup_longterm test programs open code something that looks a
lot like the standard ksft_finished() helper to summarise the test results
and provide an exit code, convert to use ksft_finished().
Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-0-ff198df8e38e@kernel.org
Link: https://lkml.kernel.org/r/20250527-selftests-mm-cow-dedupe-v2-1-ff198df8e38e@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When CONFIG_DAMON_SYSFS is disabled, the selftests fail with the following
outputs,
not ok 2 selftests: damon: sysfs_update_schemes_tried_regions_wss_estimation.py # exit=1
not ok 3 selftests: damon: damos_quota.py # exit=1
not ok 4 selftests: damon: damos_quota_goal.py # exit=1
not ok 5 selftests: damon: damos_apply_interval.py # exit=1
not ok 6 selftests: damon: damos_tried_regions.py # exit=1
not ok 7 selftests: damon: damon_nr_regions.py # exit=1
not ok 11 selftests: damon: sysfs_update_schemes_tried_regions_hang.py # exit=1
The root cause of this issue is that all the testcases above do not check
the sysfs interface of DAMON whether it exists or not. With this patch
applied, all the testcases above now pass successfully.
Link: https://lkml.kernel.org/r/20250531093937.1555159-1-lienze@kylinos.cn
Signed-off-by: Enze Li <lienze@kylinos.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On systems with NUMA balancing enabled, it has been found that tracking
task activities resulting from NUMA balancing is beneficial. NUMA
balancing employs two mechanisms for task migration: one is to migrate
a task to an idle CPU within its preferred node, and the other is to
swap tasks located on different nodes when they are on each other's
preferred nodes.
The kernel already provides NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However, it
lacks statistics regarding task migration and swapping. Therefore,
relevant counts for task migration and swapping should be added.
The following two new fields:
numa_task_migrated
numa_task_swapped
will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
and /proc/vmstat.
Introducing both per-task and per-memory cgroup (memcg) NUMA balancing
statistics facilitates a rapid evaluation of the performance and
resource utilization of the target workload. For instance, users can
first identify the container with high NUMA balancing activity and then
further pinpoint a specific task within that group, and subsequently
adjust the memory policy for that task. In short, although it is
possible to iterate through /proc/$pid/sched to locate the problematic
task, the introduction of aggregated NUMA balancing activity for tasks
within each memcg can assist users in identifying the task more
efficiently through a divide-and-conquer approach.
As Libo Chen pointed out, the memcg event relies on the text names in
vmstat_text, and /proc/vmstat generates corresponding items based on
vmstat_text. Thus, the relevant task migration and swapping events
introduced in vmstat_text also need to be populated by
count_vm_numa_event(), otherwise these values are zero in /proc/vmstat.
In theory, task migration and swap events are part of the scheduler's
activities. The reason for exposing them through the
memory.stat/vmstat interface is that we already have NUMA balancing
statistics in memory.stat/vmstat, and these events are closely related
to each other. Following Shakeel's suggestion, we describe the
end-to-end flow/story of all these events occurring on a timeline for
future reference:
The goal of NUMA balancing is to co-locate a task and its memory pages
on the same NUMA node. There are two strategies: migrate the pages to
the task's node, or migrate the task to the node where its pages
reside.
Suppose a task p1 is running on Node 0, but its pages are located on
Node 1. NUMA page fault statistics for p1 reveal its "page footprint"
across nodes. If NUMA balancing detects that most of p1's pages are on
Node 1:
1.Page Migration Attempt:
The Numa balance first tries to migrate p1's pages to Node 0.
The numa_page_migrate counter increments.
2.Task Migration Strategies:
After the page migration finishes, Numa balance checks every
1 second to see if p1 can be migrated to Node 1.
Case 2.1: Idle CPU Available
If Node 1 has an idle CPU, p1 is directly scheduled there. This
event is logged as numa_task_migrated.
Case 2.2: No Idle CPU (Task Swap)
If all CPUs on Node1 are busy, direct migration could cause CPU
contention or load imbalance. Instead: The Numa balance selects a
candidate task p2 on Node 1 that prefers Node 0 (e.g., due to its own
page footprint). p1 and p2 are swapped. This cross-node swap is
recorded as numa_task_swapped.
Link: https://lkml.kernel.org/r/d00edb12ba0f0de3c5222f61487e65f2ac58f5b1.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ayush Jain <Ayush.jain3@amd.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Libo Chen <libo.chen@oracle.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "sched/numa: add statistics of numa balance task migration",
v6.
Introduce task migration and swap statistics in the following places:
/sys/fs/cgroup/{GROUP}/memory.stat
/proc/{PID}/sched
/proc/vmstat
These statistics facilitate a rapid evaluation of the performance and
resource utilization of the target workload.
This patch (of 2):
Task swapping is triggered when there are no idle CPUs in task A's
preferred node. In this case, the NUMA load balancer chooses a task B
on A's preferred node and swaps B with A. This helps improve NUMA
locality without introducing load imbalance between nodes. In the
current implementation, B's NUMA node preference is not mandatory.
That is to say, a kernel thread might be incorrectly chosen as B.
However, kernel thread and user space thread that does not have mm are
not supposed to be covered by NUMA balancing because NUMA balancing
only considers user pages via VMAs.
According to Peter's suggestion for fixing this issue, we use
PF_KTHREAD to skip the kernel thread. curr->mm is also checked because
it is possible that user_mode_thread() might create a user thread
without an mm. As per Prateek's analysis, after adding the PF_KTHREAD
check, there is no need to further check the PF_IDLE flag:
: - play_idle_precise() already ensures PF_KTHREAD is set before adding
: PF_IDLE
:
: - cpu_startup_entry() is only called from the startup thread which
: should be marked with PF_KTHREAD (based on my understanding looking at
: commit cff9b2332ab7 ("kernel/sched: Modify initial boot task idle
: setup"))
In summary, the check in task_numa_compare() now aligns with
task_tick_numa().
Link: https://lkml.kernel.org/r/cover.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/43d68b356b25d124f0d222ebedf3859e86eefb9f.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/cover.1748002400.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/eaacc9c9bd37bac92d43a671867d85b2fdad3b06.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Check if "procmap_out->fd" is negative instead of "procmap_out" (which is
a pointer).
Link: https://lkml.kernel.org/r/aDbFuUTlJTBqziVd@stanley.mountain
Fixes: bd23f293a0d5 ("tools/testing: add PROCMAP_QUERY helper functions in mm self tests")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: levi.yun <yeoreum.yun@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The hugetlb fix introduced in commit ee40c9920ac2 ("mm: fix copy_vma()
error handling for hugetlb mappings") mistakenly did not provide a stub
for the VMA userland testing, which results in a compile error when trying
to build this.
Provide this stub to resolve the issue.
Link: https://lkml.kernel.org/r/20250528-fix-vma-test-v1-1-c8a5f533b38f@oracle.com
Fixes: ee40c9920ac2 ("mm: fix copy_vma() error handling for hugetlb mappings")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The current comment in gup_fast() talks about "IPIs that come from THPs
splitting", which is outdated and refers to the old THP splitting
implementation that was removed in commit ad0bed24e98b ("thp: drop all
split_huge_page()-related code"), which landed in v4.5. Before then, THP
splitting involved a pmdp_splitting_flush(), which sent an IPI to
serialize against gup_fast().
Nowadays, we use tlb_remove_table_sync_one() to send IPIs that serialize
against gup_fast(); this is used, for example, in THP *collapsing* to stop
gup_fast() walks of a page table before depositing it.
Link: https://lkml.kernel.org/r/20250528-gup-irq-comment-fix-v1-1-b9d83c345333@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|