Age | Commit message (Collapse) | Author |
|
Commit:
5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG")
amended warn_slowpath_fmt() to disable preemption until the WARN splat
has been emitted.
However the commit neglected to reenable preemption in the !fmt codepath,
i.e. when a WARN splat is emitted without additional format string.
One consequence is that users may see more splats than intended. E.g. a
WARN splat emitted in a work item results in at least two extra splats:
BUG: workqueue leaked lock or atomic
(emitted by process_one_work())
BUG: scheduling while atomic
(emitted by worker_thread() -> schedule())
Ironically the point of the commit was to *avoid* extra splats. ;)
Fix it.
Fixes: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/3ec48fde01e4ee6505f77908ba351bad200ae3d1.1694763684.git.lukas@wunner.de
|
|
Minor cleanup pointed out by checkpatch (repeated words, missing blank
lines) in smb2pdu.c and old header location referred to in transport.c
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
checkpatch flagged a few places with:
WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP
Also fixed minor typo
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix indentation and other code style issues in the comm.c file.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309150646.n3iBvbPj-lkp@intel.com/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn results in the HBA
inability to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.
According to AHCI 1.3.1 specifications section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.
To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 - section 6.2.2.2
specification.
Signed-off-by: Szuying Chen <Chloe_Chen@asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable@vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:
* radeon: Uninterruptible fence waiting
* tests: Fix use-after-free bug
* vkms: Revert hrtimer fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230914122649.GA28252@linux-uq9g
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Only check eDP HPD when AUX CH is shared.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZQL+NqtIZH5F/Nxr@intel.com
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.6-2023-09-13:
amdgpu:
- GC 9.4.3 fixes
- Fix white screen issues with S/G display on system with >= 64G of ram
- Replay fixes
- SMU 13.0.6 fixes
- AUX backlight fix
- NBIO 4.3 SR-IOV fixes for HDP
- RAS fixes
- DP MST resume fix
- Fix segfault on systems with no vbios
- DPIA fixes
amdkfd:
- CWSR grace period fix
- Unaligned doorbell fix
- CRIU fix for GFX11
- Add missing TLB flush on gfx10 and newer
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com
|
|
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.6
- nvme-tcp iov len fix (Varun)
- nvme-hwmon const qualifier for safety (Krzysztof)
- nvme-fc null pointer checks (Nigel)
- nvme-pci no numa node fix (Pratyush)
- nvme timeout fix for non-compliant controllers (Keith)"
* tag 'nvme-6.6-2023-09-14' of git://git.infradead.org/nvme:
nvme: avoid bogus CRTO values
nvme-pci: do not set the NUMA node of device if it has none
nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
nvme: host: hwmon: constify pointers to hwmon_channel_info
nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()
|
|
When opening a directory (opendir(3)) or rewinding it (rewinddir(3)), we
are not holding the directory's inode locked, and this can result in later
attempting to add two entries to the directory with the same index number,
resulting in a transaction abort, with -EEXIST (-17), when inserting the
second delayed dir index. This results in a trace like the following:
Sep 11 22:34:59 myhostname kernel: BTRFS error (device dm-3): err add delayed dir index item(name: cockroach-stderr.log) into the insertion tree of the delayed node(root id: 5, inode id: 4539217, errno: -17)
Sep 11 22:34:59 myhostname kernel: ------------[ cut here ]------------
Sep 11 22:34:59 myhostname kernel: kernel BUG at fs/btrfs/delayed-inode.c:1504!
Sep 11 22:34:59 myhostname kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Sep 11 22:34:59 myhostname kernel: CPU: 0 PID: 7159 Comm: cockroach Not tainted 6.4.15-200.fc38.x86_64 #1
Sep 11 22:34:59 myhostname kernel: Hardware name: ASUS ESC500 G3/P9D WS, BIOS 2402 06/27/2018
Sep 11 22:34:59 myhostname kernel: RIP: 0010:btrfs_insert_delayed_dir_index+0x1da/0x260
Sep 11 22:34:59 myhostname kernel: Code: eb dd 48 (...)
Sep 11 22:34:59 myhostname kernel: RSP: 0000:ffffa9980e0fbb28 EFLAGS: 00010282
Sep 11 22:34:59 myhostname kernel: RAX: 0000000000000000 RBX: ffff8b10b8f4a3c0 RCX: 0000000000000000
Sep 11 22:34:59 myhostname kernel: RDX: 0000000000000000 RSI: ffff8b177ec21540 RDI: ffff8b177ec21540
Sep 11 22:34:59 myhostname kernel: RBP: ffff8b110cf80888 R08: 0000000000000000 R09: ffffa9980e0fb938
Sep 11 22:34:59 myhostname kernel: R10: 0000000000000003 R11: ffffffff86146508 R12: 0000000000000014
Sep 11 22:34:59 myhostname kernel: R13: ffff8b1131ae5b40 R14: ffff8b10b8f4a418 R15: 00000000ffffffef
Sep 11 22:34:59 myhostname kernel: FS: 00007fb14a7fe6c0(0000) GS:ffff8b177ec00000(0000) knlGS:0000000000000000
Sep 11 22:34:59 myhostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 22:34:59 myhostname kernel: CR2: 000000c00143d000 CR3: 00000001b3b4e002 CR4: 00000000001706f0
Sep 11 22:34:59 myhostname kernel: Call Trace:
Sep 11 22:34:59 myhostname kernel: <TASK>
Sep 11 22:34:59 myhostname kernel: ? die+0x36/0x90
Sep 11 22:34:59 myhostname kernel: ? do_trap+0xda/0x100
Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260
Sep 11 22:34:59 myhostname kernel: ? do_error_trap+0x6a/0x90
Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260
Sep 11 22:34:59 myhostname kernel: ? exc_invalid_op+0x50/0x70
Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260
Sep 11 22:34:59 myhostname kernel: ? asm_exc_invalid_op+0x1a/0x20
Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260
Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260
Sep 11 22:34:59 myhostname kernel: btrfs_insert_dir_item+0x200/0x280
Sep 11 22:34:59 myhostname kernel: btrfs_add_link+0xab/0x4f0
Sep 11 22:34:59 myhostname kernel: ? ktime_get_real_ts64+0x47/0xe0
Sep 11 22:34:59 myhostname kernel: btrfs_create_new_inode+0x7cd/0xa80
Sep 11 22:34:59 myhostname kernel: btrfs_symlink+0x190/0x4d0
Sep 11 22:34:59 myhostname kernel: ? schedule+0x5e/0xd0
Sep 11 22:34:59 myhostname kernel: ? __d_lookup+0x7e/0xc0
Sep 11 22:34:59 myhostname kernel: vfs_symlink+0x148/0x1e0
Sep 11 22:34:59 myhostname kernel: do_symlinkat+0x130/0x140
Sep 11 22:34:59 myhostname kernel: __x64_sys_symlinkat+0x3d/0x50
Sep 11 22:34:59 myhostname kernel: do_syscall_64+0x5d/0x90
Sep 11 22:34:59 myhostname kernel: ? syscall_exit_to_user_mode+0x2b/0x40
Sep 11 22:34:59 myhostname kernel: ? do_syscall_64+0x6c/0x90
Sep 11 22:34:59 myhostname kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
The race leading to the problem happens like this:
1) Directory inode X is loaded into memory, its ->index_cnt field is
initialized to (u64)-1 (at btrfs_alloc_inode());
2) Task A is adding a new file to directory X, holding its vfs inode lock,
and calls btrfs_set_inode_index() to get an index number for the entry.
Because the inode's index_cnt field is set to (u64)-1 it calls
btrfs_inode_delayed_dir_index_count() which fails because no dir index
entries were added yet to the delayed inode and then it calls
btrfs_set_inode_index_count(). This functions finds the last dir index
key and then sets index_cnt to that index value + 1. It found that the
last index key has an offset of 100. However before it assigns a value
of 101 to index_cnt...
3) Task B calls opendir(3), ending up at btrfs_opendir(), where the VFS
lock for inode X is not taken, so it calls btrfs_get_dir_last_index()
and sees index_cnt still with a value of (u64)-1. Because of that it
calls btrfs_inode_delayed_dir_index_count() which fails since no dir
index entries were added to the delayed inode yet, and then it also
calls btrfs_set_inode_index_count(). This also finds that the last
index key has an offset of 100, and before it assigns the value 101
to the index_cnt field of inode X...
4) Task A assigns a value of 101 to index_cnt. And then the code flow
goes to btrfs_set_inode_index() where it increments index_cnt from
101 to 102. Task A then creates a delayed dir index entry with a
sequence number of 101 and adds it to the delayed inode;
5) Task B assigns 101 to the index_cnt field of inode X;
6) At some later point when someone tries to add a new entry to the
directory, btrfs_set_inode_index() will return 101 again and shortly
after an attempt to add another delayed dir index key with index
number 101 will fail with -EEXIST resulting in a transaction abort.
Fix this by locking the inode at btrfs_get_dir_last_index(), which is only
only used when opening a directory or attempting to lseek on it.
Reported-by: ken <ken@bllue.org>
Link: https://lore.kernel.org/linux-btrfs/CAE6xmH+Lp=Q=E61bU+v9eWX8gYfLvu6jLYxjxjFpo3zHVPR0EQ@mail.gmail.com/
Reported-by: syzbot+d13490c82ad5353c779d@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
Fixes: 9b378f6ad48c ("btrfs: fix infinite directory reads")
CC: stable@vger.kernel.org # 6.5+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When opening a directory we find what's the index of its last entry and
then store it in the directory's file handle private data (struct
btrfs_file_private::last_index), so that in the case new directory entries
are added to a directory after an opendir(3) call we don't end up in an
infinite loop (see commit 9b378f6ad48c ("btrfs: fix infinite directory
reads")) when calling readdir(3).
However once rewinddir(3) is called, POSIX states [1] that any new
directory entries added after the previous opendir(3) call, must be
returned by subsequent calls to readdir(3):
"The rewinddir() function shall reset the position of the directory
stream to which dirp refers to the beginning of the directory.
It shall also cause the directory stream to refer to the current
state of the corresponding directory, as a call to opendir() would
have done."
We currently don't refresh the last_index field of the struct
btrfs_file_private associated to the directory, so after a rewinddir(3)
we are not returning any new entries added after the opendir(3) call.
Fix this by finding the current last index of the directory when llseek
is called against the directory.
This can be reproduced by the following C program provided by Ian Johnson:
#include <dirent.h>
#include <stdio.h>
int main(void) {
DIR *dir = opendir("test");
FILE *file;
file = fopen("test/1", "w");
fwrite("1", 1, 1, file);
fclose(file);
file = fopen("test/2", "w");
fwrite("2", 1, 1, file);
fclose(file);
rewinddir(dir);
struct dirent *entry;
while ((entry = readdir(dir))) {
printf("%s\n", entry->d_name);
}
closedir(dir);
return 0;
}
Reported-by: Ian Johnson <ian@ianjohnson.dev>
Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/
Fixes: 9b378f6ad48c ("btrfs: fix infinite directory reads")
CC: stable@vger.kernel.org # 6.5+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When opening a directory for reading it, we set the last index where we
stop iteration to the value in struct btrfs_inode::index_cnt. That value
does not match the index of the most recently added directory entry but
it's instead the index number that will be assigned the next directory
entry.
This means that if after the call to opendir(3) new directory entries are
added, a readdir(3) call will return the first new directory entry. This
is fine because POSIX says the following [1]:
"If a file is removed from or added to the directory after the most
recent call to opendir() or rewinddir(), whether a subsequent call to
readdir() returns an entry for that file is unspecified."
For example for the test script from commit 9b378f6ad48c ("btrfs: fix
infinite directory reads"), where we have 2000 files in a directory, ext4
doesn't return any new directory entry after opendir(3), while xfs returns
the first 13 new directory entries added after the opendir(3) call.
If we move to a shorter example with an empty directory when opendir(3) is
called, and 2 files added to the directory after the opendir(3) call, then
readdir(3) on btrfs will return the first file, ext4 and xfs return the 2
files (but in a different order). A test program for this, reported by
Ian Johnson, is the following:
#include <dirent.h>
#include <stdio.h>
int main(void) {
DIR *dir = opendir("test");
FILE *file;
file = fopen("test/1", "w");
fwrite("1", 1, 1, file);
fclose(file);
file = fopen("test/2", "w");
fwrite("2", 1, 1, file);
fclose(file);
struct dirent *entry;
while ((entry = readdir(dir))) {
printf("%s\n", entry->d_name);
}
closedir(dir);
return 0;
}
To make this less odd, change the behaviour to never return new entries
that were added after the opendir(3) call. This is done by setting the
last_index field of the struct btrfs_file_private attached to the
directory's file handle with a value matching btrfs_inode::index_cnt
minus 1, since that value always matches the index of the next new
directory entry and not the index of the most recently added entry.
[1] https://pubs.opengroup.org/onlinepubs/007904875/functions/readdir_r.html
Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/
CC: stable@vger.kernel.org # 6.5+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Some devices are reporting controller ready mode support, but return 0
for CRTO. These devices require a much higher time to ready than that,
so they are failing to initialize after the driver starter preferring
that value over CAP.TO.
The spec requires that CAP.TO match the appropritate CRTO value, or be
set to 0xff if CRTO is larger than that. This means that CAP.TO can be
used to validate if CRTO is reliable, and provides an appropriate
fallback for setting the timeout value if not. Use whichever is larger.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863
Reported-by: Cláudio Sampaio <patola@gmail.com>
Reported-by: Felix Yan <felixonmars@archlinux.org>
Tested-by: Felix Yan <felixonmars@archlinux.org>
Based-on-a-patch-by: Felix Yan <felixonmars@archlinux.org>
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Commit bc840ea5f9a9 ("thermal: core: Do not handle trip points with
invalid temperature") added a check for invalid temperature to the
disabled trip point check in handle_thermal_trip(), but that check was
added at a point when the trip structure has not been initialized yet.
This may cause handle_thermal_trip() to skip a valid trip point in some
cases, so fix it by moving the check to a suitable place, after
__thermal_zone_get_trip() has been called to populate the trip
structure.
Fixes: bc840ea5f9a9 ("thermal: core: Do not handle trip points with invalid temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.6
Pull MD fixes from Song:
"These commits fix a bugzilla report [1] and some recent issues in 6.5
and 6.6.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217798"
* tag 'md-fixes-20230914' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: Put the right device in md_seq_next
md/raid1: fix error: ISO C90 forbids mixed declarations
md: fix warning for holder mismatch from export_rdev()
md: don't dereference mddev after export_rdev()
|
|
Running "make modules_install" may fail with
make[2]: execvp: /bin/sh: Argument list too long
if many modules are built and INSTALL_MOD_PATH is long. This is because
scripts/Makefile.modinst creates all directories with one mkdir command.
Use $(foreach ...) instead to prevent an excessive argument list.
Fixes: 2dfec887c0fd ("kbuild: reduce the number of mkdir calls during modules_install")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Since commit fe66b5d2ae72 ("kbuild: refactor kernel-devel RPM package
and linux-headers Deb package"), the kernel-devel RPM package and
linux-headers Deb package are broken.
I double-quoted the $(find ... -type d), which resulted in newlines
being included in the argument to the outer find comment.
find: 'arch/arm64/include\narch/arm64/kvm/hyp/include': No such file or directory
The outer find command is unneeded.
Fixes: fe66b5d2ae72 ("kbuild: refactor kernel-devel RPM package and linux-headers Deb package")
Reported-by: Karolis M <k4rolis@protonmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
If there are multiple arrays in system and one mddevice is marked
with MD_DELETED and md_seq_next() is called in the middle of removal
then it _get()s proper device but it may _put() deleted one. As a result,
active counter may never be zeroed for mddevice and it cannot
be removed.
Put the device which has been _get with previous md_seq_next() call.
Cc: stable@vger.kernel.org
Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed")
Reported-by: AceLan Kao <acelan@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217798
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230914152416.10819-1-mariusz.tkaczyk@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Quite unusually, this does not contains any fix coming from subtrees
(nf, ebpf, wifi, etc).
Current release - regressions:
- bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()
Previous releases - regressions:
- ipv4: fix one memleak in __inet_del_ifa()
- tcp: fix bind() regressions for v4-mapped-v6 addresses.
- tls: do not free tls_rec on async operation in
bpf_exec_tx_verdict()
- dsa: fixes for SJA1105 FDB regressions
- veth: update XDP feature set when bringing up device
- igb: fix hangup when enabling SR-IOV
Previous releases - always broken:
- kcm: fix memory leak in error path of kcm_sendmsg()
- smc: fix data corruption in smcr_port_add
- microchip: fix possible memory leak for vcap_dup_rule()"
* tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
net: renesas: rswitch: Add spin lock protection for irq {un}mask
net: renesas: rswitch: Fix unmasking irq condition
igb: clean up in all error paths when enabling SR-IOV
ixgbe: fix timestamp configuration code
selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c.
selftest: tcp: Move expected_errno into each test case in bind_wildcard.c.
selftest: tcp: Fix address length in bind_wildcard.c.
tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
ipv6: fix ip6_sock_set_addr_preferences() typo
veth: Update XDP feature set when bringing up device
net: macb: fix sleep inside spinlock
net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988
net: ethernet: mtk_eth_soc: fix uninitialized variable
kcm: Fix memory leak in error path of kcm_sendmsg()
r8152: check budget for r8152_poll()
net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset
...
|
|
When using selected buffer feature, io_uring delays data iter setup
until later. If io_setup_async_msg() is called before that it might see
not correctly setup iterator. Pre-init nr_segs and judge from its state
whether we repointing.
Cc: stable@vger.kernel.org
Reported-by: syzbot+a4c6e5ef999b68b26ed1@syzkaller.appspotmail.com
Fixes: 0455d4ccec548 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0000000000002770be06053c7757@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
With the configuration PAGE_SIZE 64k and filesystem blocksize 64k,
a problem occurred when more than 13 million files were directly created
under a directory:
EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum
When enough files are created, the fake_dirent->reclen will be 0xffff.
it doesn't equal to the blocksize 65536, i.e. 0x10000.
But it is not the same condition when blocksize equals to 4k.
when enough files are created, the fake_dirent->reclen will be 0x1000.
it equals to the blocksize 4k, i.e. 0x1000.
The problem seems to be related to the limitation of the 16-bit field
when the blocksize is set to 64k.
To address this, helpers like ext4_rec_len_{from,to}_disk has already
been introduced to complete the conversion between the encoded and the
plain form of rec_len.
So fix this one by using the helper, and all the other in this file too.
Cc: stable@kernel.org
Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20230803060938.1929759-1-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Len Brown has reported that system suspend sometimes fail due to
inability to freeze a task working in ext4_trim_fs() for one minute.
Trimming a large filesystem on a disk that slowly processes discard
requests can indeed take a long time. Since discard is just an advisory
call, it is perfectly fine to interrupt it at any time and the return
number of discarded blocks until that moment. Do that when we detect the
task is being frozen.
Cc: stable@kernel.org
Reported-by: Len Brown <lenb@kernel.org>
Suggested-by: Dave Chinner <david@fromorbit.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently we set the group's trimmed bit in ext4_trim_all_free() based
on return value of ext4_try_to_trim_range(). However when we will want
to abort trimming because of suspend attempt, we want to return success
from ext4_try_to_trim_range() but not set the trimmed bit. Instead
implementing awkward propagation of this information, just move setting
of trimmed bit into ext4_try_to_trim_range() when the whole group is
trimmed.
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There is a memory leak reported by kmemleak:
unreferenced object 0xff11000105903b80 (size 64):
comm "mount", pid 3382, jiffies 4295032021 (age 27.826s)
hex dump (first 32 bytes):
04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffffae86ac40>] __kmalloc_node+0x50/0x160
[<ffffffffaf2486d8>] crypto_alloc_tfmmem.isra.0+0x38/0x110
[<ffffffffaf2498e5>] crypto_create_tfm_node+0x85/0x2f0
[<ffffffffaf24a92c>] crypto_alloc_tfm_node+0xfc/0x210
[<ffffffffaedde777>] journal_init_common+0x727/0x1ad0
[<ffffffffaede1715>] jbd2_journal_init_inode+0x2b5/0x500
[<ffffffffaed786b5>] ext4_load_and_init_journal+0x255/0x2440
[<ffffffffaed8b423>] ext4_fill_super+0x8823/0xa330
...
The root cause was traced to an error handing path in journal_init_common()
when malloc memory failed in register_shrinker(). The checksum driver is
used to reference to checksum algorithm via cryptoapi and the user should
release the memory when the driver is no longer needed or the journal
initialization failed.
Fix it by calling crypto_free_shash() on the "err_cleanup" error handing
path in journal_init_common().
Fixes: c30713084ba5 ("jbd2: move load_superblock() into journal_init_common()")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230911025138.983101-1-lizetao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There's a race condition in the multipath target when retrieve_deps
races with multipath_message calling dm_get_device and dm_put_device.
retrieve_deps walks the list of open devices without holding any lock
but multipath may add or remove devices to the list while it is
running. The end result may be memory corruption or use-after-free
memory access.
See this description of a UAF with multipath_message():
https://listman.redhat.com/archives/dm-devel/2022-October/052373.html
Fix this bug by introducing a new rw semaphore "devices_lock". We grab
devices_lock for read in retrieve_deps and we grab it for write in
dm_get_device and dm_put_device.
Reported-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One doc fix for drm/connector, one fix for amdgpu for an crash when
VRAM usage is high, and one fix in gm12u320 to fix the timeout units in
the code
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
|
|
when using __drm_kunit_helper_alloc_drm_device() the driver may be
dereferenced by device-managed resources up until the device is
freed, which is typically later than the kunit-managed resource code
frees it. Fix this by simply make the driver device-managed as well.
In short, the sequence leading to the UAF is as follows:
INIT:
Code allocates a struct device as a kunit-managed resource.
Code allocates a drm driver as a kunit-managed resource.
Code allocates a drm device as a device-managed resource.
EXIT:
Kunit resource cleanup frees the drm driver
Kunit resource cleanup puts the struct device, which starts a
device-managed resource cleanup
device-managed cleanup calls drm_dev_put()
drm_dev_put() dereferences the (now freed) drm driver -> Boom.
Related KASAN message:
[55272.551542] ==================================================================
[55272.551551] BUG: KASAN: slab-use-after-free in drm_dev_put.part.0+0xd4/0xe0 [drm]
[55272.551603] Read of size 8 at addr ffff888127502828 by task kunit_try_catch/10353
[55272.551612] CPU: 4 PID: 10353 Comm: kunit_try_catch Tainted: G U N 6.5.0-rc7+ #155
[55272.551620] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[55272.551626] Call Trace:
[55272.551629] <TASK>
[55272.551633] dump_stack_lvl+0x57/0x90
[55272.551639] print_report+0xcf/0x630
[55272.551645] ? _raw_spin_lock_irqsave+0x5f/0x70
[55272.551652] ? drm_dev_put.part.0+0xd4/0xe0 [drm]
[55272.551694] kasan_report+0xd7/0x110
[55272.551699] ? drm_dev_put.part.0+0xd4/0xe0 [drm]
[55272.551742] drm_dev_put.part.0+0xd4/0xe0 [drm]
[55272.551783] devres_release_all+0x15d/0x1f0
[55272.551790] ? __pfx_devres_release_all+0x10/0x10
[55272.551797] device_unbind_cleanup+0x16/0x1a0
[55272.551802] device_release_driver_internal+0x3e5/0x540
[55272.551808] ? kobject_put+0x5d/0x4b0
[55272.551814] bus_remove_device+0x1f1/0x3f0
[55272.551819] device_del+0x342/0x910
[55272.551826] ? __pfx_device_del+0x10/0x10
[55272.551830] ? lock_release+0x339/0x5e0
[55272.551836] ? kunit_remove_resource+0x128/0x290 [kunit]
[55272.551845] ? __pfx_lock_release+0x10/0x10
[55272.551851] platform_device_del.part.0+0x1f/0x1e0
[55272.551856] ? _raw_spin_unlock_irqrestore+0x30/0x60
[55272.551863] kunit_remove_resource+0x195/0x290 [kunit]
[55272.551871] ? _raw_spin_unlock_irqrestore+0x30/0x60
[55272.551877] kunit_cleanup+0x78/0x120 [kunit]
[55272.551885] ? __kthread_parkme+0xc1/0x1f0
[55272.551891] ? __pfx_kunit_try_run_case_cleanup+0x10/0x10 [kunit]
[55272.551900] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit]
[55272.551909] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit]
[55272.551919] kthread+0x2e7/0x3c0
[55272.551924] ? __pfx_kthread+0x10/0x10
[55272.551929] ret_from_fork+0x2d/0x70
[55272.551935] ? __pfx_kthread+0x10/0x10
[55272.551940] ret_from_fork_asm+0x1b/0x30
[55272.551948] </TASK>
[55272.551953] Allocated by task 10351:
[55272.551956] kasan_save_stack+0x1c/0x40
[55272.551962] kasan_set_track+0x21/0x30
[55272.551966] __kasan_kmalloc+0x8b/0x90
[55272.551970] __kmalloc+0x5e/0x160
[55272.551976] kunit_kmalloc_array+0x1c/0x50 [kunit]
[55272.551984] drm_exec_test_init+0xfa/0x2c0 [drm_exec_test]
[55272.551991] kunit_try_run_case+0xdd/0x250 [kunit]
[55272.551999] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit]
[55272.552008] kthread+0x2e7/0x3c0
[55272.552012] ret_from_fork+0x2d/0x70
[55272.552017] ret_from_fork_asm+0x1b/0x30
[55272.552024] Freed by task 10353:
[55272.552027] kasan_save_stack+0x1c/0x40
[55272.552032] kasan_set_track+0x21/0x30
[55272.552036] kasan_save_free_info+0x27/0x40
[55272.552041] __kasan_slab_free+0x106/0x180
[55272.552046] slab_free_freelist_hook+0xb3/0x160
[55272.552051] __kmem_cache_free+0xb2/0x290
[55272.552056] kunit_remove_resource+0x195/0x290 [kunit]
[55272.552064] kunit_cleanup+0x78/0x120 [kunit]
[55272.552072] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit]
[55272.552080] kthread+0x2e7/0x3c0
[55272.552085] ret_from_fork+0x2d/0x70
[55272.552089] ret_from_fork_asm+0x1b/0x30
[55272.552096] The buggy address belongs to the object at ffff888127502800
which belongs to the cache kmalloc-512 of size 512
[55272.552105] The buggy address is located 40 bytes inside of
freed 512-byte region [ffff888127502800, ffff888127502a00)
[55272.552115] The buggy address belongs to the physical page:
[55272.552119] page:00000000af6c70ff refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x127500
[55272.552127] head:00000000af6c70ff order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[55272.552133] anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[55272.552141] page_type: 0xffffffff()
[55272.552145] raw: 0017ffffc0010200 ffff888100042c80 0000000000000000 dead000000000001
[55272.552152] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[55272.552157] page dumped because: kasan: bad access detected
[55272.552163] Memory state around the buggy address:
[55272.552167] ffff888127502700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[55272.552173] ffff888127502780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[55272.552178] >ffff888127502800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[55272.552184] ^
[55272.552187] ffff888127502880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[55272.552193] ffff888127502900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[55272.552198] ==================================================================
[55272.552203] Disabling lock debugging due to kernel taint
v2:
- Update commit message, add Fixes: tag and Cc stable.
v3:
- Further commit message updates (Maxime Ripard).
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org # v6.3+
Fixes: d98780310719 ("drm/tests: helpers: Allow to pass a custom drm_driver")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20230907135339.7971-2-thomas.hellstrom@linux.intel.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
This reverts commit a0e6a017ab56936c0405fe914a793b241ed25ee0.
Unlocking a mutex in the context of a hrtimer callback is violating mutex
locking rules, as mutex_unlock() from interrupt context is not permitted.
Link: https://lore.kernel.org/dri-devel/ZQLAc%2FFwkv%2FGiVoK@phenom.ffwll.local/T/#t
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20230914102024.1789154-1-mcanal@igalia.com
|
|
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720
("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by
updating kcm_tx_msg(head)->last_skb if partial data is copied so that the
following sendmsg() will resume from the skb.
However, we cannot know how many bytes were copied when we get the error.
Thus, we could mess up the MSG_MORE queue.
When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we
do so for UDP by udp_flush_pending_frames().
Even without this change, when the error occurred, the following sendmsg()
resumed from a wrong skb and the queue was messed up. However, we have
yet to get such a report, and only syzkaller stumbled on it. So, this
can be changed safely.
Note this does not change SOCK_SEQPACKET behaviour.
Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()")
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Yoshihiro Shimoda says:
====================
net: renesas: rswitch: Fix a lot of redundant irq issue
After this patch series was applied, a lot of redundant interrupts
no longer occur.
For example: when "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider
Before the patches are applied: about 800,000 times happened
After the patches were applied: about 100,000 times happened
====================
Link: https://lore.kernel.org/r/20230912014936.3175430-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add spin lock protection for irq {un}mask registers' control.
After napi_complete_done() and this protection were applied,
a lot of redundant interrupts no longer occur.
For example: when "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider
Before the patches are applied: about 800,000 times happened
After the patches were applied: about 100,000 times happened
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix unmasking irq condition by using napi_complete_done(). Otherwise,
redundant interrupts happen.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is
decremented twice. Once for SCSI transport registration and second to
remove the initial node allocation kref. If there is also an NVMe
transport registration, another reference count decrement is expected in
lpfc_nvme_unregister_port().
Race conditions between the NVMe transport remoteport_delete and
dev_loss_tmo callbacks sometimes results in premature ndlp object release
resulting in use-after-free issues.
Fix by not dropping the ndlp object in dev_loss_tmo callback with an
outstanding NVMe transport registration. Inversely, mark the final
NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking
nlp_flag for NLP_DROPPED. There is an attempt to restore the ndlp lock
when exiting the if statement, but the nlp_put kref could be the final
decrement causing a use-after-free memory access on a released ndlp object.
Instead of trying to reacquire the ndlp lock after checking nlp_flag, just
return after calling nlp_put.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com
Reviewed-by: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Since debugfs_create_file() returns ERR_PTR and never NULL, use IS_ERR() to
check the return value.
Fixes: 2fcbc569b9f5 ("scsi: lpfc: Make debugfs ktime stats generic for NVME and SCSI")
Fixes: 4c47efc140fa ("scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures")
Fixes: 6a828b0f6192 ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Fixes: 95bfc6d8ad86 ("scsi: lpfc: Make FW logging dynamically configurable")
Fixes: 9f77870870d8 ("scsi: lpfc: Add debugfs support for cm framework buffers")
Fixes: c490850a0947 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230906030809.2847970-1-ruanjinjie@huawei.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The target_cmd_counter struct allocated via target_alloc_cmd_counter() is
never freed, resulting in leaks across various transport types, e.g.:
unreferenced object 0xffff88801f920120 (size 96):
comm "sh", pid 102, jiffies 4294892535 (age 713.412s)
hex dump (first 32 bytes):
07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 38 01 92 1f 80 88 ff ff ........8.......
backtrace:
[<00000000e58a6252>] kmalloc_trace+0x11/0x20
[<0000000043af4b2f>] target_alloc_cmd_counter+0x17/0x90 [target_core_mod]
[<000000007da2dfa7>] target_setup_session+0x2d/0x140 [target_core_mod]
[<0000000068feef86>] tcm_loop_tpg_nexus_store+0x19b/0x350 [tcm_loop]
[<000000006a80e021>] configfs_write_iter+0xb1/0x120
[<00000000e9f4d860>] vfs_write+0x2e4/0x3c0
[<000000008143433b>] ksys_write+0x80/0xb0
[<00000000a7df29b2>] do_syscall_64+0x42/0x90
[<0000000053f45fb8>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Free the structure alongside the corresponding iscsit_conn / se_sess
parent.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20230831183459.6938-1-ddiss@suse.de
Fixes: becd9be6069e ("scsi: target: Move sess cmd counter to new struct")
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The function pm8001_pci_resume() only calls pm8001_request_irq() without
calling pm8001_setup_irq(). This causes the IRQ allocation to fail, which
leads all drives being removed from the system.
Fix this issue by integrating the code for pm8001_setup_irq() directly
inside pm8001_request_irq() so that MSI-X setup is performed both during
normal initialization and resume operations.
Fixes: dbf9bfe61571 ("[SCSI] pm8001: add SAS/SATA HBA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-2-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
OPC_INB_SET_CONTROLLER_CONFIG command
Tags allocated for OPC_INB_SET_CONTROLLER_CONFIG command need to be freed
when we receive the response.
Signed-off-by: Michal Grzedzicki <mge@meta.com>
Link: https://lore.kernel.org/r/20230911170340.699533-2-mge@meta.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some cards have more than one SAS address. Using an incorrect address
causes communication issues with some devices like expanders.
Closes: https://lore.kernel.org/linux-kernel/A57AEA84-5CA0-403E-8053-106033C73C70@fb.com/
Signed-off-by: Michal Grzedzicki <mge@meta.com>
Link: https://lore.kernel.org/r/20230913155611.3183612-1-mge@meta.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull in staged fixes for 6.6.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull genpm / pmdomain rename from Ulf Hansson:
"This renames the genpd subsystem to pmdomain.
As discussed on LKML, using 'genpd' as the name of a subsystem isn't
very self-explanatory and the acronym itself that means Generic PM
Domain, is known only by a limited group of people.
The suggestion to improve the situation is to rename the subsystem to
'pmdomain', which there seems to be a good consensus around using.
Ideally it should indicate that its purpose is to manage Power Domains
or 'PM domains' as we often also use within the Linux Kernel
terminology"
* tag 'pmdomain-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: Rename the genpd subsystem to pmdomain
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fix from Jarkko Sakkinen.
* tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Fix typo in tpmrm class definition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- fix reference to exported symbols for parisc64 [Masahiro Yamada]
- Block-TLB (BTLB) support on 32-bit CPUs
- sparse and build-warning fixes
* tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
linux/export: fix reference to exported functions for parisc64
parisc: BTLB: Initialize BTLB tables at CPU startup
parisc: firmware: Simplify calling non-PA20 functions
parisc: BTLB: _edata symbol has to be page aligned for BTLB support
parisc: BTLB: Add BTLB insert and purge firmware function wrappers
parisc: BTLB: Clear possibly existing BTLB entries
parisc: Prepare for Block-TLB support on 32-bit kernel
parisc: shmparam.h: Document aliasing requirements of PA-RISC
parisc: irq: Make irq_stack_union static to avoid sparse warning
parisc: drivers: Fix sparse warning
parisc: iosapic.c: Fix sparse warnings
parisc: ccio-dma: Fix sparse warnings
parisc: sba-iommu: Fix sparse warnigs
parisc: sba: Fix compile warning wrt list of SBA devices
parisc: sba_iommu: Fix build warning if procfs if disabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Add missing LOCKDOWN checks for eventfs callers
When LOCKDOWN is active for tracing, it causes inconsistent state
when some functions succeed and others fail.
- Use dput() to free the top level eventfs descriptor
There was a race between accesses and freeing it.
- Fix a long standing bug that eventfs exposed due to changing timings
by dynamically creating files. That is, If a event file is opened for
an instance, there's nothing preventing the instance from being
removed which will make accessing the files cause use-after-free
bugs.
- Fix a ring buffer race that happens when iterating over the ring
buffer while writers are active. Check to make sure not to read the
event meta data if it's beyond the end of the ring buffer sub buffer.
- Fix the print trigger that disappeared because the test to create it
was looking for the event dir field being filled, but now it has the
"ef" field filled for the eventfs structure.
- Remove the unused "dir" field from the event structure.
- Fix the order of the trace_dynamic_info as it had it backwards for
the offset and len fields for which one was for which endianess.
- Fix NULL pointer dereference with eventfs_remove_rec()
If an allocation fails in one of the eventfs_add_*() functions, the
caller of it in event_subsystem_dir() or event_create_dir() assigns
the result to the structure. But it's assigning the ERR_PTR and not
NULL. This was passed to eventfs_remove_rec() which expects either a
good pointer or a NULL, not ERR_PTR. The fix is to not assign the
ERR_PTR to the structure, but to keep it NULL on error.
- Fix list_for_each_rcu() to use list_for_each_srcu() in
dcache_dir_open_wrapper(). One iteration of the code used RCU but
because it had to call sleepable code, it had to be changed to use
SRCU, but one of the iterations was missed.
- Fix synthetic event print function to use "as_u64" instead of passing
in a pointer to the union. To fix big/little endian issues, the u64
that represented several types was turned into a union to define the
types properly.
* tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()
tracing/synthetic: Print out u64 values properly
tracing/synthetic: Fix order of struct trace_dynamic_info
selftests/ftrace: Fix dependencies for some of the synthetic event tests
tracing: Remove unused trace_event_file dir field
tracing: Use the new eventfs descriptor for print trigger
ring-buffer: Do not attempt to read past "commit"
tracefs/eventfs: Free top level files on removal
ring-buffer: Avoid softlockup in ring_buffer_resize()
tracing: Have event inject files inc the trace array ref count
tracing: Have option files inc the trace array ref count
tracing: Have current_trace inc the trace array ref count
tracing: Have tracing_max_latency inc the trace array ref count
tracing: Increase trace array ref count on enable and filter files
tracefs/eventfs: Use dput to free the toplevel events directory
tracefs/eventfs: Add missing lockdown checks
tracefs: Add missing lockdown check to tracefs_create_dir()
|
|
We have been consistently seeing hangs with generic/648 in our subpage
GitHub CI setup. This is a classic deadlock, we are calling
btrfs_read_folio() on a folio, which requires holding the folio lock on
the folio, and then finding a ordered extent that overlaps that range
and calling btrfs_start_ordered_extent(), which then tries to write out
the dirty page, which requires taking the folio lock and then we
deadlock.
The hang happens because we're writing to range [1271750656, 1271767040),
page index [77621, 77622], and page 77621 is !Uptodate. It is also Dirty,
so we call btrfs_read_folio() for 77621 and which does
btrfs_lock_and_flush_ordered_range() for that range, and we find an ordered
extent which is [1271644160, 1271746560), page index [77615, 77621].
The page indexes overlap, but the actual bytes don't overlap. We're
holding the page lock for 77621, then call
btrfs_lock_and_flush_ordered_range() which tries to flush the dirty
page, and tries to lock 77621 again and then we deadlock.
The byte ranges do not overlap, but with subpage support if we clear
uptodate on any portion of the page we mark the entire thing as not
uptodate.
We have been clearing page uptodate on write errors, but no other file
system does this, and is in fact incorrect. This doesn't hurt us in the
!subpage case because we can't end up with overlapped ranges that don't
also overlap on the page.
Fix this by not clearing uptodate when we have a write error. The only
thing we should be doing in this case is setting the mapping error and
carrying on. This makes it so we would no longer call
btrfs_read_folio() on the page as it's uptodate and eliminates the
deadlock.
With this patch we're now able to make it through a full fstests run on
our subpage blocksize VMs.
Note for stable backports: this probably goes beyond 6.1 but the code
has been cleaned up and clearing the uptodate bit must be verified on
each version independently.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This was noticed by Miklos that file_remove_privs might call into
notify_change(), which requires to hold an exclusive lock. The problem
exists in FUSE and btrfs. We can fix it without any additional helpers
from VFS, in case the privileges would need to be dropped, change the
lock type to be exclusive and redo the loop.
Fixes: e9adabb9712e ("btrfs: use shared lock for direct writes within EOF")
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Remove a number of hidden calls to compound_head() by using a folio
throughout. Also follow core kernel coding style by adding the folio to
the page cache immediately after allocation instead of doing the read
first, then adding it to the page cache. This ordering makes subsequent
readers block waiting for the first reader instead of duplicating the
work only to throw it away when they find out they lost the race.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, when GETDEVICEINFO returns multiple locations where each
is a different IP but the server's identity is same as MDS, then
nfs4_set_ds_client() finds the existing nfs_client structure which
has the MDS's max_connect value (and if it's 1), then the 1st IP
on the DS's list will get dropped due to MDS trunking rules. Other
IPs would be added as they fall under the pnfs trunking rules.
For the list of IPs the 1st goes thru calling nfs4_set_ds_client()
which will eventually call nfs4_add_trunk() and call into
rpc_clnt_test_and_add_xprt() which has the check for MDS trunking.
The other IPs (after the 1st one), would call rpc_clnt_add_xprt()
which doesn't go thru that check.
nfs4_add_trunk() is called when MDS trunking is happening and it
needs to enforce the usage of max_connect mount option of the
1st mount. However, this shouldn't be applied to pnfs flow.
Instead, this patch proposed to treat MDS=DS as DS trunking and
make sure that MDS's max_connect limit does not apply to the
1st IP returned in the GETDEVICEINFO list. It does so by
marking the newly created client with a new flag NFS_CS_PNFS
which then used to pass max_connect value to use into the
rpc_clnt_test_and_add_xprt() instead of the existing rpc
client's max_connect value set by the MDS connection.
For example, mount was done without max_connect value set
so MDS's rpc client has cl_max_connect=1. Upon calling into
rpc_clnt_test_and_add_xprt() and using rpc client's value,
the caller passes in max_connect value which is previously
been set in the pnfs path (as a part of handling
GETDEVICEINFO list of IPs) in nfs4_set_ds_client().
However, when NFS_CS_PNFS flag is not set and we know we
are doing MDS trunking, comparing a new IP of the same
server, we then set the max_connect value to the
existing MDS's value and pass that into
rpc_clnt_test_and_add_xprt().
Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724.
The premise of this commit was incorrect. There are exactly 2 cases
where rpcauth_checkverf() will return an error:
1) If there was an XDR decode problem (i.e. garbage data).
2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC.
In the second case, there are again 2 subcases:
a) The GSS context expires, in which case gss_validate() will force a
new context negotiation on retry by invalidating the cred.
b) The sequence number check failed because an RPC call timed out, and
the client retransmitted the request using a new sequence number,
as required by RFC2203.
In neither subcase is this a fatal error.
Reported-by: Russell Cattelan <cattelan@thebarn.com>
Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
If the server rejects the credential as being stale, or bad, then we
should mark it for revalidation before retransmitting.
Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|