summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-09-15panic: Reenable preemption in WARN slowpathLukas Wunner
Commit: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG") amended warn_slowpath_fmt() to disable preemption until the WARN splat has been emitted. However the commit neglected to reenable preemption in the !fmt codepath, i.e. when a WARN splat is emitted without additional format string. One consequence is that users may see more splats than intended. E.g. a WARN splat emitted in a work item results in at least two extra splats: BUG: workqueue leaked lock or atomic (emitted by process_one_work()) BUG: scheduling while atomic (emitted by worker_thread() -> schedule()) Ironically the point of the commit was to *avoid* extra splats. ;) Fix it. Fixes: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/3ec48fde01e4ee6505f77908ba351bad200ae3d1.1694763684.git.lukas@wunner.de
2023-09-15smb3: fix some minor typos and repeated wordsSteve French
Minor cleanup pointed out by checkpatch (repeated words, missing blank lines) in smb2pdu.c and old header location referred to in transport.c Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-15smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPPSteve French
checkpatch flagged a few places with: WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP Also fixed minor typo Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-15ata: pata_parport: Fix code style issuesDamien Le Moal
Fix indentation and other code style issues in the comm.c file. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309150646.n3iBvbPj-lkp@intel.com/ Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-15ata: libahci: clear pending interrupt statusSzuying Chen
When a CRC error occurs, the HBA asserts an interrupt to indicate an interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then does error recovery. If the adapter receives another SDB FIS with an error (PxIS.TFES) from the device before the start of the EH recovery process, the interrupt signaling the new SDB cannot be serviced as PxIE was cleared already. This in turn results in the HBA inability to issue any command during the error recovery process after setting PxCMD.ST to 1 because PxIS.TFES is still set. According to AHCI 1.3.1 specifications section 6.2.2, fatal errors notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will cause the HBA to enter the ERR:Fatal state. In this state, the HBA shall not issue any new commands. To avoid this situation, introduce the function ahci_port_clear_pending_irq() to clear pending interrupts before executing a COMRESET. This follows the AHCI 1.3.1 - section 6.2.2.2 specification. Signed-off-by: Szuying Chen <Chloe_Chen@asmedia.com.tw> Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset") Cc: stable@vger.kernel.org Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-15Merge tag 'drm-misc-fixes-2023-09-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * radeon: Uninterruptible fence waiting * tests: Fix use-after-free bug * vkms: Revert hrtimer fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230914122649.GA28252@linux-uq9g
2023-09-15Merge tag 'drm-intel-fixes-2023-09-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Only check eDP HPD when AUX CH is shared. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZQL+NqtIZH5F/Nxr@intel.com
2023-09-15Merge tag 'amd-drm-fixes-6.6-2023-09-13' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-09-13: amdgpu: - GC 9.4.3 fixes - Fix white screen issues with S/G display on system with >= 64G of ram - Replay fixes - SMU 13.0.6 fixes - AUX backlight fix - NBIO 4.3 SR-IOV fixes for HDP - RAS fixes - DP MST resume fix - Fix segfault on systems with no vbios - DPIA fixes amdkfd: - CWSR grace period fix - Unaligned doorbell fix - CRIU fix for GFX11 - Add missing TLB flush on gfx10 and newer Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com
2023-09-14Merge tag 'nvme-6.6-2023-09-14' of git://git.infradead.org/nvme into block-6.6Jens Axboe
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.6 - nvme-tcp iov len fix (Varun) - nvme-hwmon const qualifier for safety (Krzysztof) - nvme-fc null pointer checks (Nigel) - nvme-pci no numa node fix (Pratyush) - nvme timeout fix for non-compliant controllers (Keith)" * tag 'nvme-6.6-2023-09-14' of git://git.infradead.org/nvme: nvme: avoid bogus CRTO values nvme-pci: do not set the NUMA node of device if it has none nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid() nvme: host: hwmon: constify pointers to hwmon_channel_info nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()
2023-09-14btrfs: fix race between reading a directory and adding entries to itFilipe Manana
When opening a directory (opendir(3)) or rewinding it (rewinddir(3)), we are not holding the directory's inode locked, and this can result in later attempting to add two entries to the directory with the same index number, resulting in a transaction abort, with -EEXIST (-17), when inserting the second delayed dir index. This results in a trace like the following: Sep 11 22:34:59 myhostname kernel: BTRFS error (device dm-3): err add delayed dir index item(name: cockroach-stderr.log) into the insertion tree of the delayed node(root id: 5, inode id: 4539217, errno: -17) Sep 11 22:34:59 myhostname kernel: ------------[ cut here ]------------ Sep 11 22:34:59 myhostname kernel: kernel BUG at fs/btrfs/delayed-inode.c:1504! Sep 11 22:34:59 myhostname kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI Sep 11 22:34:59 myhostname kernel: CPU: 0 PID: 7159 Comm: cockroach Not tainted 6.4.15-200.fc38.x86_64 #1 Sep 11 22:34:59 myhostname kernel: Hardware name: ASUS ESC500 G3/P9D WS, BIOS 2402 06/27/2018 Sep 11 22:34:59 myhostname kernel: RIP: 0010:btrfs_insert_delayed_dir_index+0x1da/0x260 Sep 11 22:34:59 myhostname kernel: Code: eb dd 48 (...) Sep 11 22:34:59 myhostname kernel: RSP: 0000:ffffa9980e0fbb28 EFLAGS: 00010282 Sep 11 22:34:59 myhostname kernel: RAX: 0000000000000000 RBX: ffff8b10b8f4a3c0 RCX: 0000000000000000 Sep 11 22:34:59 myhostname kernel: RDX: 0000000000000000 RSI: ffff8b177ec21540 RDI: ffff8b177ec21540 Sep 11 22:34:59 myhostname kernel: RBP: ffff8b110cf80888 R08: 0000000000000000 R09: ffffa9980e0fb938 Sep 11 22:34:59 myhostname kernel: R10: 0000000000000003 R11: ffffffff86146508 R12: 0000000000000014 Sep 11 22:34:59 myhostname kernel: R13: ffff8b1131ae5b40 R14: ffff8b10b8f4a418 R15: 00000000ffffffef Sep 11 22:34:59 myhostname kernel: FS: 00007fb14a7fe6c0(0000) GS:ffff8b177ec00000(0000) knlGS:0000000000000000 Sep 11 22:34:59 myhostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 11 22:34:59 myhostname kernel: CR2: 000000c00143d000 CR3: 00000001b3b4e002 CR4: 00000000001706f0 Sep 11 22:34:59 myhostname kernel: Call Trace: Sep 11 22:34:59 myhostname kernel: <TASK> Sep 11 22:34:59 myhostname kernel: ? die+0x36/0x90 Sep 11 22:34:59 myhostname kernel: ? do_trap+0xda/0x100 Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260 Sep 11 22:34:59 myhostname kernel: ? do_error_trap+0x6a/0x90 Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260 Sep 11 22:34:59 myhostname kernel: ? exc_invalid_op+0x50/0x70 Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260 Sep 11 22:34:59 myhostname kernel: ? asm_exc_invalid_op+0x1a/0x20 Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260 Sep 11 22:34:59 myhostname kernel: ? btrfs_insert_delayed_dir_index+0x1da/0x260 Sep 11 22:34:59 myhostname kernel: btrfs_insert_dir_item+0x200/0x280 Sep 11 22:34:59 myhostname kernel: btrfs_add_link+0xab/0x4f0 Sep 11 22:34:59 myhostname kernel: ? ktime_get_real_ts64+0x47/0xe0 Sep 11 22:34:59 myhostname kernel: btrfs_create_new_inode+0x7cd/0xa80 Sep 11 22:34:59 myhostname kernel: btrfs_symlink+0x190/0x4d0 Sep 11 22:34:59 myhostname kernel: ? schedule+0x5e/0xd0 Sep 11 22:34:59 myhostname kernel: ? __d_lookup+0x7e/0xc0 Sep 11 22:34:59 myhostname kernel: vfs_symlink+0x148/0x1e0 Sep 11 22:34:59 myhostname kernel: do_symlinkat+0x130/0x140 Sep 11 22:34:59 myhostname kernel: __x64_sys_symlinkat+0x3d/0x50 Sep 11 22:34:59 myhostname kernel: do_syscall_64+0x5d/0x90 Sep 11 22:34:59 myhostname kernel: ? syscall_exit_to_user_mode+0x2b/0x40 Sep 11 22:34:59 myhostname kernel: ? do_syscall_64+0x6c/0x90 Sep 11 22:34:59 myhostname kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc The race leading to the problem happens like this: 1) Directory inode X is loaded into memory, its ->index_cnt field is initialized to (u64)-1 (at btrfs_alloc_inode()); 2) Task A is adding a new file to directory X, holding its vfs inode lock, and calls btrfs_set_inode_index() to get an index number for the entry. Because the inode's index_cnt field is set to (u64)-1 it calls btrfs_inode_delayed_dir_index_count() which fails because no dir index entries were added yet to the delayed inode and then it calls btrfs_set_inode_index_count(). This functions finds the last dir index key and then sets index_cnt to that index value + 1. It found that the last index key has an offset of 100. However before it assigns a value of 101 to index_cnt... 3) Task B calls opendir(3), ending up at btrfs_opendir(), where the VFS lock for inode X is not taken, so it calls btrfs_get_dir_last_index() and sees index_cnt still with a value of (u64)-1. Because of that it calls btrfs_inode_delayed_dir_index_count() which fails since no dir index entries were added to the delayed inode yet, and then it also calls btrfs_set_inode_index_count(). This also finds that the last index key has an offset of 100, and before it assigns the value 101 to the index_cnt field of inode X... 4) Task A assigns a value of 101 to index_cnt. And then the code flow goes to btrfs_set_inode_index() where it increments index_cnt from 101 to 102. Task A then creates a delayed dir index entry with a sequence number of 101 and adds it to the delayed inode; 5) Task B assigns 101 to the index_cnt field of inode X; 6) At some later point when someone tries to add a new entry to the directory, btrfs_set_inode_index() will return 101 again and shortly after an attempt to add another delayed dir index key with index number 101 will fail with -EEXIST resulting in a transaction abort. Fix this by locking the inode at btrfs_get_dir_last_index(), which is only only used when opening a directory or attempting to lseek on it. Reported-by: ken <ken@bllue.org> Link: https://lore.kernel.org/linux-btrfs/CAE6xmH+Lp=Q=E61bU+v9eWX8gYfLvu6jLYxjxjFpo3zHVPR0EQ@mail.gmail.com/ Reported-by: syzbot+d13490c82ad5353c779d@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/ Fixes: 9b378f6ad48c ("btrfs: fix infinite directory reads") CC: stable@vger.kernel.org # 6.5+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-14btrfs: refresh dir last index during a rewinddir(3) callFilipe Manana
When opening a directory we find what's the index of its last entry and then store it in the directory's file handle private data (struct btrfs_file_private::last_index), so that in the case new directory entries are added to a directory after an opendir(3) call we don't end up in an infinite loop (see commit 9b378f6ad48c ("btrfs: fix infinite directory reads")) when calling readdir(3). However once rewinddir(3) is called, POSIX states [1] that any new directory entries added after the previous opendir(3) call, must be returned by subsequent calls to readdir(3): "The rewinddir() function shall reset the position of the directory stream to which dirp refers to the beginning of the directory. It shall also cause the directory stream to refer to the current state of the corresponding directory, as a call to opendir() would have done." We currently don't refresh the last_index field of the struct btrfs_file_private associated to the directory, so after a rewinddir(3) we are not returning any new entries added after the opendir(3) call. Fix this by finding the current last index of the directory when llseek is called against the directory. This can be reproduced by the following C program provided by Ian Johnson: #include <dirent.h> #include <stdio.h> int main(void) { DIR *dir = opendir("test"); FILE *file; file = fopen("test/1", "w"); fwrite("1", 1, 1, file); fclose(file); file = fopen("test/2", "w"); fwrite("2", 1, 1, file); fclose(file); rewinddir(dir); struct dirent *entry; while ((entry = readdir(dir))) { printf("%s\n", entry->d_name); } closedir(dir); return 0; } Reported-by: Ian Johnson <ian@ianjohnson.dev> Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/ Fixes: 9b378f6ad48c ("btrfs: fix infinite directory reads") CC: stable@vger.kernel.org # 6.5+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-14btrfs: set last dir index to the current last index when opening dirFilipe Manana
When opening a directory for reading it, we set the last index where we stop iteration to the value in struct btrfs_inode::index_cnt. That value does not match the index of the most recently added directory entry but it's instead the index number that will be assigned the next directory entry. This means that if after the call to opendir(3) new directory entries are added, a readdir(3) call will return the first new directory entry. This is fine because POSIX says the following [1]: "If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified." For example for the test script from commit 9b378f6ad48c ("btrfs: fix infinite directory reads"), where we have 2000 files in a directory, ext4 doesn't return any new directory entry after opendir(3), while xfs returns the first 13 new directory entries added after the opendir(3) call. If we move to a shorter example with an empty directory when opendir(3) is called, and 2 files added to the directory after the opendir(3) call, then readdir(3) on btrfs will return the first file, ext4 and xfs return the 2 files (but in a different order). A test program for this, reported by Ian Johnson, is the following: #include <dirent.h> #include <stdio.h> int main(void) { DIR *dir = opendir("test"); FILE *file; file = fopen("test/1", "w"); fwrite("1", 1, 1, file); fclose(file); file = fopen("test/2", "w"); fwrite("2", 1, 1, file); fclose(file); struct dirent *entry; while ((entry = readdir(dir))) { printf("%s\n", entry->d_name); } closedir(dir); return 0; } To make this less odd, change the behaviour to never return new entries that were added after the opendir(3) call. This is done by setting the last_index field of the struct btrfs_file_private attached to the directory's file handle with a value matching btrfs_inode::index_cnt minus 1, since that value always matches the index of the next new directory entry and not the index of the most recently added entry. [1] https://pubs.opengroup.org/onlinepubs/007904875/functions/readdir_r.html Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/ CC: stable@vger.kernel.org # 6.5+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-14nvme: avoid bogus CRTO valuesKeith Busch
Some devices are reporting controller ready mode support, but return 0 for CRTO. These devices require a much higher time to ready than that, so they are failing to initialize after the driver starter preferring that value over CAP.TO. The spec requires that CAP.TO match the appropritate CRTO value, or be set to 0xff if CRTO is larger than that. This means that CAP.TO can be used to validate if CRTO is reliable, and provides an appropriate fallback for setting the timeout value if not. Use whichever is larger. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863 Reported-by: Cláudio Sampaio <patola@gmail.com> Reported-by: Felix Yan <felixonmars@archlinux.org> Tested-by: Felix Yan <felixonmars@archlinux.org> Based-on-a-patch-by: Felix Yan <felixonmars@archlinux.org> Cc: stable@vger.kernel.org Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-09-14thermal: core: Fix disabled trip point check in handle_thermal_trip()Rafael J. Wysocki
Commit bc840ea5f9a9 ("thermal: core: Do not handle trip points with invalid temperature") added a check for invalid temperature to the disabled trip point check in handle_thermal_trip(), but that check was added at a point when the trip structure has not been initialized yet. This may cause handle_thermal_trip() to skip a valid trip point in some cases, so fix it by moving the check to a suitable place, after __thermal_zone_get_trip() has been called to populate the trip structure. Fixes: bc840ea5f9a9 ("thermal: core: Do not handle trip points with invalid temperature") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-14Merge tag 'md-fixes-20230914' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.6 Pull MD fixes from Song: "These commits fix a bugzilla report [1] and some recent issues in 6.5 and 6.6. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217798" * tag 'md-fixes-20230914' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: Put the right device in md_seq_next md/raid1: fix error: ISO C90 forbids mixed declarations md: fix warning for holder mismatch from export_rdev() md: don't dereference mddev after export_rdev()
2023-09-15kbuild: avoid long argument lists in make modules_installMichal Kubecek
Running "make modules_install" may fail with make[2]: execvp: /bin/sh: Argument list too long if many modules are built and INSTALL_MOD_PATH is long. This is because scripts/Makefile.modinst creates all directories with one mkdir command. Use $(foreach ...) instead to prevent an excessive argument list. Fixes: 2dfec887c0fd ("kbuild: reduce the number of mkdir calls during modules_install") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-09-15kbuild: fix kernel-devel RPM package and linux-headers Deb packageMasahiro Yamada
Since commit fe66b5d2ae72 ("kbuild: refactor kernel-devel RPM package and linux-headers Deb package"), the kernel-devel RPM package and linux-headers Deb package are broken. I double-quoted the $(find ... -type d), which resulted in newlines being included in the argument to the outer find comment. find: 'arch/arm64/include\narch/arm64/kvm/hyp/include': No such file or directory The outer find command is unneeded. Fixes: fe66b5d2ae72 ("kbuild: refactor kernel-devel RPM package and linux-headers Deb package") Reported-by: Karolis M <k4rolis@protonmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2023-09-14md: Put the right device in md_seq_nextMariusz Tkaczyk
If there are multiple arrays in system and one mddevice is marked with MD_DELETED and md_seq_next() is called in the middle of removal then it _get()s proper device but it may _put() deleted one. As a result, active counter may never be zeroed for mddevice and it cannot be removed. Put the device which has been _get with previous md_seq_next() call. Cc: stable@vger.kernel.org Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed") Reported-by: AceLan Kao <acelan@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217798 Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230914152416.10819-1-mariusz.tkaczyk@linux.intel.com
2023-09-14Merge tag 'net-6.6-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Quite unusually, this does not contains any fix coming from subtrees (nf, ebpf, wifi, etc). Current release - regressions: - bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active() Previous releases - regressions: - ipv4: fix one memleak in __inet_del_ifa() - tcp: fix bind() regressions for v4-mapped-v6 addresses. - tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() - dsa: fixes for SJA1105 FDB regressions - veth: update XDP feature set when bringing up device - igb: fix hangup when enabling SR-IOV Previous releases - always broken: - kcm: fix memory leak in error path of kcm_sendmsg() - smc: fix data corruption in smcr_port_add - microchip: fix possible memory leak for vcap_dup_rule()" * tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). net: renesas: rswitch: Add spin lock protection for irq {un}mask net: renesas: rswitch: Fix unmasking irq condition igb: clean up in all error paths when enabling SR-IOV ixgbe: fix timestamp configuration code selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c. selftest: tcp: Move expected_errno into each test case in bind_wildcard.c. selftest: tcp: Fix address length in bind_wildcard.c. tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address. tcp: Fix bind() regression for v4-mapped-v6 wildcard address. tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any). ipv6: fix ip6_sock_set_addr_preferences() typo veth: Update XDP feature set when bringing up device net: macb: fix sleep inside spinlock net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988 net: ethernet: mtk_eth_soc: fix uninitialized variable kcm: Fix memory leak in error path of kcm_sendmsg() r8152: check budget for r8152_poll() net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset ...
2023-09-14io_uring/net: fix iter retargeting for selected bufPavel Begunkov
When using selected buffer feature, io_uring delays data iter setup until later. If io_setup_async_msg() is called before that it might see not correctly setup iterator. Pre-init nr_segs and judge from its state whether we repointing. Cc: stable@vger.kernel.org Reported-by: syzbot+a4c6e5ef999b68b26ed1@syzkaller.appspotmail.com Fixes: 0455d4ccec548 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0000000000002770be06053c7757@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-14ext4: fix rec_len verify errorShida Zhang
With the configuration PAGE_SIZE 64k and filesystem blocksize 64k, a problem occurred when more than 13 million files were directly created under a directory: EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D. EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D. EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum When enough files are created, the fake_dirent->reclen will be 0xffff. it doesn't equal to the blocksize 65536, i.e. 0x10000. But it is not the same condition when blocksize equals to 4k. when enough files are created, the fake_dirent->reclen will be 0x1000. it equals to the blocksize 4k, i.e. 0x1000. The problem seems to be related to the limitation of the 16-bit field when the blocksize is set to 64k. To address this, helpers like ext4_rec_len_{from,to}_disk has already been introduced to complete the conversion between the encoded and the plain form of rec_len. So fix this one by using the helper, and all the other in this file too. Cc: stable@kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Suggested-by: Andreas Dilger <adilger@dilger.ca> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20230803060938.1929759-1-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-09-14ext4: do not let fstrim block system suspendJan Kara
Len Brown has reported that system suspend sometimes fail due to inability to freeze a task working in ext4_trim_fs() for one minute. Trimming a large filesystem on a disk that slowly processes discard requests can indeed take a long time. Since discard is just an advisory call, it is perfectly fine to interrupt it at any time and the return number of discarded blocks until that moment. Do that when we detect the task is being frozen. Cc: stable@kernel.org Reported-by: Len Brown <lenb@kernel.org> Suggested-by: Dave Chinner <david@fromorbit.com> References: https://bugzilla.kernel.org/show_bug.cgi?id=216322 Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-09-14ext4: move setting of trimmed bit into ext4_try_to_trim_range()Jan Kara
Currently we set the group's trimmed bit in ext4_trim_all_free() based on return value of ext4_try_to_trim_range(). However when we will want to abort trimming because of suspend attempt, we want to return success from ext4_try_to_trim_range() but not set the trimmed bit. Instead implementing awkward propagation of this information, just move setting of trimmed bit into ext4_try_to_trim_range() when the whole group is trimmed. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-09-14jbd2: Fix memory leak in journal_init_common()Li Zetao
There is a memory leak reported by kmemleak: unreferenced object 0xff11000105903b80 (size 64): comm "mount", pid 3382, jiffies 4295032021 (age 27.826s) hex dump (first 32 bytes): 04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffae86ac40>] __kmalloc_node+0x50/0x160 [<ffffffffaf2486d8>] crypto_alloc_tfmmem.isra.0+0x38/0x110 [<ffffffffaf2498e5>] crypto_create_tfm_node+0x85/0x2f0 [<ffffffffaf24a92c>] crypto_alloc_tfm_node+0xfc/0x210 [<ffffffffaedde777>] journal_init_common+0x727/0x1ad0 [<ffffffffaede1715>] jbd2_journal_init_inode+0x2b5/0x500 [<ffffffffaed786b5>] ext4_load_and_init_journal+0x255/0x2440 [<ffffffffaed8b423>] ext4_fill_super+0x8823/0xa330 ... The root cause was traced to an error handing path in journal_init_common() when malloc memory failed in register_shrinker(). The checksum driver is used to reference to checksum algorithm via cryptoapi and the user should release the memory when the driver is no longer needed or the journal initialization failed. Fix it by calling crypto_free_shash() on the "err_cleanup" error handing path in journal_init_common(). Fixes: c30713084ba5 ("jbd2: move load_superblock() into journal_init_common()") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230911025138.983101-1-lizetao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-09-14dm: fix a race condition in retrieve_depsMikulas Patocka
There's a race condition in the multipath target when retrieve_deps races with multipath_message calling dm_get_device and dm_put_device. retrieve_deps walks the list of open devices without holding any lock but multipath may add or remove devices to the list while it is running. The end result may be memory corruption or use-after-free memory access. See this description of a UAF with multipath_message(): https://listman.redhat.com/archives/dm-devel/2022-October/052373.html Fix this bug by introducing a new rw semaphore "devices_lock". We grab devices_lock for read in retrieve_deps and we grab it for write in dm_get_device and dm_put_device. Reported-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-09-14Merge tag 'drm-misc-fixes-2023-09-07' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes One doc fix for drm/connector, one fix for amdgpu for an crash when VRAM usage is high, and one fix in gm12u320 to fix the timeout units in the code Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
2023-09-14drm/tests: helpers: Avoid a driver uafThomas Hellström
when using __drm_kunit_helper_alloc_drm_device() the driver may be dereferenced by device-managed resources up until the device is freed, which is typically later than the kunit-managed resource code frees it. Fix this by simply make the driver device-managed as well. In short, the sequence leading to the UAF is as follows: INIT: Code allocates a struct device as a kunit-managed resource. Code allocates a drm driver as a kunit-managed resource. Code allocates a drm device as a device-managed resource. EXIT: Kunit resource cleanup frees the drm driver Kunit resource cleanup puts the struct device, which starts a device-managed resource cleanup device-managed cleanup calls drm_dev_put() drm_dev_put() dereferences the (now freed) drm driver -> Boom. Related KASAN message: [55272.551542] ================================================================== [55272.551551] BUG: KASAN: slab-use-after-free in drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551603] Read of size 8 at addr ffff888127502828 by task kunit_try_catch/10353 [55272.551612] CPU: 4 PID: 10353 Comm: kunit_try_catch Tainted: G U N 6.5.0-rc7+ #155 [55272.551620] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [55272.551626] Call Trace: [55272.551629] <TASK> [55272.551633] dump_stack_lvl+0x57/0x90 [55272.551639] print_report+0xcf/0x630 [55272.551645] ? _raw_spin_lock_irqsave+0x5f/0x70 [55272.551652] ? drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551694] kasan_report+0xd7/0x110 [55272.551699] ? drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551742] drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551783] devres_release_all+0x15d/0x1f0 [55272.551790] ? __pfx_devres_release_all+0x10/0x10 [55272.551797] device_unbind_cleanup+0x16/0x1a0 [55272.551802] device_release_driver_internal+0x3e5/0x540 [55272.551808] ? kobject_put+0x5d/0x4b0 [55272.551814] bus_remove_device+0x1f1/0x3f0 [55272.551819] device_del+0x342/0x910 [55272.551826] ? __pfx_device_del+0x10/0x10 [55272.551830] ? lock_release+0x339/0x5e0 [55272.551836] ? kunit_remove_resource+0x128/0x290 [kunit] [55272.551845] ? __pfx_lock_release+0x10/0x10 [55272.551851] platform_device_del.part.0+0x1f/0x1e0 [55272.551856] ? _raw_spin_unlock_irqrestore+0x30/0x60 [55272.551863] kunit_remove_resource+0x195/0x290 [kunit] [55272.551871] ? _raw_spin_unlock_irqrestore+0x30/0x60 [55272.551877] kunit_cleanup+0x78/0x120 [kunit] [55272.551885] ? __kthread_parkme+0xc1/0x1f0 [55272.551891] ? __pfx_kunit_try_run_case_cleanup+0x10/0x10 [kunit] [55272.551900] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit] [55272.551909] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [55272.551919] kthread+0x2e7/0x3c0 [55272.551924] ? __pfx_kthread+0x10/0x10 [55272.551929] ret_from_fork+0x2d/0x70 [55272.551935] ? __pfx_kthread+0x10/0x10 [55272.551940] ret_from_fork_asm+0x1b/0x30 [55272.551948] </TASK> [55272.551953] Allocated by task 10351: [55272.551956] kasan_save_stack+0x1c/0x40 [55272.551962] kasan_set_track+0x21/0x30 [55272.551966] __kasan_kmalloc+0x8b/0x90 [55272.551970] __kmalloc+0x5e/0x160 [55272.551976] kunit_kmalloc_array+0x1c/0x50 [kunit] [55272.551984] drm_exec_test_init+0xfa/0x2c0 [drm_exec_test] [55272.551991] kunit_try_run_case+0xdd/0x250 [kunit] [55272.551999] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [55272.552008] kthread+0x2e7/0x3c0 [55272.552012] ret_from_fork+0x2d/0x70 [55272.552017] ret_from_fork_asm+0x1b/0x30 [55272.552024] Freed by task 10353: [55272.552027] kasan_save_stack+0x1c/0x40 [55272.552032] kasan_set_track+0x21/0x30 [55272.552036] kasan_save_free_info+0x27/0x40 [55272.552041] __kasan_slab_free+0x106/0x180 [55272.552046] slab_free_freelist_hook+0xb3/0x160 [55272.552051] __kmem_cache_free+0xb2/0x290 [55272.552056] kunit_remove_resource+0x195/0x290 [kunit] [55272.552064] kunit_cleanup+0x78/0x120 [kunit] [55272.552072] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [55272.552080] kthread+0x2e7/0x3c0 [55272.552085] ret_from_fork+0x2d/0x70 [55272.552089] ret_from_fork_asm+0x1b/0x30 [55272.552096] The buggy address belongs to the object at ffff888127502800 which belongs to the cache kmalloc-512 of size 512 [55272.552105] The buggy address is located 40 bytes inside of freed 512-byte region [ffff888127502800, ffff888127502a00) [55272.552115] The buggy address belongs to the physical page: [55272.552119] page:00000000af6c70ff refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x127500 [55272.552127] head:00000000af6c70ff order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [55272.552133] anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) [55272.552141] page_type: 0xffffffff() [55272.552145] raw: 0017ffffc0010200 ffff888100042c80 0000000000000000 dead000000000001 [55272.552152] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [55272.552157] page dumped because: kasan: bad access detected [55272.552163] Memory state around the buggy address: [55272.552167] ffff888127502700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [55272.552173] ffff888127502780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [55272.552178] >ffff888127502800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [55272.552184] ^ [55272.552187] ffff888127502880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [55272.552193] ffff888127502900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [55272.552198] ================================================================== [55272.552203] Disabling lock debugging due to kernel taint v2: - Update commit message, add Fixes: tag and Cc stable. v3: - Further commit message updates (Maxime Ripard). Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v6.3+ Fixes: d98780310719 ("drm/tests: helpers: Allow to pass a custom drm_driver") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20230907135339.7971-2-thomas.hellstrom@linux.intel.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2023-09-14Revert "drm/vkms: Fix race-condition between the hrtimer and the atomic commit"Maíra Canal
This reverts commit a0e6a017ab56936c0405fe914a793b241ed25ee0. Unlocking a mutex in the context of a hrtimer callback is violating mutex locking rules, as mutex_unlock() from interrupt context is not permitted. Link: https://lore.kernel.org/dri-devel/ZQLAc%2FFwkv%2FGiVoK@phenom.ffwll.local/T/#t Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230914102024.1789154-1-mcanal@igalia.com
2023-09-14kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().Kuniyuki Iwashima
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by updating kcm_tx_msg(head)->last_skb if partial data is copied so that the following sendmsg() will resume from the skb. However, we cannot know how many bytes were copied when we get the error. Thus, we could mess up the MSG_MORE queue. When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we do so for UDP by udp_flush_pending_frames(). Even without this change, when the error occurred, the following sendmsg() resumed from a wrong skb and the queue was messed up. However, we have yet to get such a report, and only syzkaller stumbled on it. So, this can be changed safely. Note this does not change SOCK_SEQPACKET behaviour. Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14Merge branch 'net-renesas-rswitch-fix-a-lot-of-redundant-irq-issue'Paolo Abeni
Yoshihiro Shimoda says: ==================== net: renesas: rswitch: Fix a lot of redundant irq issue After this patch series was applied, a lot of redundant interrupts no longer occur. For example: when "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider Before the patches are applied: about 800,000 times happened After the patches were applied: about 100,000 times happened ==================== Link: https://lore.kernel.org/r/20230912014936.3175430-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: renesas: rswitch: Add spin lock protection for irq {un}maskYoshihiro Shimoda
Add spin lock protection for irq {un}mask registers' control. After napi_complete_done() and this protection were applied, a lot of redundant interrupts no longer occur. For example: when "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider Before the patches are applied: about 800,000 times happened After the patches were applied: about 100,000 times happened Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: renesas: rswitch: Fix unmasking irq conditionYoshihiro Shimoda
Fix unmasking irq condition by using napi_complete_done(). Otherwise, redundant interrupts happen. Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-13scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rportsJustin Tee
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is decremented twice. Once for SCSI transport registration and second to remove the initial node allocation kref. If there is also an NVMe transport registration, another reference count decrement is expected in lpfc_nvme_unregister_port(). Race conditions between the NVMe transport remoteport_delete and dev_loss_tmo callbacks sometimes results in premature ndlp object release resulting in use-after-free issues. Fix by not dropping the ndlp object in dev_loss_tmo callback with an outstanding NVMe transport registration. Inversely, mark the final NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmoJustin Tee
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking nlp_flag for NLP_DROPPED. There is an attempt to restore the ndlp lock when exiting the if statement, but the nlp_put kref could be the final decrement causing a use-after-free memory access on a released ndlp object. Instead of trying to reacquire the ndlp lock after checking nlp_flag, just return after calling nlp_put. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com Reviewed-by: "Ewan D. Milne" <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()Jinjie Ruan
Since debugfs_create_file() returns ERR_PTR and never NULL, use IS_ERR() to check the return value. Fixes: 2fcbc569b9f5 ("scsi: lpfc: Make debugfs ktime stats generic for NVME and SCSI") Fixes: 4c47efc140fa ("scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures") Fixes: 6a828b0f6192 ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues") Fixes: 95bfc6d8ad86 ("scsi: lpfc: Make FW logging dynamically configurable") Fixes: 9f77870870d8 ("scsi: lpfc: Add debugfs support for cm framework buffers") Fixes: c490850a0947 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230906030809.2847970-1-ruanjinjie@huawei.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13scsi: target: core: Fix target_cmd_counter leakDavid Disseldorp
The target_cmd_counter struct allocated via target_alloc_cmd_counter() is never freed, resulting in leaks across various transport types, e.g.: unreferenced object 0xffff88801f920120 (size 96): comm "sh", pid 102, jiffies 4294892535 (age 713.412s) hex dump (first 32 bytes): 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 38 01 92 1f 80 88 ff ff ........8....... backtrace: [<00000000e58a6252>] kmalloc_trace+0x11/0x20 [<0000000043af4b2f>] target_alloc_cmd_counter+0x17/0x90 [target_core_mod] [<000000007da2dfa7>] target_setup_session+0x2d/0x140 [target_core_mod] [<0000000068feef86>] tcm_loop_tpg_nexus_store+0x19b/0x350 [tcm_loop] [<000000006a80e021>] configfs_write_iter+0xb1/0x120 [<00000000e9f4d860>] vfs_write+0x2e4/0x3c0 [<000000008143433b>] ksys_write+0x80/0xb0 [<00000000a7df29b2>] do_syscall_64+0x42/0x90 [<0000000053f45fb8>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Free the structure alongside the corresponding iscsit_conn / se_sess parent. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20230831183459.6938-1-ddiss@suse.de Fixes: becd9be6069e ("scsi: target: Move sess cmd counter to new struct") Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13scsi: pm8001: Setup IRQs on resumeDamien Le Moal
The function pm8001_pci_resume() only calls pm8001_request_irq() without calling pm8001_setup_irq(). This causes the IRQ allocation to fail, which leads all drives being removed from the system. Fix this issue by integrating the code for pm8001_setup_irq() directly inside pm8001_request_irq() so that MSI-X setup is performed both during normal initialization and resume operations. Fixes: dbf9bfe61571 ("[SCSI] pm8001: add SAS/SATA HBA driver") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230911232745.325149-2-dlemoal@kernel.org Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13scsi: pm80xx: Avoid leaking tags when processing ↵Michal Grzedzicki
OPC_INB_SET_CONTROLLER_CONFIG command Tags allocated for OPC_INB_SET_CONTROLLER_CONFIG command need to be freed when we receive the response. Signed-off-by: Michal Grzedzicki <mge@meta.com> Link: https://lore.kernel.org/r/20230911170340.699533-2-mge@meta.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13scsi: pm80xx: Use phy-specific SAS address when sending PHY_START commandMichal Grzedzicki
Some cards have more than one SAS address. Using an incorrect address causes communication issues with some devices like expanders. Closes: https://lore.kernel.org/linux-kernel/A57AEA84-5CA0-403E-8053-106033C73C70@fb.com/ Signed-off-by: Michal Grzedzicki <mge@meta.com> Link: https://lore.kernel.org/r/20230913155611.3183612-1-mge@meta.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13Merge branch '6.6/scsi-staging' into 6.6/scsi-fixesMartin K. Petersen
Pull in staged fixes for 6.6. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13Merge tag 'pmdomain-v6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull genpm / pmdomain rename from Ulf Hansson: "This renames the genpd subsystem to pmdomain. As discussed on LKML, using 'genpd' as the name of a subsystem isn't very self-explanatory and the acronym itself that means Generic PM Domain, is known only by a limited group of people. The suggestion to improve the situation is to rename the subsystem to 'pmdomain', which there seems to be a good consensus around using. Ideally it should indicate that its purpose is to manage Power Domains or 'PM domains' as we often also use within the Linux Kernel terminology" * tag 'pmdomain-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: Rename the genpd subsystem to pmdomain
2023-09-13Merge tag 'tpmdd-v6.6-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fix from Jarkko Sakkinen. * tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Fix typo in tpmrm class definition
2023-09-13Merge tag 'parisc-for-6.6-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: - fix reference to exported symbols for parisc64 [Masahiro Yamada] - Block-TLB (BTLB) support on 32-bit CPUs - sparse and build-warning fixes * tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: linux/export: fix reference to exported functions for parisc64 parisc: BTLB: Initialize BTLB tables at CPU startup parisc: firmware: Simplify calling non-PA20 functions parisc: BTLB: _edata symbol has to be page aligned for BTLB support parisc: BTLB: Add BTLB insert and purge firmware function wrappers parisc: BTLB: Clear possibly existing BTLB entries parisc: Prepare for Block-TLB support on 32-bit kernel parisc: shmparam.h: Document aliasing requirements of PA-RISC parisc: irq: Make irq_stack_union static to avoid sparse warning parisc: drivers: Fix sparse warning parisc: iosapic.c: Fix sparse warnings parisc: ccio-dma: Fix sparse warnings parisc: sba-iommu: Fix sparse warnigs parisc: sba: Fix compile warning wrt list of SBA devices parisc: sba_iommu: Fix build warning if procfs if disabled
2023-09-13Merge tag 'trace-v6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Add missing LOCKDOWN checks for eventfs callers When LOCKDOWN is active for tracing, it causes inconsistent state when some functions succeed and others fail. - Use dput() to free the top level eventfs descriptor There was a race between accesses and freeing it. - Fix a long standing bug that eventfs exposed due to changing timings by dynamically creating files. That is, If a event file is opened for an instance, there's nothing preventing the instance from being removed which will make accessing the files cause use-after-free bugs. - Fix a ring buffer race that happens when iterating over the ring buffer while writers are active. Check to make sure not to read the event meta data if it's beyond the end of the ring buffer sub buffer. - Fix the print trigger that disappeared because the test to create it was looking for the event dir field being filled, but now it has the "ef" field filled for the eventfs structure. - Remove the unused "dir" field from the event structure. - Fix the order of the trace_dynamic_info as it had it backwards for the offset and len fields for which one was for which endianess. - Fix NULL pointer dereference with eventfs_remove_rec() If an allocation fails in one of the eventfs_add_*() functions, the caller of it in event_subsystem_dir() or event_create_dir() assigns the result to the structure. But it's assigning the ERR_PTR and not NULL. This was passed to eventfs_remove_rec() which expects either a good pointer or a NULL, not ERR_PTR. The fix is to not assign the ERR_PTR to the structure, but to keep it NULL on error. - Fix list_for_each_rcu() to use list_for_each_srcu() in dcache_dir_open_wrapper(). One iteration of the code used RCU but because it had to call sleepable code, it had to be changed to use SRCU, but one of the iterations was missed. - Fix synthetic event print function to use "as_u64" instead of passing in a pointer to the union. To fix big/little endian issues, the u64 that represented several types was turned into a union to define the types properly. * tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec() tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper() tracing/synthetic: Print out u64 values properly tracing/synthetic: Fix order of struct trace_dynamic_info selftests/ftrace: Fix dependencies for some of the synthetic event tests tracing: Remove unused trace_event_file dir field tracing: Use the new eventfs descriptor for print trigger ring-buffer: Do not attempt to read past "commit" tracefs/eventfs: Free top level files on removal ring-buffer: Avoid softlockup in ring_buffer_resize() tracing: Have event inject files inc the trace array ref count tracing: Have option files inc the trace array ref count tracing: Have current_trace inc the trace array ref count tracing: Have tracing_max_latency inc the trace array ref count tracing: Increase trace array ref count on enable and filter files tracefs/eventfs: Use dput to free the toplevel events directory tracefs/eventfs: Add missing lockdown checks tracefs: Add missing lockdown check to tracefs_create_dir()
2023-09-13btrfs: don't clear uptodate on write errorsJosef Bacik
We have been consistently seeing hangs with generic/648 in our subpage GitHub CI setup. This is a classic deadlock, we are calling btrfs_read_folio() on a folio, which requires holding the folio lock on the folio, and then finding a ordered extent that overlaps that range and calling btrfs_start_ordered_extent(), which then tries to write out the dirty page, which requires taking the folio lock and then we deadlock. The hang happens because we're writing to range [1271750656, 1271767040), page index [77621, 77622], and page 77621 is !Uptodate. It is also Dirty, so we call btrfs_read_folio() for 77621 and which does btrfs_lock_and_flush_ordered_range() for that range, and we find an ordered extent which is [1271644160, 1271746560), page index [77615, 77621]. The page indexes overlap, but the actual bytes don't overlap. We're holding the page lock for 77621, then call btrfs_lock_and_flush_ordered_range() which tries to flush the dirty page, and tries to lock 77621 again and then we deadlock. The byte ranges do not overlap, but with subpage support if we clear uptodate on any portion of the page we mark the entire thing as not uptodate. We have been clearing page uptodate on write errors, but no other file system does this, and is in fact incorrect. This doesn't hurt us in the !subpage case because we can't end up with overlapped ranges that don't also overlap on the page. Fix this by not clearing uptodate when we have a write error. The only thing we should be doing in this case is setting the mapping error and carrying on. This makes it so we would no longer call btrfs_read_folio() on the page as it's uptodate and eliminates the deadlock. With this patch we're now able to make it through a full fstests run on our subpage blocksize VMs. Note for stable backports: this probably goes beyond 6.1 but the code has been cleaned up and clearing the uptodate bit must be verified on each version independently. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-13btrfs: file_remove_privs needs an exclusive lock in direct io writeBernd Schubert
This was noticed by Miklos that file_remove_privs might call into notify_change(), which requires to hold an exclusive lock. The problem exists in FUSE and btrfs. We can fix it without any additional helpers from VFS, in case the privileges would need to be dropped, change the lock type to be exclusive and redo the loop. Fixes: e9adabb9712e ("btrfs: use shared lock for direct writes within EOF") CC: Miklos Szeredi <miklos@szeredi.hu> CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-13btrfs: convert btrfs_read_merkle_tree_page() to use a folioMatthew Wilcox (Oracle)
Remove a number of hidden calls to compound_head() by using a folio throughout. Also follow core kernel coding style by adding the folio to the page cache immediately after allocation instead of doing the read first, then adding it to the page cache. This ordering makes subsequent readers block waiting for the first reader instead of duplicating the work only to throw it away when they find out they lost the race. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-13NFSv4.1: fix pnfs MDS=DS session trunkingOlga Kornievskaia
Currently, when GETDEVICEINFO returns multiple locations where each is a different IP but the server's identity is same as MDS, then nfs4_set_ds_client() finds the existing nfs_client structure which has the MDS's max_connect value (and if it's 1), then the 1st IP on the DS's list will get dropped due to MDS trunking rules. Other IPs would be added as they fall under the pnfs trunking rules. For the list of IPs the 1st goes thru calling nfs4_set_ds_client() which will eventually call nfs4_add_trunk() and call into rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. The other IPs (after the 1st one), would call rpc_clnt_add_xprt() which doesn't go thru that check. nfs4_add_trunk() is called when MDS trunking is happening and it needs to enforce the usage of max_connect mount option of the 1st mount. However, this shouldn't be applied to pnfs flow. Instead, this patch proposed to treat MDS=DS as DS trunking and make sure that MDS's max_connect limit does not apply to the 1st IP returned in the GETDEVICEINFO list. It does so by marking the newly created client with a new flag NFS_CS_PNFS which then used to pass max_connect value to use into the rpc_clnt_test_and_add_xprt() instead of the existing rpc client's max_connect value set by the MDS connection. For example, mount was done without max_connect value set so MDS's rpc client has cl_max_connect=1. Upon calling into rpc_clnt_test_and_add_xprt() and using rpc client's value, the caller passes in max_connect value which is previously been set in the pnfs path (as a part of handling GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). However, when NFS_CS_PNFS flag is not set and we know we are doing MDS trunking, comparing a new IP of the same server, we then set the max_connect value to the existing MDS's value and pass that into rpc_clnt_test_and_add_xprt(). Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13Revert "SUNRPC: Fail faster on bad verifier"Trond Myklebust
This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724. The premise of this commit was incorrect. There are exactly 2 cases where rpcauth_checkverf() will return an error: 1) If there was an XDR decode problem (i.e. garbage data). 2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC. In the second case, there are again 2 subcases: a) The GSS context expires, in which case gss_validate() will force a new context negotiation on retry by invalidating the cred. b) The sequence number check failed because an RPC call timed out, and the client retransmitted the request using a new sequence number, as required by RFC2203. In neither subcase is this a fatal error. Reported-by: Russell Cattelan <cattelan@thebarn.com> Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13SUNRPC: Mark the cred for revalidation if the server rejects itTrond Myklebust
If the server rejects the credential as being stale, or bad, then we should mark it for revalidation before retransmitting. Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>