summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-20net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()D. Wythe
There is a certain chance to trigger the following panic: PID: 5900 TASK: ffff88c1c8af4100 CPU: 1 COMMAND: "kworker/1:48" #0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7 #1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a #2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60 #3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7 #4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715 #5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654 #6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62 [exception RIP: ib_alloc_mr+19] RIP: ffffffffc0c9cce3 RSP: ffff9456c1cc7c38 RFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000004 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88c1ea281d00 R8: 000000020a34ffff R9: ffff88c1350bbb20 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000010 R14: ffff88c1ab040a50 R15: ffff88c1ea281d00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc] #8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc] #9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc] The reason here is that when the server tries to create a second link, smc_llc_srv_add_link() has no protection and may add a new link to link group. This breaks the security environment protected by llc_conf_mutex. Fixes: 2d2209f20189 ("net/smc: first part of add link processing as SMC server") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20Merge branch 'taprio-queuemaxsdu-fixes'Paolo Abeni
Vladimir Oltean says: ==================== taprio queueMaxSDU fixes This fixes 3 issues noticed while attempting to reoffload the dynamically calculated queueMaxSDU values. These are: - Dynamic queueMaxSDU is not calculated correctly due to a lost patch - Dynamically calculated queueMaxSDU needs to be clamped on the low end - Dynamically calculated queueMaxSDU needs to be clamped on the high end ==================== Link: https://lore.kernel.org/r/20230215224632.2532685-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimitedVladimir Oltean
It makes no sense to keep randomly large max_sdu values, especially if larger than the device's max_mtu. These are visible in "tc qdisc show". Such a max_sdu is practically unlimited and will cause no packets for that traffic class to be dropped on enqueue. Just set max_sdu_dynamic to U32_MAX, which in the logic below causes taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user space of 0 (unlimited). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net/sched: taprio: don't allow dynamic max_sdu to go negative after stab ↵Vladimir Oltean
adjustment The overhead specified in the size table comes from the user. With small time intervals (or gates always closed), the overhead can be larger than the max interval for that traffic class, and their difference is negative. What we want to happen is for max_sdu_dynamic to have the smallest non-zero value possible (1) which means that all packets on that traffic class are dropped on enqueue. However, since max_sdu_dynamic is u32, a negative is represented as a large value and oversized dropping never happens. Use max_t with int to force a truncation of max_frm_len to no smaller than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no smaller than 1. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net/sched: taprio: fix calculation of maximum gate durationsVladimir Oltean
taprio_calculate_gate_durations() depends on netdev_get_num_tc() and this returns 0. So it calculates the maximum gate durations for no traffic class. I had tested the blamed commit only with another patch in my tree, one which in the end I decided isn't valuable enough to submit ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"). The problem is that having this patch threw off my testing. By moving the netdev_set_num_tc() call earlier, we implicitly gave to taprio_calculate_gate_durations() the information it needed. Extract only the portion from the unsubmitted change which applies the mqprio configuration to the netdev earlier. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/ Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20rxrpc: Fix overproduction of wakeups to recvmsg()David Howells
Fix three cases of overproduction of wakeups: (1) rxrpc_input_split_jumbo() conditionally notifies the app that there's data for recvmsg() to collect if it queues some data - and then its only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway. Fix the rxrpc_input_data() to only do the wakeup in failure cases. (2) If a DATA packet is received for a call by the I/O thread whilst recvmsg() is busy draining the call's rx queue in the app thread, the call will left on the recvmsg() queue for recvmsg() to pick up, even though there isn't any data on it. This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR set after the reply has been posted to a service call. Fix this by discarding pending calls from the recvmsg() queue that don't need servicing yet. (3) Not-yet-completed calls get requeued after having data read from them, even if they have no data to read. Fix this by only requeuing them if they have data waiting on them; if they don't, the I/O thread will requeue them when data arrives or they fail. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20Merge branch 'net-final-gsi-register-updates'Paolo Abeni
Alex Elder says: ==================== net: final GSI register updates I believe this is the last set of changes required to allow IPA v5.0 to be supported. There is a little cleanup work remaining, but that can happen in the next Linux release cycle. Otherwise we just need config data and register definitions for IPA v5.0 (and DTS updates). These are ready but won't be posted without further testing. The first patch in this series fixes a minor bug in a patch just posted, which I found too late. The second eliminates the GSI memory "adjustment"; this was done previously to avoid/delay the need to implement a more general way to define GSI register offsets. Note that this patch causes "checkpatch" warnings due to indentation that aligns with an open parenthesis. The third patch makes use of the newly-defined register offsets, to eliminate the need for a function that hid a few details. The next modifies a different helper function to work properly for IPA v5.0+. The fifth patch changes the way the event ring size is specified based on how it's now done for IPA v5.0+. And the last defines a new register required for IPA v5.0+. ==================== Link: https://lore.kernel.org/r/20230215195352.755744-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: ipa: add HW_PARAM_4 GSI registerAlex Elder
Starting at IPA v5.0, the number of event rings per EE is defined in a field in a new HW_PARAM_4 GSI register rather than HW_PARAM_2. Define this new register and its fields, and update the code that checks the number of rings supported by hardware to use the proper field based on IPA version. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: ipa: support different event ring encodingAlex Elder
Starting with IPA v5.0, a channel's event ring index is encoded in a field in the CH_C_CNTXT_1 GSI register rather than CH_C_CNTXT_0. Define a new field ID for the former register and encode the event ring in the appropriate register. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: ipa: avoid setting an undefined fieldAlex Elder
The GSI channel protocol field in the CH_C_CNTXT_0 GSI register is widened starting IPA v5.0, making the CHTYPE_PROTOCOL_MSB field added in IPA v4.5 unnecessary. Update the code to reflect this. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: ipa: kill ev_ch_e_cntxt_1_length_encode()Alex Elder
Now that we explicitly define each register field width there is no need to have a special encoding function for the event ring length. Add a field for this to the EV_CH_E_CNTXT_1 GSI register, and use it in place of ev_ch_e_cntxt_1_length_encode() (which can be removed). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: ipa: kill gsi->virt_rawAlex Elder
Starting at IPA v4.5, almost all GSI registers had their offsets changed by a fixed amount (shifted downward by 0xd000). Rather than defining offsets for all those registers dependent on version, an adjustment was applied for most register accesses. This was implemented in commit cdeee49f3ef7f ("net: ipa: adjust GSI register addresses"). It was later modified to be a bit more obvious about the adjusment, in commit 571b1e7e58ad3 ("net: ipa: use a separate pointer for adjusted GSI memory"). We now are able to define every GSI register with its own offset, so there's no need to implement this special adjustment. So get rid of the "virt_raw" pointer, and just maintain "virt" as the (non-adjusted) base address of I/O mapped GSI register memory. Redefine the offsets of all GSI registers (other than the INTER_EE ones, which were not subject to the adjustment) for IPA v4.5+, subtracting 0xd000 from their defined offsets instead. Move the ERROR_LOG and ERROR_LOG_CLR definitions further down in the register definition files so all registers are defined in order of their offset. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: ipa: fix an incorrect assignmentAlex Elder
I spotted an error in a patch posted this week, unfortunately just after it got accepted. The effect of the bug is that time-based interrupt moderation is disabled. This is not technically a bug, but it is not what is intended. The problem is that a |= assignment got implemented as a simple assignment, so the previously assigned value was ignored. Fixes: edc6158b18af ("net: ipa: define fields for event-ring related registers") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net: dpaa2-eth: do not always set xsk support in xdp_features flagLorenzo Bianconi
Do not always add NETDEV_XDP_ACT_XSK_ZEROCOPY bit in xdp_features flag but check if the NIC really supports it. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/3dba6ea42dc343a9f2d7d1a6a6a6c173235e1ebf.1676471386.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20powerpc/e500: Add missing prototype for 'relocate_init'Christophe Leroy
Kernel test robot reports: arch/powerpc/mm/nohash/e500.c:314:21: warning: no previous prototype for 'relocate_init' [-Wmissing-prototypes] 314 | notrace void __init relocate_init(u64 dt_ptr, phys_addr_t start) | ^~~~~~~~~~~~~ Add it in mm/mmu_decl.h, close to associated is_second_reloc variable declaration. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/oe-kbuild-all/202302181136.wgyCKUcs-lkp@intel.com/ Link: https://lore.kernel.org/r/ac9107acf24135e1a07e8f84d2090572d43e3fe4.1676712510.git.christophe.leroy@csgroup.eu
2023-02-19ksmbd: fix possible memory leak in smb2_lock()Hangyu Hua
argv needs to be free when setup_async_work fails or when the current process is woken up. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-19Linux 6.2v6.2Linus Torvalds
2023-02-19firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3Darrell Kavanagh
Another Lenovo convertable which reports a landscape resolution of 1920x1200 with a pitch of (1920 * 4) bytes, while the actual framebuffer has a resolution of 1200x1920 with a pitch of (1200 * 4) bytes. Signed-off-by: Darrell Kavanagh <darrell.kavanagh@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-19arm64: efi: Make efi_rt_lock a raw_spinlockPierre Gondois
Running a rt-kernel base on 6.2.0-rc3-rt1 on an Ampere Altra outputs the following: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: kworker/u320:0 preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by kworker/u320:0/9: #0: ffff3fff8c27d128 ((wq_completion)efi_rts_wq){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41) #1: ffff80000861bdd0 ((work_completion)(&efi_rts_work.work)){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41) #2: ffffdf7e1ed3e460 (efi_rt_lock){+.+.}-{3:3}, at: efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101) Preemption disabled at: efi_virtmap_load (./arch/arm64/include/asm/mmu_context.h:248) CPU: 0 PID: 9 Comm: kworker/u320:0 Tainted: G W 6.2.0-rc3-rt1 Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18 Workqueue: efi_rts_wq efi_call_rts Call trace: dump_backtrace (arch/arm64/kernel/stacktrace.c:158) show_stack (arch/arm64/kernel/stacktrace.c:165) dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) dump_stack (lib/dump_stack.c:114) __might_resched (kernel/sched/core.c:10134) rt_spin_lock (kernel/locking/rtmutex.c:1769 (discriminator 4)) efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101) [...] This seems to come from commit ff7a167961d1 ("arm64: efi: Execute runtime services from a dedicated stack") which adds a spinlock. This spinlock is taken through: efi_call_rts() \-efi_call_virt() \-efi_call_virt_pointer() \-arch_efi_call_virt_setup() Make 'efi_rt_lock' a raw_spinlock to avoid being preempted. [ardb: The EFI runtime services are called with a different set of translation tables, and are permitted to use the SIMD registers. The context switch code preserves/restores neither, and so EFI calls must be made with preemption disabled, rather than only disabling migration.] Fixes: ff7a167961d1 ("arm64: efi: Execute runtime services from a dedicated stack") Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Cc: <stable@vger.kernel.org> # v6.1+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-19IB/mlx5: Extend debug control for CC parametersEdward Srouji
This patch adds rtt_resp_dscp to the current debug controllability of congestion control (CC) parameters. rtt_resp_dscp can be read or written through debugfs. If set, its value overwrites the DSCP of the generated RTT response. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-19mips: fix syscall_get_nrElvira Khabirova
The implementation of syscall_get_nr on mips used to ignore the task argument and return the syscall number of the calling thread instead of the target thread. The bug was exposed to user space by commit 201766a20e30f ("ptrace: add PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite. Link: https://github.com/strace/strace/issues/235 Fixes: c2d9f1775731 ("MIPS: Fix syscall_get_nr for the syscall exit tracing.") Cc: <stable@vger.kernel.org> # v3.19+ Co-developed-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Elvira Khabirova <lineprinter0@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-02-19MIPS: SMP-CPS: fix build error when HOTPLUG_CPU not setRandy Dunlap
When MIPS_CPS=y, MIPS_CPS_PM is not set, HOTPLUG_CPU is not set, and KEXEC=y, cps_shutdown_this_cpu() attempts to call cps_pm_enter_state(), which is not built when MIPS_CPS_PM is not set. Conditionally execute the else branch based on CONFIG_HOTPLUG_CPU to remove the build error. This build failure is from a randconfig file. mips-linux-ld: arch/mips/kernel/smp-cps.o: in function `$L162': smp-cps.c:(.text.cps_kexec_nonboot_cpu+0x31c): undefined reference to `cps_pm_enter_state' Fixes: 1447864bee4c ("MIPS: kexec: CPS systems to halt nonboot CPUs") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dengcheng Zhu <dzhu@wavecomp.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Cc: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-02-19MIPS: DTS: jz4780: add #clock-cells to rtc_devH. Nikolaus Schaller
This makes the driver present the clk32k signal if requested. It is needed to clock the PMU of the BCM4330 WiFi and Bluetooth module of the CI20 board. Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-02-19ext4: fix task hung in ext4_xattr_delete_inodeBaokun Li
Syzbot reported a hung task problem: ================================================================== INFO: task syz-executor232:5073 blocked for more than 143 seconds. Not tainted 6.2.0-rc2-syzkaller-00024-g512dee0c00ad #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-exec232 state:D stack:21024 pid:5073 ppid:5072 flags:0x00004004 Call Trace: <TASK> context_switch kernel/sched/core.c:5244 [inline] __schedule+0x995/0xe20 kernel/sched/core.c:6555 schedule+0xcb/0x190 kernel/sched/core.c:6631 __wait_on_freeing_inode fs/inode.c:2196 [inline] find_inode_fast+0x35a/0x4c0 fs/inode.c:950 iget_locked+0xb1/0x830 fs/inode.c:1273 __ext4_iget+0x22e/0x3ed0 fs/ext4/inode.c:4861 ext4_xattr_inode_iget+0x68/0x4e0 fs/ext4/xattr.c:389 ext4_xattr_inode_dec_ref_all+0x1a7/0xe50 fs/ext4/xattr.c:1148 ext4_xattr_delete_inode+0xb04/0xcd0 fs/ext4/xattr.c:2880 ext4_evict_inode+0xd7c/0x10b0 fs/ext4/inode.c:296 evict+0x2a4/0x620 fs/inode.c:664 ext4_orphan_cleanup+0xb60/0x1340 fs/ext4/orphan.c:474 __ext4_fill_super fs/ext4/super.c:5516 [inline] ext4_fill_super+0x81cd/0x8700 fs/ext4/super.c:5644 get_tree_bdev+0x400/0x620 fs/super.c:1282 vfs_get_tree+0x88/0x270 fs/super.c:1489 do_new_mount+0x289/0xad0 fs/namespace.c:3145 do_mount fs/namespace.c:3488 [inline] __do_sys_mount fs/namespace.c:3697 [inline] __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fa5406fd5ea RSP: 002b:00007ffc7232f968 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa5406fd5ea RDX: 0000000020000440 RSI: 0000000020000000 RDI: 00007ffc7232f970 RBP: 00007ffc7232f970 R08: 00007ffc7232f9b0 R09: 0000000000000432 R10: 0000000000804a03 R11: 0000000000000202 R12: 0000000000000004 R13: 0000555556a7a2c0 R14: 00007ffc7232f9b0 R15: 0000000000000000 </TASK> ================================================================== The problem is that the inode contains an xattr entry with ea_inum of 15 when cleaning up an orphan inode <15>. When evict inode <15>, the reference counting of the corresponding EA inode is decreased. When EA inode <15> is found by find_inode_fast() in __ext4_iget(), it is found that the EA inode holds the I_FREEING flag and waits for the EA inode to complete deletion. As a result, when inode <15> is being deleted, we wait for inode <15> to complete the deletion, resulting in an infinite loop and triggering Hung Task. To solve this problem, we only need to check whether the ino of EA inode and parent is the same before getting EA inode. Link: https://syzkaller.appspot.com/bug?extid=77d6fcc37bbb92f26048 Reported-by: syzbot+77d6fcc37bbb92f26048@syzkaller.appspotmail.com Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230110133436.996350-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-19jbd2: fix data missing when reusing bh which is ready to be checkpointedZhihao Cheng
Following process will make data lost and could lead to a filesystem corrupted problem: 1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and jh->b_transaction = NULL 2. T1 is added into journal->j_checkpoint_transactions. 3. Get bh prepare to write while doing checkpoing: PA PB do_get_write_access jbd2_log_do_checkpoint spin_lock(&jh->b_state_lock) if (buffer_dirty(bh)) clear_buffer_dirty(bh) // clear buffer dirty set_buffer_jbddirty(bh) transaction = journal->j_checkpoint_transactions jh = transaction->t_checkpoint_list if (!buffer_dirty(bh)) __jbd2_journal_remove_checkpoint(jh) // bh won't be flushed jbd2_cleanup_journal_tail __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved) 4. Aborting journal/Power-cut before writing latest bh on journal area. In this way we get a corrupted filesystem with bh's data lost. Fix it by moving the clearing of buffer_dirty bit just before the call to __jbd2_journal_file_buffer(), both bit clearing and jh->b_transaction assignment are under journal->j_list_lock locked, so that jbd2_log_do_checkpoint() will wait until jh's new transaction fininshed even bh is currently not dirty. And journal_shrink_one_cp_list() won't remove jh from checkpoint list if the buffer head is reused in do_get_write_access(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216898 Cc: <stable@kernel.org> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230110015327.1181863-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-19ext4: update s_journal_inum if it changes after journal replayBaokun Li
When mounting a crafted ext4 image, s_journal_inum may change after journal replay, which is obviously unreasonable because we have successfully loaded and replayed the journal through the old s_journal_inum. And the new s_journal_inum bypasses some of the checks in ext4_get_journal(), which may trigger a null pointer dereference problem. So if s_journal_inum changes after the journal replay, we ignore the change, and rewrite the current journal_inum to the superblock. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216541 Reported-by: Luís Henriques <lhenriques@suse.de> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230107032126.4165860-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-19ext4: fail ext4_iget if special inode unallocatedBaokun Li
In ext4_fill_super(), EXT4_ORPHAN_FS flag is cleared after ext4_orphan_cleanup() is executed. Therefore, when __ext4_iget() is called to get an inode whose i_nlink is 0 when the flag exists, no error is returned. If the inode is a special inode, a null pointer dereference may occur. If the value of i_nlink is 0 for any inodes (except boot loader inodes) got by using the EXT4_IGET_SPECIAL flag, the current file system is corrupted. Therefore, make the ext4_iget() function return an error if it gets such an abnormal special inode. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199179 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216541 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216539 Reported-by: Luís Henriques <lhenriques@suse.de> Suggested-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230107032126.4165860-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-19ext4: fix function prototype mismatch for ext4_feat_ktypeKees Cook
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed. ext4_feat_ktype was setting the "release" handler to "kfree", which doesn't have a matching function prototype. Add a simple wrapper with the correct prototype. This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches. Note that this code is only reached when ext4 is a loadable module and it is being unloaded: CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698) ... RIP: 0010:kobject_put+0xbb/0x1b0 ... Call Trace: <TASK> ext4_exit_sysfs+0x14/0x60 [ext4] cleanup_module+0x67/0xedb [ext4] Fixes: b99fee58a20a ("ext4: create ext4_feat kobject dynamically") Cc: Theodore Ts'o <tytso@mit.edu> Cc: Eric Biggers <ebiggers@kernel.org> Cc: stable@vger.kernel.org Build-tested-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230103234616.never.915-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230104210908.gonna.388-kees@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-19ext4: remove unnecessary variable initializationXU pengfei
Variables are assigned first and then used. Initialization is not required. Signed-off-by: XU pengfei <xupengfei@nfschina.com> Link: https://lore.kernel.org/r/20230104055229.3663-1-xupengfei@nfschina.com
2023-02-18ext4: fix inode tree inconsistency caused by ENOMEMzhanchengbin
If ENOMEM fails when the extent is splitting, we need to restore the length of the split extent. In the ext4_split_extent_at function, only in ext4_ext_create_new_leaf will it alloc memory and change the shape of the extent tree,even if an ENOMEM is returned at this time, the extent tree is still self-consistent, Just restore the split extent lens in the function ext4_split_extent_at. ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf 1)ext4_ext_split ext4_find_extent 2)ext4_ext_grow_indepth ext4_find_extent Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230103022812.130603-1-zhanchengbin1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-18ext4: refuse to create ea block when umountedJun Nie
The ea block expansion need to access s_root while it is already set as NULL when umount is triggered. Refuse this request to avoid panic. Reported-by: syzbot+2dacb8f015bf1420155f@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08 Link: https://lore.kernel.org/r/20230103014517.495275-3-jun.nie@linaro.org Cc: stable@kernel.org Signed-off-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-18ext4: optimize ea_inode block expansionJun Nie
Copy ea data from inode entry when expanding ea block if possible. Then remove the ea entry if expansion success. Thus memcpy to a temporary buffer may be avoided. If the expansion fails, we do not need to recovery the removed ea entry neither in this way. Reported-by: syzbot+2dacb8f015bf1420155f@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08 Link: https://lore.kernel.org/r/20230103014517.495275-2-jun.nie@linaro.org Cc: stable@kernel.org Signed-off-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-18ext4: remove dead code in updating backup sbTanmay Bhushan
ext4_update_backup_sb checks for err having some value after unlocking buffer. But err has not been updated till that point in any code which will lead execution of the code in question. Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221230141858.3828-1-007047221b@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-18Merge tag 'x86-urgent-2023-02-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix for x86. Revert the recent change to the MTRR code which aimed to support SEV-SNP guests on Hyper-V. It caused a regression on XEN Dom0 kernels. The underlying issue of MTTR (mis)handling in the x86 code needs some deeper investigation and is definitely not 6.2 material" * tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
2023-02-18Merge tag 'timers-urgent-2023-02-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A fix for a long standing issue in the alarmtimer code. Posix-timers armed with a short interval with an ignored signal result in an unpriviledged DoS. Due to the ignored signal the timer switches into self rearm mode. This issue had been "fixed" before but a rework of the alarmtimer code 5 years ago lost that workaround. There is no real good solution for this issue, which is also worked around in the core posix-timer code in the same way, but it certainly moved way up on the ever growing todo list" * tag 'timers-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: alarmtimer: Prevent starvation by small intervals and SIG_IGN
2023-02-18Merge tag 'irq-urgent-2023-02-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single build fix for the PCI/MSI infrastructure. The addition of the new alloc/free interfaces in this cycle forgot to add stub functions for pci_msix_alloc_irq_at() and pci_msix_free_irq() for the CONFIG_PCI_MSI=n case" * tag 'irq-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Provide missing stubs for CONFIG_PCI_MSI=n
2023-02-19Merge tag 'irqchip-6.3' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - New and improved irqdomain locking, closing a number of races that became apparent now that we are able to probe drivers in parallel - A bunch of OF node refcounting bugs have been fixed - We now have a new IPI mux, lifted from the Apple AIC code and made common. It is expected that riscv will eventually benefit from it - Two small fixes for the Broadcom L2 drivers - Various cleanups and minor bug fixes Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org
2023-02-18tracing: Remove unnecessary NULL assignmentWang ShaoBo
Remove unnecessary NULL assignment int create_new_subsystem(). Link: https://lkml.kernel.org/r/20221123065124.3982439-1-bobo.shaobowang@huawei.com Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18tracepoint: Allow livepatch module add trace eventJianlin Lv
In the case of keeping the system running, the preferred method for tracing the kernel is dynamic tracing (kprobe), but the drawback of this method is that events are lost, especially when tracing packages in the network stack. Livepatching provides a potential solution, which is to reimplement the function you want to replace and insert a static tracepoint. In such a way, custom stable static tracepoints can be expanded without rebooting the system. Link: https://lkml.kernel.org/r/20221102160236.11696-1-iecedge@gmail.com Signed-off-by: Jianlin Lv <iecedge@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18tracing: Always use canonical ftrace pathRoss Zwisler
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing Many comments and Kconfig help messages in the tracing code still refer to this older debugfs path, so let's update them to avoid confusion. Link: https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-2-zwisler@google.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm/x86 fixes from Paolo Bonzini: - zero all padding for KVM_GET_DEBUGREGS - fix rST warning - disable vPMU support on hybrid CPUs * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: initialize all of the kvm_debugregs structure before sending it to userspace perf/x86: Refuse to export capabilities for hybrid PMUs KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs) Documentation/hw-vuln: Fix rST warning
2023-02-18Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 regression fix from Will Deacon: "Apologies for the _extremely_ late pull request here, but we had a 'perf' (i.e. CPU PMU) regression on the Apple M1 reported on Wednesday [1] which was introduced by bd2756811766 ("perf: Rewrite core context handling") during the merge window. Mark and I looked into this and noticed an additional problem caused by the same patch, where the 'CHAIN' event (used to combine two adjacent 32-bit counters into a single 64-bit counter) was not being filtered correctly. Mark posted a series on Thursday [2] which addresses both of these regressions and I queued it the same day. The changes are small, self-contained and have been confirmed to fix the original regression. Summary: - Fix 'perf' regression for non-standard CPU PMU hardware (i.e. Apple M1)" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: perf: reject CHAIN events at creation time arm_pmu: fix event CPU filtering
2023-02-18Merge tag 'block-6.2-2023-02-17' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "I guess this is what can happen when you prep things early for going away, something else comes in last minute. This one fixes another regression in 6.2 for NVMe, from this release, and hence we should probably get it submitted for 6.2. Still waiting for the original reporter (see bugzilla linked in the commit) to test this, but Keith managed to setup and recreate the issue and tested the patch that way" * tag 'block-6.2-2023-02-17' of git://git.kernel.dk/linux: nvme-pci: refresh visible attrs for cmb attributes
2023-02-18xen: sysfs: make kobj_type structure constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230216-kobj_type-xen-v1-1-742423de7d71@weissschuh.net Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-18ieee802154: Drop device trackersMiquel Raynal
In order to prevent a device from disappearing when a background job was started, dev_hold() and dev_put() calls were made. During the stabilization phase of the scan/beacon features, it was later decided that removing the device while a background job was ongoing was a valid use case, and we should instead stop the background job and then remove the device, rather than prevent the device from being removed. This is what is currently done, which means manually reference counting the device during background jobs is no longer needed. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230214135035.1202471-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18mac802154: Fix an always true conditionMiquel Raynal
At this stage we simply do not care about the delayed work value, because active scan is not yet supported, so we can blindly queue another work once a beacon has been sent. It fixes a smatch warning: mac802154_beacon_worker() warn: always true condition '(local->beacon_interval >= 0) => (0-u32max >= 0)' Fixes: 3accf4762734 ("mac802154: Handle basic beaconing") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230214135035.1202471-6-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18mac802154: Send beacons using the MLME Tx pathMiquel Raynal
Using ieee802154_subif_start_xmit() to bypass the net queue when sending beacons is broken because it does not acquire the HARD_TX_LOCK(), hence not preventing datagram buffers to be smashed by beacons upon contention situation. Using the mlme_tx helper is not the best fit either but at least it is not buggy and has little-to-no performance hit. More details are given in the comment explaining this choice in the code. Fixes: 3accf4762734 ("mac802154: Handle basic beaconing") Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230214135035.1202471-5-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18ieee802154: Change error code on monitor scan netlink requestMiquel Raynal
Returning EPERM gives the impression that "right now" it is not possible, but "later" it could be, while what we want to express is the fact that this is not currently supported at all (might change in the future). So let's return EOPNOTSUPP instead. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Suggested-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230214135035.1202471-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18ieee802154: Convert scan error messages to extackMiquel Raynal
Instead of printing error messages in the kernel log, let's use extack. When there is a netlink error returned that could be further specified with a string, use extack as well. Apply this logic to the very recent scan/beacon infrastructure. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230214135035.1202471-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18ieee802154: Use netlink policies when relevant on scan parametersMiquel Raynal
Instead of open-coding scan parameters (page, channels, duration, etc), let's use the existing NLA_POLICY* macros. This help greatly reducing the error handling and clarifying the overall logic. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230214135035.1202471-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>