2024-07-10  scsi: ufs: core: Initialize struct uic_command once  (Bart Van Assche)
Instead of first zero-initializing struct uic_command and next initializing it memberwise, initialize all members at once. Reviewed-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240708211716.2827751-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
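As an illustration of the "initialize all members at once" pattern (a hedged sketch with placeholder values, not the actual ufshcd change; attr_sel and mib_val stand in for whatever arguments the call site has):

    struct uic_command uic_cmd = {
            .command   = UIC_CMD_DME_SET,   /* example command */
            .argument1 = attr_sel,          /* placeholder arguments */
            .argument3 = mib_val,
    };

    /* replaces the older two-step pattern:
     *   struct uic_command uic_cmd = {0};
     *   uic_cmd.command = UIC_CMD_DME_SET;
     *   uic_cmd.argument1 = attr_sel;
     *   ...
     */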
2024-07-10  scsi: ufs: core: Declare functions once  (Bart Van Assche)
Several functions are declared in include/ufs/ufshcd.h and also in drivers/ufs/core/ufshcd-priv.h. Remove the duplicate declarations. Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240708211716.2827751-2-bvanassche@acm.org Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Keoseong Park <keosung.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10  Merge branch 'ice-support-to-dump-phy-config-fec'  (Jakub Kicinski)
Tony Nguyen says:
====================
ice: Support to dump PHY config, FEC

Anil Samal says:

Implementation to dump PHY configuration and FEC statistics to facilitate link level debugging of customer issues. The implementation has two parts:

a. Serdes equalization

   # ethtool -d eth0
   Output:
   Offset          Values
   ------          ------
   0x0000:         00 00 00 00 03 00 00 00 05 00 00 00 01 08 00 40
   0x0010:         01 00 00 40 00 00 39 3c 01 00 00 00 00 00 00 00
   0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ...
   0x01f0:         01 00 00 00 ef be ad de 8f 00 00 00 00 00 00 00
   0x0200:         00 00 00 00 ef be ad de 00 00 00 00 00 00 00 00
   0x0210:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x0220:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x0230:         00 00 00 00 00 00 00 00 00 00 00 00 fa ff 00 00
   0x0240:         06 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00
   0x0250:         0f b0 0f b0 00 00 00 00 00 00 00 00 00 00 00 00
   0x0260:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x0270:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x0280:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x0290:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x02a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x02b0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x02c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x02d0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x02e0:         00 00 00 00 00 00 00 00 00 00 00 00

   The current implementation appends 176 bytes, i.e. 44 bytes * 4 serdes lanes. For a port with 2 serdes lanes, the first 88 bytes are valid values and the remaining 88 bytes are filled with zero. Similarly, for a port with 1 serdes lane, the first 44 bytes are valid and the remaining 132 bytes are marked zero.

   Each set of serdes equalizer parameters (i.e. each set of 44 bytes) follows the order below:
   a. rx_equalization_pre2
   b. rx_equalization_pre1
   c. rx_equalization_post1
   d. rx_equalization_bflf
   e. rx_equalization_bfhf
   f. rx_equalization_drate
   g. tx_equalization_pre1
   h. tx_equalization_pre3
   i. tx_equalization_atten
   j. tx_equalization_post1
   k. tx_equalization_pre2

   Each individual equalizer parameter is 4 bytes. As ethtool prints values as individual bytes, on a little endian machine these values will be in reverse byte order.

b. FEC block counts

   # ethtool -I --show-fec eth0
   Output:
   FEC parameters for eth0:
   Supported/Configured FEC encodings: Auto RS BaseR
   Active FEC encoding: RS
   Statistics:
     corrected_blocks: 0
     uncorrectable_blocks: 0

This series does the following:
Patch 1 - Implementation to support a user provided flag for the sideband queue command.
Patch 2 - Currently the driver does not have a way to derive serdes lane number, PCS quad, or PCS port from the port number, so introduce a mechanism to derive the above info. Ethtool interface extension to include FEC statistics counters.
Patch 3 - Ethtool interface extension to include serdes equalizer output.

v1: https://lore.kernel.org/netdev/20240702180710.2606969-1-anthony.l.nguyen@intel.com/
====================
Link: https://patch.msgid.link/20240709202951.2103115-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  ice: Implement driver functionality to dump serdes equalizer values  (Anil Samal)
To debug link issues in the field, serdes Tx/Rx equalizer values help to determine the health of the serdes lane. Extend the 'ethtool -d' option to dump serdes Tx/Rx equalizers. The following equalizer parameters are supported:
a. rx_equalization_pre2
b. rx_equalization_pre1
c. rx_equalization_post1
d. rx_equalization_bflf
e. rx_equalization_bfhf
f. rx_equalization_drate
g. tx_equalization_pre1
h. tx_equalization_pre3
i. tx_equalization_atten
j. tx_equalization_post1
k. tx_equalization_pre2
Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anil Samal <anil.samal@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240709202951.2103115-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  ice: Implement driver functionality to dump fec statistics  (Anil Samal)
To debug link issues in the field, it is paramount to dump FEC corrected/uncorrected block counts from firmware. Firmware requires the PCS quad number and PCS port number to read FEC statistics. The current driver implementation does not maintain these physical properties of a port. Add a new driver API to derive the physical properties of an input port. These properties include PCS quad number, PCS port number, serdes lane count, and primary serdes lane number. Extend the ethtool option '--show-fec' to support FEC statistics. The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
The standard defines the above statistics per lane, but the current implementation supports total FEC statistics per port, i.e. the sum over all lanes of a port. Find sample output below:
FEC parameters for ens21f0np0:
Supported/Configured FEC encodings: Auto RS BaseR
Active FEC encoding: RS
Statistics:
  corrected_blocks: 0
  uncorrectable_blocks: 0
Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anil Samal <anil.samal@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240709202951.2103115-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  ice: Extend Sideband Queue command to support flags  (Anil Samal)
Current driver implementation for Sideband Queue supports a fixed flag (ICE_AQ_FLAG_RD). To retrieve FEC statistics from firmware, Sideband Queue command is used with a different flag. Extend API for Sideband Queue command to use 'flags' as input argument. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anil Samal <anil.samal@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240709202951.2103115-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  e1000e: fix force smbus during suspend flow  (Vitaly Lifshits)
Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") resolved a PHY access loss during suspend on Meteor Lake consumer platforms, but it affected corporate systems incorrectly. A better fix, working for both consumer and corporate systems, was proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function"). However, it introduced a regression on older devices, such as [8086:15B8], [8086:15F9], [8086:15BE]. This patch aims to fix the secondary regression, by limiting the scope of the changes to Meteor Lake platforms only. Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940 Reported-by: Dieter Mummenschanz <dmummenschanz@web.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936 Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  dt-bindings: net: convert enetc to yaml  (Frank Li)
Convert the enetc device binding file to yaml. Split into 3 yaml files: 'fsl,enetc.yaml', 'fsl,enetc-mdio.yaml', 'fsl,enetc-ierb.yaml'. Additional changes:
- Add pci<vendor id>,<production id> in the compatible string.
- Ref to common ethernet-controller.yaml and mdio.yaml.
- Add Wei Fang, Vladimir and Claudiu as maintainers.
- Update ENETC description.
- Remove fixed-link part.
Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240709214841.570154-1-Frank.Li@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  tcp: avoid too many retransmit packets  (Eric Dumazet)
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'. The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account. Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits. Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4. Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  dt-bindings: net: realtek,rtl82xx: Document RTL8211F LED support  (Marek Vasut)
The RTL8211F PHY does support LED configuration, document support for LEDs in the binding document. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240708211649.165793-1-marex@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11  ASoC: dt-bindings: convert qcom sound bindings to  (Mark Brown)
Merge series from Rayyan Ansari <rayyan.ansari@linaro.org>: These patches convert the remaining plain text bindings for Qualcomm sound drivers to dt schema, so device trees can be validated against them.
2024-07-11  firmware: cs_dsp: Some small coding improvements  (Mark Brown)
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>: Commit series that makes some small improvements to code and the kernel log messages.
2024-07-10  Merge branch 'fixes-for-bpf-timer-lockup-and-uaf'  (Alexei Starovoitov)
Kumar Kartikeya Dwivedi says:
====================
Fixes for BPF timer lockup and UAF

The following patches contain fixes for timer lockups and a use-after-free scenario.

This set proposes to fix the following lockup situation for BPF timers.

  CPU 1                             CPU 2
  bpf_timer_cb                      bpf_timer_cb
    timer_cb1                         timer_cb2
      bpf_timer_cancel(timer_cb2)       bpf_timer_cancel(timer_cb1)
        hrtimer_cancel                    hrtimer_cancel

In this case, both callbacks will continue waiting for each other to finish synchronously, causing a lockup.

The proposed fix adds support for tracking in-flight cancellations *begun by other timer callbacks* for a particular BPF timer. Whenever preparing to call hrtimer_cancel, a callback will increment the target timer's counter, then inspect its in-flight cancellations, and if non-zero, return -EDEADLK to avoid situations where the target timer's callback is waiting for its completion.

This does mean that in cases where a callback is fired and cancelled, it will be unable to cancel any timers in that execution. This can be alleviated by maintaining the list of waiting callbacks in bpf_hrtimer and searching through it to avoid interdependencies, but this may introduce additional delays in bpf_timer_cancel, in addition to requiring extra state at runtime which may need to be allocated or reused from bpf_hrtimer storage. Moreover, extra synchronization is needed to delete these elements from the list of waiting callbacks once hrtimer_cancel has finished.

The second patch is for a deadlock situation similar to the above in bpf_timer_cancel_and_free, but also a UAF scenario that can occur if the timer is armed before entering it, if the hrtimer_running check causes the hrtimer_cancel call to be skipped. As seen above, a synchronous hrtimer_cancel would lead to deadlock (if the same callback tries to free its timer, or two timers free each other), therefore we queue work onto the global workqueue to ensure outstanding timers are cancelled before bpf_hrtimer state is freed. Further details are in the patches.
====================
Link: https://lore.kernel.org/r/20240709185440.1104957-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10  dt-bindings: clock: Document T-Head TH1520 AP_SUBSYS controller  (Drew Fustini)
Document bindings for the T-Head TH1520 AP sub-system clock controller. Link: https://openbeagle.org/beaglev-ahead/beaglev-ahead/-/blob/main/docs/TH1520%20System%20User%20Manual.pdf Co-developed-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Drew Fustini <dfustini@tenstorrent.com> Link: https://lore.kernel.org/r/20240623-th1520-clk-v2-1-ad8d6432d9fb@tenstorrent.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10  f2fs: clean up addrs_per_{inode,block}()  (Chao Yu)
Introduce a new helper addrs_per_page() to wrap common code from addrs_per_inode() and addrs_per_block() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  f2fs: clean up F2FS_I()  (Chao Yu)
Use temporary variable instead of F2FS_I() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  bpf: Defer work in bpf_timer_cancel_and_free  (Kumar Kartikeya Dwivedi)
Currently, the same case as previous patch (two timer callbacks trying to cancel each other) can be invoked through bpf_map_update_elem as well, or more precisely, freeing map elements containing timers. Since this relies on hrtimer_cancel as well, it is prone to the same deadlock situation as the previous patch. It would be sufficient to use hrtimer_try_to_cancel to fix this problem, as the timer cannot be enqueued after async_cancel_and_free. Once async_cancel_and_free has been done, the timer must be reinitialized before it can be armed again. The callback running in parallel trying to arm the timer will fail, and freeing bpf_hrtimer without waiting is sufficient (given kfree_rcu), and bpf_timer_cb will return HRTIMER_NORESTART, preventing the timer from being rearmed again. However, there exists a UAF scenario where the callback arms the timer before entering this function, such that if cancellation fails (due to timer callback invoking this routine, or the target timer callback running concurrently). In such a case, if the timer expiration is significantly far in the future, the RCU grace period expiration happening before it will free the bpf_hrtimer state and along with it the struct hrtimer, that is enqueued. Hence, it is clear cancellation needs to occur after async_cancel_and_free, and yet it cannot be done inline due to deadlock issues. We thus modify bpf_timer_cancel_and_free to defer work to the global workqueue, adding a work_struct alongside rcu_head (both used at _different_ points of time, so can share space). Update existing code comments to reflect the new state of affairs. Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240709185440.1104957-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10  bpf: Fail bpf_timer_cancel when callback is being cancelled  (Kumar Kartikeya Dwivedi)
Given a schedule:

  timer1 cb                     timer2 cb
  bpf_timer_cancel(timer2);     bpf_timer_cancel(timer1);

Both bpf_timer_cancel calls would wait for the other callback to finish executing, introducing a lockup.

Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps track of all in-flight cancellation requests for a given BPF timer. Whenever cancelling a BPF timer, we must check if we have outstanding cancellation requests, and if so, we must fail the operation with an error (-EDEADLK) since cancellation is synchronous and waits for the callback to finish executing. This implies that we can enter a deadlock situation involving two or more timer callbacks executing in parallel and attempting to cancel one another.

Note that we avoid incrementing the cancelling counter for the target timer (the one being cancelled) if bpf_timer_cancel is not invoked from a callback, to avoid spurious errors. The whole point of detecting cur->cancelling and returning -EDEADLK is to not enter a busy wait loop (which may or may not lead to a lockup). This does not apply in case the caller is in a non-callback context; the other side can continue to cancel as it sees fit without running into errors.

Background on prior attempts: Earlier versions of this patch used a bool 'cancelling' bit and used the following pattern under timer->lock to publish cancellation status.

  lock(t->lock);
  t->cancelling = true;
  mb();
  if (cur->cancelling)
          return -EDEADLK;
  unlock(t->lock);
  hrtimer_cancel(t->timer);
  t->cancelling = false;

The store outside the critical section could overwrite a parallel request's t->cancelling assignment to true, to ensure the parallelly executing callback observes its cancellation status. It would be necessary to clear this cancelling bit once hrtimer_cancel is done, but lack of serialization introduced races. Another option was explored where bpf_timer_start would clear the bit when (re)starting the timer under timer->lock. This would ensure serialized access to the cancelling bit, but may allow it to be cleared before an in-flight hrtimer_cancel has finished executing, such that lockups can occur again.

Thus, we choose an atomic counter to keep track of all outstanding cancellation requests and use it to prevent lockups in case callbacks attempt to cancel each other while executing in parallel.

Reported-by: Dohyun Kim <dohyunkim@google.com> Reported-by: Neel Natu <neelnatu@google.com> Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
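A condensed sketch of the counter scheme described above (following the commit message, not the verbatim kernel change; 'cur' is the bpf_hrtimer whose callback is currently running, 't' is the timer being cancelled, and error handling is simplified):

  if (cur) {                              /* only when called from a timer callback */
          atomic_inc(&t->cancelling);     /* announce: someone is cancelling 't' */
          smp_mb__after_atomic();
          if (atomic_read(&cur->cancelling)) {
                  /* our own timer is being cancelled concurrently; waiting in
                   * hrtimer_cancel() could deadlock, so fail instead */
                  ret = -EDEADLK;
                  goto out;
          }
  }
  ret = hrtimer_cancel(&t->timer);
  out:
  if (cur)
          atomic_dec(&t->cancelling);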
2024-07-10  f2fs: use meta inode for GC of COW file  (Sunmin Jeong)
In case of the COW file, new updates and GC writes are already separated to the page caches of the atomic file and COW file. As in other cases that use the meta inode for GC, there are race issues between a foreground thread and the GC thread. To handle them, we need to take care of when to invalidate and wait for writeback of GC pages in COW files, as in the case of using the meta inode. Also, a pointer from the COW inode to the original inode is required to check the state of original pages. For the former, we can solve the problem by using the meta inode for GC of COW files. Then let's get a page from the original inode in move_data_block when GCing the COW file to avoid the race condition. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  f2fs: use meta inode for GC of atomic file  (Sunmin Jeong)
The page cache of the atomic file keeps new data pages which will be stored in the COW file. It can also keep old data pages when GCing the atomic file. In this case, new data can be overwritten by old data if a GC thread sets the old data page as dirty after the new data page was evicted. Also, since all writes to the atomic file are redirected to COW inodes, GC for the atomic file is not working well as below.

  f2fs_gc(gc_type=FG_GC)
    - select A as a victim segment
  do_garbage_collect
    - iget atomic file's inode for block B
  move_data_page
    f2fs_do_write_data_page
      - use dn of cow inode
      - set fio->old_blkaddr from cow inode
  - seg_freed is 0 since block B is still valid
  - goto gc_more and A is selected as victim again

To solve the problem, let's separate GC writes and updates in the atomic file by using the meta inode for GC writes.

Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  f2fs: only fragment segment in the same section  (Sheng Yong)
When new_curseg() is allocating a new segment, if mode=fragment:xxx is switched on in a large section scenario, __get_next_segno() will select the next segno randomly in the range of [0, maxsegno] in order to fragment segments. If the candidate segno is free, get_new_segment() will use it directly as the new segment. However, if the section of the candidate is not empty, and some other segments have already been used and have a different type (e.g NODE) than the candidate (e.g DATA), GC will complain about an inconsistent segment type later.

This could be reproduced by the following steps:

  dd if=/dev/zero of=test.img bs=1M count=10240
  mkfs.f2fs -s 128 test.img
  mount -t f2fs test.img /mnt -o mode=fragment:block
  echo 1 > /sys/fs/f2fs/loop0/max_fragment_chunk
  echo 512 > /sys/fs/f2fs/loop0/max_fragment_hole
  dd if=/dev/zero of=/mnt/testfile bs=4K count=100
  umount /mnt

  F2FS-fs (loop0): Inconsistent segment (4377) type [0, 1] in SSA and SIT

In order to allow simulating segment fragmentation in a large section scenario, this patch reduces the candidate range:
* if curseg is the last segment in the section, return curseg->segno to make get_new_segment() itself find the next free segment.
* if curseg is in the middle of the section, select the candidate randomly in the range of [curseg + 1, last_seg_in_the_same_section] to keep the type consistent.

Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Sheng Yong <shengyong@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
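A rough sketch of the reduced candidate selection described above (the helper and macro names here are assumptions for illustration, not necessarily the actual patch):

  /* last segment of curseg's section (assumed f2fs helpers) */
  unsigned int last_seg = GET_SEG_FROM_SEC(sbi, secno) + SEGS_PER_SEC(sbi) - 1;

  if (curseg->segno == last_seg)
          return curseg->segno;   /* let get_new_segment() find the next free one */

  /* stay inside the same section so segment types cannot mix */
  return get_random_u32_inclusive(curseg->segno + 1, last_seg);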
2024-07-10  f2fs: fix to update user block counts in block_operations()  (Chao Yu)
Commit 59c9081bc86e ("f2fs: allow write page cache when writting cp") allows write() to write data to the page cache during checkpoint, so block count fields like .total_valid_block_count, .alloc_valid_block_count and .rf_node_block_count may encounter a race condition as below:

  CP                                      Thread A
  - write_checkpoint
   - block_operations
    - f2fs_down_write(&sbi->node_change)
    - __prepare_cp_block
      : ckpt->valid_block_count = .total_valid_block_count
    - f2fs_up_write(&sbi->node_change)
                                          - write
                                           - f2fs_preallocate_blocks
                                            - f2fs_map_blocks(,F2FS_GET_BLOCK_PRE_AIO)
                                             - f2fs_map_lock
                                              - f2fs_down_read(&sbi->node_change)
                                              - f2fs_reserve_new_blocks
                                               - inc_valid_block_count
                                                 : percpu_counter_add(&sbi->alloc_valid_block_count, count)
                                                 : sbi->total_valid_block_count += count
                                              - f2fs_up_read(&sbi->node_change)
   - do_checkpoint
     : sbi->last_valid_block_count = sbi->total_valid_block_count
     : percpu_counter_set(&sbi->alloc_valid_block_count, 0)
     : percpu_counter_set(&sbi->rf_node_block_count, 0)
                                          - fsync
                                           - need_do_checkpoint
                                            - f2fs_space_for_roll_forward
                                              : alloc_valid_block_count was reset to zero,
                                                so it may have missed the last data during checkpoint

Let's change to update .total_valid_block_count, .alloc_valid_block_count and .rf_node_block_count in block_operations(), then their access can be protected by the .node_change and .cp_rwsem locks, so that the above race condition can be avoided.

Fixes: 59c9081bc86e ("f2fs: allow write page cache when writting cp") Cc: Yunlei He <heyunlei@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  f2fs: remove unreachable lazytime mount option parsing  (Eric Sandeen)
The lazytime/nolazytime options are now handled in the VFS, and are never seen in filesystem parsers, so remove handling of these options from f2fs. Note: when lazytime support was added in 6d94c74ab85f it made lazytime the default in default_options() - as a result, lazytime cannot be disabled (because Opt_nolazytime is never seen in f2fs parsing). If lazytime is desired to be configurable, and default off is OK, default_options() could be updated to stop setting it by default and allow mount option control. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  f2fs: fix null reference error when checking end of zone  (Daejun Park)
This patch fixes a potentially null pointer being accessed by is_end_zone_blkaddr() that checks the last block of a zone when f2fs is mounted as a single device. Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices") Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10  bpf: fix order of args in call to bpf_map_kvcalloc  (Mohammad Shehar Yaar Tausif)
The original function call passed the size of smap->bucket before the number of buckets, which raises the 'calloc-transposed-args' error on compilation.

Vlastimil Babka added: The order of parameters can be traced back all the way to 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") across several refactorings, and that's why the commit is used as a Fixes: tag. In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined allocation helpers to account at the call site") however exposed the order of args in a way that gcc-14 has enough visibility to start warning about it, because (in the !CONFIG_MEMCG case) bpf_map_kvcalloc is then a macro alias for kvcalloc instead of a static inline wrapper.

To sum up, the warning happens when the following conditions are all met:
- gcc-14 is used (didn't see it with gcc-13)
- commit 2c321f3f70bc is present
- CONFIG_MEMCG is not enabled in .config
- CONFIG_WERROR turns this from a compiler warning to an error

Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Reviewed-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Mohammad Shehar Yaar Tausif <sheharyaar48@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
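For illustration, calloc-style allocators take the element count before the element size; a sketch of the corrected call shape (the variable names follow the description above and may differ from the actual code):

  /* count first, then element size, as with kvcalloc(n, size, gfp) */
  smap->buckets = bpf_map_kvcalloc(&smap->map, nbuckets,
                                   sizeof(*smap->buckets),
                                   GFP_USER | __GFP_NOWARN);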
2024-07-10  regmap: Implement regmap_multi_reg_read()  (Mark Brown)
Merge series from Guenter Roeck <linux@roeck-us.net>: regmap_multi_reg_read() is similar to regmap_bulk_read() but reads from an array of non-sequential registers. It is helpful if multiple non-sequential registers need to be read in a single operation which would otherwise have to be mutex protected. The name of the new function was chosen to match the existing function regmap_multi_reg_write().
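A usage sketch, assuming a calling convention analogous to regmap_bulk_read() (the register addresses are made up):

  static int read_three_regs(struct regmap *map)
  {
          unsigned int regs[] = { 0x10, 0x23, 0x47 };     /* non-sequential registers */
          unsigned int vals[ARRAY_SIZE(regs)];

          /* one call, one lock acquisition, instead of three regmap_read() calls */
          return regmap_multi_reg_read(map, regs, vals, ARRAY_SIZE(regs));
  }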
2024-07-10  Merge tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds)
Pull misc fixes from Andrew Morton:
 "21 hotfixes, 15 of which are cc:stable. No identifiable theme here - all are singleton patches, 19 are for MM"

* tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
  filemap: replace pte_offset_map() with pte_offset_map_nolock()
  arch/xtensa: always_inline get_current() and current_thread_info()
  sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
  MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
  mm: fix crashes from deferred split racing folio migration
  lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
  mm: gup: stop abusing try_grab_folio
  nilfs2: fix kernel bug on rename operation of broken directory
  mm/hugetlb_vmemmap: fix race with speculative PFN walkers
  cachestat: do not flush stats in recency check
  mm/shmem: disable PMD-sized page cache if needed
  mm/filemap: skip to create PMD-sized page cache if needed
  mm/readahead: limit page cache size in page_cache_ra_order()
  mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
  mm/damon/core: merge regions aggressively when max_nr_regions is unmet
  Fix userfaultfd_api to return EINVAL as expected
  mm: vmalloc: check if a hash-index is in cpu_possible_mask
  mm: prevent derefencing NULL ptr in pfn_section_valid()
  ...
2024-07-10  PCI: vmd: Create domain symlink before pci_bus_add_devices()  (Jiwei Sun)
The vmd driver creates a "domain" symlink in sysfs for each VMD bridge. Previously this symlink was created after pci_bus_add_devices() added devices below the VMD bridge and emitted udev events to announce them to userspace.

This led to a race between userspace consumers of the udev events and the kernel creation of the symlink. One such consumer is mdadm, which assembles block devices into a RAID array, and for devices below a VMD bridge, mdadm depends on the "domain" symlink. If mdadm loses the race, it may be unable to assemble a RAID array, which may cause a boot failure or other issues, with complaints like this:

  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
  (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.

This symptom prevents the OS from booting successfully.

After a NVMe disk is probed/added by the nvme driver, udevd invokes mdadm to detect if there is a mdraid associated with this NVMe disk, and mdadm determines if a NVMe device is connected to a particular VMD domain by checking the "domain" symlink. For example:

  Thread A                 Thread B                 Thread mdadm
  vmd_enable_domain
    pci_bus_add_devices
      __driver_probe_device
      ...
      work_on_cpu
        schedule_work_on
        : wakeup Thread B
                           nvme_probe
                           : wakeup scan_work
                             to scan nvme disk
                             and add nvme disk
                             then wakeup udevd
                                                    : udevd executes
                                                      mdadm command
        flush_work
          main
          : wait for nvme_probe
            done
      ...
      __driver_probe_device
        find_driver_devices
        : probe next nvme
          device
                                                    : 1) Detect domain symlink
                                                      2) Find domain symlink
                                                         from vmd sysfs
                                                      3) Domain symlink not
                                                         created yet; failed
    sysfs_create_link
    : create domain symlink

Create the VMD "domain" symlink before invoking pci_bus_add_devices() to avoid this race.

Suggested-by: Adrian Huang <ahuang12@lenovo.com> Link: https://lore.kernel.org/linux-pci/20240605124844.24293-1-sjiwei@163.com Signed-off-by: Jiwei Sun <sunjw10@lenovo.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
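A minimal sketch of the reordering in vmd_enable_domain() (error handling elided; the exact arguments at the real call site may differ):

  /* make the "domain" symlink visible before udev events are emitted */
  ret = sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj, "domain");
  if (ret)
          dev_warn(&vmd->dev->dev, "Can't create symlink to domain\n");

  pci_bus_add_devices(vmd->bus);  /* mdadm can now always resolve .../vmd/<bdf>/domain */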
2024-07-10  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi  (Linus Torvalds)
Pull SCSI fixes from James Bottomley:
 "One core change that moves a disk start message to a location where it will only be printed once instead of twice plus a couple of error handling race fixes in the ufs driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Do not repeat the starting disk message
  scsi: ufs: core: Fix ufshcd_abort_one racing issue
  scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
2024-07-10  clk: sophgo: Avoid -Wsometimes-uninitialized in sg2042_clk_pll_set_rate()  (Nathan Chancellor)
Clang warns (or errors with CONFIG_WERROR=y):

  drivers/clk/sophgo/clk-sg2042-pll.c:396:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
    396 |         if (sg2042_pll_enable(pll, 0)) {
        |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/clk/sophgo/clk-sg2042-pll.c:418:9: note: uninitialized use occurs here
    418 |         return ret;
        |                ^~~
  drivers/clk/sophgo/clk-sg2042-pll.c:396:2: note: remove the 'if' if its condition is always false
    396 |         if (sg2042_pll_enable(pll, 0)) {
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    397 |                 pr_warn("Can't disable pll(%s), status error\n", pll->hw.init->name);
        |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    398 |                 goto out;
        |                 ~~~~~~~~~
    399 |         }
        |         ~
  drivers/clk/sophgo/clk-sg2042-pll.c:393:9: note: initialize the variable 'ret' to silence this warning
    393 |         int ret;
        |                ^
        |                 = 0
  1 error generated.

sg2042_pll_enable() only ever returns zero, so this situation cannot happen, but clang does not perform interprocedural analysis, so it cannot know this to avoid the warning. Make it clearer to the compiler by making sg2042_pll_enable() void and eliminate the error handling in sg2042_clk_pll_set_rate(), which clears up the warning, as ret will always be initialized.

Fixes: 48cf7e01386e ("clk: sophgo: Add SG2042 clock driver") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240710-clk-sg2042-fix-sometimes-uninitialized-pll_set_rate-v1-1-538fa82dd539@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10  clk/sophgo: Using BUG() instead of unreachable() in mmux_get_parent_id()  (Li Qiang)
In general it's a good idea to avoid using bare unreachable() because it introduces undefined behavior in compiled code, but here it also caused a compilation warning. Use BUG() instead of unreachable() to resolve the warning.

Fixes the following warning:

  drivers/clk/sophgo/clk-cv18xx-ip.o: warning: objtool: mmux_round_rate() falls through to next function bypass_div_round_rate()

Fixes: 80fd61ec46124 ("clk: sophgo: Add clock support for CV1800 SoC") Signed-off-by: Li Qiang <liqiang01@kylinos.cn> Link: https://lore.kernel.org/r/c8e66d51f880127549e2a3e623be6787f62b310d.1720506143.git.liqiang01@kylinos.cn Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10  i2c: rcar: clear NO_RXDMA flag after resetting  (Wolfram Sang)
We should allow RXDMA only if the reset was really successful, so clear the flag after the reset call. Fixes: 0e864b552b23 ("i2c: rcar: reset controller is mandatory for Gen3+") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-10  smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()  (Zqiang)
For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for the debug check in __init_work() to work correctly. But this lacks the counterpart to remove the tracked object from debug objects again, which will cause a debug object warning once the stack is freed. Add the missing destroy_work_on_stack() invocation to cure that. [ tglx: Massaged changelog ] Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240704065213.13559-1-qiang.zhang1211@gmail.com
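The general pattern being completed here, as a generic sketch mirroring what smp_call_on_cpu() does (my_work_fn and run_on_cpu_example are placeholders, not the actual kernel functions):

  static void my_work_fn(struct work_struct *work)
  {
          /* runs on the target CPU's workqueue */
  }

  static int run_on_cpu_example(int cpu)
  {
          struct work_struct work;

          INIT_WORK_ONSTACK(&work, my_work_fn);   /* registers an on-stack debug object */
          queue_work_on(cpu, system_wq, &work);
          flush_work(&work);
          destroy_work_on_stack(&work);           /* the missing counterpart: untrack the
                                                   * object before the stack frame goes away */
          return 0;
  }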
2024-07-10  riscv: Improve sbi_ecall() code generation by reordering arguments  (Alexandre Ghiti)
The sbi_ecall() function arguments are not in the same order as the ecall arguments, so we end up re-ordering the registers before the ecall which is useless and costly. So simply reorder the arguments in the same way as expected by ecall. Instead of reordering directly the arguments of sbi_ecall(), use a proxy macro since the current ordering is more natural.

Before:

  Dump of assembler code for function sbi_ecall:
     0xffffffff800085e0 <+0>:     add     sp,sp,-32
     0xffffffff800085e2 <+2>:     sd      s0,24(sp)
     0xffffffff800085e4 <+4>:     mv      t1,a0
     0xffffffff800085e6 <+6>:     add     s0,sp,32
     0xffffffff800085e8 <+8>:     mv      t3,a1
     0xffffffff800085ea <+10>:    mv      a0,a2
     0xffffffff800085ec <+12>:    mv      a1,a3
     0xffffffff800085ee <+14>:    mv      a2,a4
     0xffffffff800085f0 <+16>:    mv      a3,a5
     0xffffffff800085f2 <+18>:    mv      a4,a6
     0xffffffff800085f4 <+20>:    mv      a5,a7
     0xffffffff800085f6 <+22>:    mv      a6,t3
     0xffffffff800085f8 <+24>:    mv      a7,t1
     0xffffffff800085fa <+26>:    ecall
     0xffffffff800085fe <+30>:    ld      s0,24(sp)
     0xffffffff80008600 <+32>:    add     sp,sp,32
     0xffffffff80008602 <+34>:    ret

After:

  Dump of assembler code for function __sbi_ecall:
     0xffffffff8000b6b2 <+0>:     add     sp,sp,-32
     0xffffffff8000b6b4 <+2>:     sd      s0,24(sp)
     0xffffffff8000b6b6 <+4>:     add     s0,sp,32
     0xffffffff8000b6b8 <+6>:     ecall
     0xffffffff8000b6bc <+10>:    ld      s0,24(sp)
     0xffffffff8000b6be <+12>:    add     sp,sp,32
     0xffffffff8000b6c0 <+14>:    ret

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240322112629.68170-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
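A sketch of the proxy-macro idea described above (signature details are illustrative, based on the description rather than the exact patch):

  /* real function takes arguments already in ecall register order (a0-a5, a6=fid, a7=ext) */
  struct sbiret __sbi_ecall(unsigned long arg0, unsigned long arg1,
                            unsigned long arg2, unsigned long arg3,
                            unsigned long arg4, unsigned long arg5,
                            int fid, int ext);

  /* callers keep the natural (ext, fid, args...) order via a proxy macro */
  #define sbi_ecall(e, f, a0, a1, a2, a3, a4, a5) \
          __sbi_ecall(a0, a1, a2, a3, a4, a5, f, e)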
2024-07-10  riscv: Add tracepoints for SBI calls and returns  (Samuel Holland)
These are useful for measuring the latency of SBI calls. The SBI HSM extension is excluded because those functions are called from contexts such as cpuidle where instrumentation is not allowed. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240321230131.1838105-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10  Merge tag 'sunxi-dt-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/dt  (Arnd Bergmann)
Allwinner SoC device tree changes for 6.11 part 2

One additional peripheral enabled for the H616.
- H616 crypto engine added

* tag 'sunxi-dt-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: h616: add crypto engine node

Link: https://lore.kernel.org/r/Zo7O73Afx7lZcBRi@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10  Merge tag 'sunxi-drivers-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/drivers  (Arnd Bergmann)
Allwinner SoC driver changes for 6.11 part 2

One additional minor cleanup
- Const-ify |struct regmap_config| in SRAM driver
- Const-ify |struct regmap_bus| in Allwinner RSB bus driver

* tag 'sunxi-drivers-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  bus: sunxi-rsb: Constify struct regmap_bus
  soc: sunxi: sram: Constify struct regmap_config

Link: https://lore.kernel.org/r/Zo7T4YsfamN0PbYK@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10  riscv: Optimize crc32 with Zbc extension  (Xiao Wang)
As suggested by the B-ext spec, the Zbc (carry-less multiplication) instructions can be used to accelerate CRC calculations. Currently, the crc32 is the most widely used crc function inside kernel, so this patch focuses on the optimization of just the crc32 APIs. Compared with the current table-lookup based optimization, Zbc based optimization can also achieve large stride during CRC calculation loop, meantime, it avoids the memory access latency of the table-lookup based implementation and it reduces memory footprint. If Zbc feature is not supported in a runtime environment, then the table-lookup based implementation would serve as fallback via alternative mechanism. By inspecting the vmlinux built by gcc v12.2.0 with default optimization level (-O2), we can see the below instruction count change for each 8-byte stride in the CRC32 loop:

  rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
  rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)

The compile target CPU is little endian, extra effort is needed for byte swapping for the crc32_be API, thus, the instruction count change is not as significant as that in the *_le cases. This patch is tested on QEMU VM with the kernel CRC32 selftest for both rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1) with Zbc extension shows 65% and 125% performance improvement respectively on crc32_test() and crc32c_test(). Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10  drm/amdgpu: reject gang submit on reserved VMIDs  (Christian König)
A gang submit won't work if the VMID is reserved and we can't flush out VM changes from multiple engines at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 320debca1ba3a81c87247eac84eff976ead09ee0)
2024-07-10  clk: mxs: Use clamp() in clk_ref_round_rate() and clk_ref_set_rate()  (Thorsten Blum)
Use clamp() instead of duplicating its implementation. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Link: https://lore.kernel.org/r/20240710143309.706135-2-thorsten.blum@toblux.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10  clk: sunxi-ng r40: Constify struct regmap_config  (Javier Carrasco)
`sun8i_r40_ccu_regmap_config` is not modified and can be declared as const to move its data to a read-only section. Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20240703-clk-const-regmap-v1-9-7d15a0671d6f@gmail.com Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10  Merge branch 'BPF selftests misc fixes'  (Martin KaFai Lau)
Geliang Tang says:
====================
v2:
 - only check the first "link" (link_nl) in test_mixed_links().
 - Drop patch 2 in v1.

This patchset fixes a segfault and a bpf object leak in test_progs. It is a resend of patch 1 out of the "skip ENOTSUPP BPF selftests" set, as Eduard suggested. Together with another fix for xdp_adjust_tail.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-07-10  selftests/bpf: Close obj in error path in xdp_adjust_tail  (Geliang Tang)
If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj" opened before this should be closed. So use "goto out" to close it instead of using "return" here. Fixes: 110221081aac ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
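Roughly, the error path changes from an early return to the usual cleanup label (a sketch of the pattern, not the exact selftest diff):

  err = bpf_object__load(obj);
  if (!ASSERT_OK(err, "load"))
          goto out;               /* previously a bare 'return', leaking obj */

  /* ... rest of the test ... */
  out:
  bpf_object__close(obj);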
2024-07-10  selftests/bpf: Null checks for links in bpf_tcp_ca  (Geliang Tang)
Run bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a Loongarch platform, and some "Segmentation fault" errors occur:

'''
test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec
test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/1    bpf_tcp_ca/dctcp:FAIL
test_cubic:PASS:bpf_cubic__open_and_load 0 nsec
test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/2    bpf_tcp_ca/cubic:FAIL
test_dctcp_fallback:PASS:dctcp_skel 0 nsec
test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec
test_dctcp_fallback:FAIL:dctcp link unexpected error: -524
#29/4    bpf_tcp_ca/dctcp_fallback:FAIL
test_write_sk_pacing:PASS:open_and_load 0 nsec
test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524
#29/6    bpf_tcp_ca/write_sk_pacing:FAIL
test_update_ca:PASS:open 0 nsec
test_update_ca:FAIL:attach_struct_ops unexpected error: -524
settcpca:FAIL:setsockopt unexpected setsockopt: \
  actual -1 == expected -1 (network_helpers.c:99: errno: No such file or directory) \
  Failed to call post_socket_cb
start_test:FAIL:start_server_str unexpected start_server_str: \
  actual -1 == expected -1
test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \
  actual 0 <= expected 0
#29/9    bpf_tcp_ca/update_ca:FAIL
#29      bpf_tcp_ca:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x28)[0x5555567ed91c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0]
./test_progs(bpf_link__update_map+0x80)[0x555556824a78]
./test_progs(+0x94d68)[0x5555564c4d68]
./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88]
./test_progs(+0x3bde54)[0x5555567ede54]
./test_progs(main+0x61c)[0x5555567efd54]
/usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208]
/usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c]
./test_progs(_start+0x48)[0x55555646bca8]
Segmentation fault
'''

This is because BPF trampoline is not implemented on Loongarch yet, so "link" returned by bpf_map__attach_struct_ops() is NULL. test_progs crashes when this NULL link is passed to bpf_link__update_map(). This patch adds NULL checks for all links in bpf_tcp_ca to fix these errors. If "link" is NULL, goto the newly added label "out" to destroy the skel.

v2:
 - use "goto out" instead of "return" as Eduard suggested.

Fixes: 06da9f3bd641 ("selftests/bpf: Test switching TCP Congestion Control algorithms.") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
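The shape of the fix, roughly (a sketch following the dctcp subtest; the exact map and skeleton names may vary per subtest):

  link = bpf_map__attach_struct_ops(skel->maps.dctcp);
  if (!ASSERT_OK_PTR(link, "attach_struct_ops"))
          goto out;       /* NULL on platforms without BPF trampoline support */

  /* ... run the test; bpf_link__update_map(link, ...) only with a valid link ... */
  out:
  bpf_dctcp__destroy(skel);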
2024-07-10  mm: zswap: fix zswap_never_enabled() for CONFIG_ZSWAP==N  (Barry Song)
If CONFIG_ZSWAP is set to N, it means zswap cannot be enabled. zswap_never_enabled() should return true. The only effect of this issue is that with Barry's latest large folio swapin patches for zram ("mm: support mTHP swap-in for zRAM-like swapfile"), we will always fallback to order-0 swapin, even mistakenly when !CONFIG_ZSWAP. Basically this bug makes Barry's in progress patches not work at all. The API was created to inform the mm core that zswap has never been enabled, allowing the mm core to perform mTHP swap-in. This is a transitional solution until zswap supports mTHP. If zswap has been enabled, performing mTHP swap-in will result in corrupted data. You may find the answer in the mTHP swap-in series: https://lore.kernel.org/linux-mm/CAJD7tkZ4FQr6HZpduOdvmqgg_-whuZYE-Bz5O2t6yzw6Yg+v1A@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240629232231.42394-1-21cnbao@gmail.com Fixes: 0300e17d67c3 ("mm: zswap: add zswap_never_enabled()") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Chris Li <chrisl@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
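The fix amounts to the !CONFIG_ZSWAP stub returning true; a sketch of the corrected header stub (the exact ifdef structure in the real header may differ):

  #ifndef CONFIG_ZSWAP
  static inline bool zswap_never_enabled(void)
  {
          /* zswap is compiled out, so it can never have been enabled */
          return true;
  }
  #endif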
2024-07-10  mm/vmscan: drop checking if _deferred_list is empty before using TTU_SYNC  (Barry Song)
The optimization of list_empty(&folio->_deferred_list) aimed to prevent increasing the PTL duration when a large folio is partially unmapped, for example, from subpage 0 to subpage (nr - 2). But Ryan's commit 5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()") actually splits this kind of large folios. This makes the "optimization" useless. Additionally, the list_empty() technically required a data_race() annotation. Link: https://lkml.kernel.org/r/20240629234155.53524-1-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10  mm/page_alloc: remove prefetchw() on freeing page to buddy system  (Wei Yang)
The prefetchw() is introduced from an ancient patch[1]. The change log says:

    The basic idea is to free higher order pages instead of going through every single one. Also, some unnecessary atomic operations are done away with and replaced with non-atomic equivalents, and prefetching is done where it helps the most. For a more in-depth discussion of this patch, please see the linux-ia64 archives (topic is "free bootmem feedback patch").

So there are several changes improving the bootmem freeing, of which the most basic idea is freeing higher order pages. And as Matthew says, "Itanium CPUs of this era had no prefetchers."

I did 10 rounds of bootup tests before and after this change; the data doesn't prove that prefetchw() helps speed up bootmem freeing. The sum of the 10-round bootmem freeing times after prefetchw() removal is even 5.2% faster than before.

[1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/

Link: https://lkml.kernel.org/r/20240702020931.7061-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10  kernel/fork.c: put set_max_threads()/task_struct_whitelist() in __init section  (Wei Yang)
The functions set_max_threads() and task_struct_whitelist() are only used by fork_init() during bootup. Let's add __init tag to them. Link: https://lkml.kernel.org/r/20240701013410.17260-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10  kernel/fork.c: get totalram_pages from memblock to calculate max_threads  (Wei Yang)
Since we plan to move the accounting into __free_pages_core(), totalram_pages may not represent the total usable pages on system at this point when defer_init is enabled. Instead we can get the total usable pages from memblock directly. Link: https://lkml.kernel.org/r/20240701013410.17260-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10  mm: remove CONFIG_MEMCG_KMEM  (Johannes Weiner)
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab tracking is enabled. It has been default-enabled and equivalent to CONFIG_MEMCG for almost a decade. We've only grown more kernel memory accounting sites since, and there is no imaginable cgroup usecase going forward that wants to track user pages but not the multitude of user-drivable kernel allocations. Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>