Age  Commit message  Author
2024-07-10  scsi: ufs: core: Initialize struct uic_command once  Bart Van Assche
Instead of first zero-initializing struct uic_command and next initializing it memberwise, initialize all members at once.

Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240708211716.2827751-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
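The change described above can be illustrated in plain C. This is only a sketch: `struct uic_cmd_sketch` and both builder functions are hypothetical stand-ins, and the real `struct uic_command` may have a different member list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct uic_command; the real members may differ. */
struct uic_cmd_sketch {
	uint32_t command;
	uint32_t argument1;
	uint32_t argument2;
	uint32_t argument3;
	int result;
};

/* Before: zero the whole struct, then assign members one by one. */
struct uic_cmd_sketch build_two_step(uint32_t cmd, uint32_t arg1)
{
	struct uic_cmd_sketch c = {0};

	c.command = cmd;
	c.argument1 = arg1;
	return c;
}

/* After: one designated initializer; members not named are zeroed implicitly. */
struct uic_cmd_sketch build_at_once(uint32_t cmd, uint32_t arg1)
{
	struct uic_cmd_sketch c = {
		.command = cmd,
		.argument1 = arg1,
	};

	return c;
}
```

Both builders produce the same value; the second form states the intent in one place and leaves no window in which the struct is only partly set up.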
2024-07-10  scsi: ufs: core: Declare functions once  Bart Van Assche
Several functions are declared in include/ufs/ufshcd.h and also in drivers/ufs/core/ufshcd-priv.h. Remove the duplicate declarations.

Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240708211716.2827751-2-bvanassche@acm.org
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10  Merge branch 'ice-support-to-dump-phy-config-fec'  Jakub Kicinski
Tony Nguyen says:

====================
ice: Support to dump PHY config, FEC

Anil Samal says:

Implementation to dump PHY configuration and FEC statistics to facilitate link level debugging of customer issues. The implementation has two parts.

a. Serdes equalization

  # ethtool -d eth0
  Output:
  Offset          Values
  ------          ------
  0x0000:         00 00 00 00 03 00 00 00 05 00 00 00 01 08 00 40
  0x0010:         01 00 00 40 00 00 39 3c 01 00 00 00 00 00 00 00
  0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ...
  0x01f0:         01 00 00 00 ef be ad de 8f 00 00 00 00 00 00 00
  0x0200:         00 00 00 00 ef be ad de 00 00 00 00 00 00 00 00
  0x0210:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0220:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0230:         00 00 00 00 00 00 00 00 00 00 00 00 fa ff 00 00
  0x0240:         06 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00
  0x0250:         0f b0 0f b0 00 00 00 00 00 00 00 00 00 00 00 00
  0x0260:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0270:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0280:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0290:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x02a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x02b0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x02c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x02d0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x02e0:         00 00 00 00 00 00 00 00 00 00 00 00

The current implementation appends 176 bytes, i.e. 44 bytes * 4 serdes lanes. For a port with 2 serdes lanes, the first 88 bytes are valid values and the remaining 88 bytes are filled with zero. Similarly, for a port with 1 serdes lane, the first 44 bytes are valid and the remaining 132 bytes are zero.

Each set of serdes equalizer parameters (i.e. each set of 44 bytes) follows the order below:
  a. rx_equalization_pre2
  b. rx_equalization_pre1
  c. rx_equalization_post1
  d. rx_equalization_bflf
  e. rx_equalization_bfhf
  f. rx_equalization_drate
  g. tx_equalization_pre1
  h. tx_equalization_pre3
  i. tx_equalization_atten
  j. tx_equalization_post1
  k. tx_equalization_pre2

Each individual equalizer parameter is 4 bytes. As ethtool prints values as individual bytes, on a little endian machine these values will be in reverse byte order.

b. FEC block counts

  # ethtool -I --show-fec eth0
  Output:
  FEC parameters for eth0:
  Supported/Configured FEC encodings: Auto RS BaseR
  Active FEC encoding: RS
  Statistics:
    corrected_blocks: 0
    uncorrectable_blocks: 0

This series does the following:
Patch 1 - Implementation to support user provided flag for side band queue command.
Patch 2 - Currently the driver does not have a way to derive serdes lane number, pcs quad, pcs port from port number. So we introduced a mechanism to derive the above info. Ethtool interface extension to include FEC statistics counter.
Patch 3 - Ethtool interface extension to include serdes equalizer output.

v1: https://lore.kernel.org/netdev/20240702180710.2606969-1-anthony.l.nguyen@intel.com/
====================

Link: https://patch.msgid.link/20240709202951.2103115-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
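The layout described above (4 lanes, 44 bytes per lane, 11 parameters of 4 bytes each, printed byte by byte) can be decoded roughly as follows. This is a sketch under assumptions: the function and macro names are hypothetical, the caller is assumed to pass a pointer to the start of the 176-byte equalizer area (the message does not state its offset within the dump), and the bytes are assumed to come from a little endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EQ_PARAMS_PER_LANE 11   /* rx_equalization_pre2 ... tx_equalization_pre2 */
#define EQ_BYTES_PER_PARAM 4
#define EQ_BYTES_PER_LANE  (EQ_PARAMS_PER_LANE * EQ_BYTES_PER_PARAM)  /* 44 */
#define EQ_MAX_LANES       4

/* Rebuild one 32-bit value from four dump bytes stored in little endian order. */
uint32_t eq_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Read parameter `param` (0..10, in the order a..k of the list) for `lane`
 * from the equalizer area. Returns 0 on success, -1 on out-of-range input.
 */
int eq_read_param(const uint8_t *area, size_t area_len,
		  unsigned int lane, unsigned int param, uint32_t *out)
{
	size_t off;

	if (lane >= EQ_MAX_LANES || param >= EQ_PARAMS_PER_LANE)
		return -1;
	off = (size_t)lane * EQ_BYTES_PER_LANE +
	      (size_t)param * EQ_BYTES_PER_PARAM;
	if (off + EQ_BYTES_PER_PARAM > area_len)
		return -1;
	*out = eq_le32(area + off);
	return 0;
}
```

For example, lane 1, parameter index 2 (rx_equalization_post1) starts at byte 44 + 8 = 52 of the area.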
2024-07-10  ice: Implement driver functionality to dump serdes equalizer values  Anil Samal
To debug link issues in the field, serdes Tx/Rx equalizer values help to determine the health of serdes lane. Extend 'ethtool -d' option to dump serdes Tx/Rx equalizer. The following list of equalizer param is supported:
  a. rx_equalization_pre2
  b. rx_equalization_pre1
  c. rx_equalization_post1
  d. rx_equalization_bflf
  e. rx_equalization_bfhf
  f. rx_equalization_drate
  g. tx_equalization_pre1
  h. tx_equalization_pre3
  i. tx_equalization_atten
  j. tx_equalization_post1
  k. tx_equalization_pre2

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anil Samal <anil.samal@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240709202951.2103115-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  ice: Implement driver functionality to dump fec statistics  Anil Samal
To debug link issues in the field, it is paramount to dump fec corrected/uncorrected block counts from firmware. Firmware requires PCS quad number and PCS port number to read FEC statistics. The current driver implementation does not maintain these physical properties of a port.

Add a new driver API to derive the physical properties of an input port. These properties include PCS quad number, PCS port number, serdes lane count, and primary serdes lane number.

Extend the ethtool option '--show-fec' to support fec statistics. The IEEE standard mandates two sets of counters:
 - 30.5.1.1.17 aFECCorrectedBlocks
 - 30.5.1.1.18 aFECUncorrectableBlocks

The standard defines these statistics per lane, but the current implementation supports total FEC statistics per port, i.e. the sum over all lanes of a port. Sample output:

  FEC parameters for ens21f0np0:
  Supported/Configured FEC encodings: Auto RS BaseR
  Active FEC encoding: RS
  Statistics:
    corrected_blocks: 0
    uncorrectable_blocks: 0

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anil Samal <anil.samal@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240709202951.2103115-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  ice: Extend Sideband Queue command to support flags  Anil Samal
Current driver implementation for Sideband Queue supports a fixed flag (ICE_AQ_FLAG_RD). To retrieve FEC statistics from firmware, Sideband Queue command is used with a different flag. Extend API for Sideband Queue command to use 'flags' as input argument.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anil Samal <anil.samal@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240709202951.2103115-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  e1000e: fix force smbus during suspend flow  Vitaly Lifshits
Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") resolved a PHY access loss during suspend on Meteor Lake consumer platforms, but it affected corporate systems incorrectly.

A better fix, working for both consumer and corporate systems, was proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function"). However, it introduced a regression on older devices, such as [8086:15B8], [8086:15F9], [8086:15BE].

This patch aims to fix the secondary regression, by limiting the scope of the changes to Meteor Lake platforms only.

Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function")
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940
Reported-by: Dieter Mummenschanz <dmummenschanz@web.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  dt-bindings: net: convert enetc to yaml  Frank Li
Convert the enetc device binding file to yaml. Split it into 3 yaml files: 'fsl,enetc.yaml', 'fsl,enetc-mdio.yaml', 'fsl,enetc-ierb.yaml'.

Additional Changes:
- Add pci<vendor id>,<production id> in compatible string.
- Ref to common ethernet-controller.yaml and mdio.yaml.
- Add Wei Fang, Vladimir and Claudiu as maintainers.
- Update ENETC description.
- Remove fixed-link part.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240709214841.570154-1-Frank.Li@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10  tcp: avoid too many retransmit packets  Eric Dumazet
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.

The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account.

Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits.

Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4.

Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
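For context, this is roughly how an application opts into the behaviour discussed above. It is a user-space sketch, not part of the patch: the two wrapper names are hypothetical, while `setsockopt()`/`getsockopt()` with `IPPROTO_TCP` and `TCP_USER_TIMEOUT` (a value in milliseconds) are the Linux socket API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set TCP_USER_TIMEOUT in milliseconds; returns 0, or -1 with errno set. */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Read the current TCP_USER_TIMEOUT back; returns 0, or -1 with errno set. */
int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}
```

A socket configured this way is the kind affected by the retransmit storm: once the peer advertises a zero window, the application expects the connection to be dropped when the timeout elapses.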
2024-07-10  dt-bindings: net: realtek,rtl82xx: Document RTL8211F LED support  Marek Vasut
The RTL8211F PHY supports LED configuration. Document support for LEDs in the binding document.

Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240708211649.165793-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11  ASoC: dt-bindings: convert qcom sound bindings to  Mark Brown
Merge series from Rayyan Ansari <rayyan.ansari@linaro.org>: These patches convert the remaining plain text bindings for Qualcomm sound drivers to dt schema, so device trees can be validated against them.
2024-07-11  firmware: cs_dsp: Some small coding improvements  Mark Brown
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>: Commit series that makes some small improvements to code and the kernel log messages.
2024-07-10  Merge branch 'fixes-for-bpf-timer-lockup-and-uaf'  Alexei Starovoitov
Kumar Kartikeya Dwivedi says:

====================
Fixes for BPF timer lockup and UAF

The following patches contain fixes for timer lockups and a use-after-free scenario.

This set proposes to fix the following lockup situation for BPF timers.

  CPU 1                                 CPU 2

  bpf_timer_cb                          bpf_timer_cb
    timer_cb1                             timer_cb2
      bpf_timer_cancel(timer_cb2)           bpf_timer_cancel(timer_cb1)
        hrtimer_cancel                        hrtimer_cancel

In this case, both callbacks will continue waiting for each other to finish synchronously, causing a lockup.

The proposed fix adds support for tracking in-flight cancellations *begun by other timer callbacks* for a particular BPF timer. Whenever preparing to call hrtimer_cancel, a callback will increment the target timer's counter, then inspect its in-flight cancellations, and if non-zero, return -EDEADLK to avoid situations where the target timer's callback is waiting for its completion.

This does mean that in cases where a callback is fired and cancelled, it will be unable to cancel any timers in that execution. This can be alleviated by maintaining the list of waiting callbacks in bpf_hrtimer and searching through it to avoid interdependencies, but this may introduce additional delays in bpf_timer_cancel, in addition to requiring extra state at runtime which may need to be allocated or reused from bpf_hrtimer storage. Moreover, extra synchronization is needed to delete these elements from the list of waiting callbacks once hrtimer_cancel has finished.

The second patch is for a deadlock situation similar to above in bpf_timer_cancel_and_free, but also a UAF scenario that can occur if timer is armed before entering it, if hrtimer_running check causes the hrtimer_cancel call to be skipped.

As seen above, synchronous hrtimer_cancel would lead to deadlock (if same callback tries to free its timer, or two timers free each other), therefore we queue work onto the global workqueue to ensure outstanding timers are cancelled before bpf_hrtimer state is freed.

Further details are in the patches.
====================

Link: https://lore.kernel.org/r/20240709185440.1104957-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10  dt-bindings: clock: Document T-Head TH1520 AP_SUBSYS controller  Drew Fustini
Document bindings for the T-Head TH1520 AP sub-system clock controller.

Link: https://openbeagle.org/beaglev-ahead/beaglev-ahead/-/blob/main/docs/TH1520%20System%20User%20Manual.pdf
Co-developed-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Drew Fustini <dfustini@tenstorrent.com>
Link: https://lore.kernel.org/r/20240623-th1520-clk-v2-1-ad8d6432d9fb@tenstorrent.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10  bpf: Defer work in bpf_timer_cancel_and_free  Kumar Kartikeya Dwivedi
Currently, the same case as previous patch (two timer callbacks trying to cancel each other) can be invoked through bpf_map_update_elem as well, or more precisely, freeing map elements containing timers. Since this relies on hrtimer_cancel as well, it is prone to the same deadlock situation as the previous patch.

It would be sufficient to use hrtimer_try_to_cancel to fix this problem, as the timer cannot be enqueued after async_cancel_and_free. Once async_cancel_and_free has been done, the timer must be reinitialized before it can be armed again. The callback running in parallel trying to arm the timer will fail, and freeing bpf_hrtimer without waiting is sufficient (given kfree_rcu), and bpf_timer_cb will return HRTIMER_NORESTART, preventing the timer from being rearmed again.

However, there exists a UAF scenario where the callback arms the timer before entering this function and cancellation then fails (due to the timer callback invoking this routine, or the target timer callback running concurrently). In such a case, if the timer expiration is significantly far in the future, the RCU grace period expiration happening before it will free the bpf_hrtimer state and along with it the struct hrtimer, which is still enqueued.

Hence, it is clear cancellation needs to occur after async_cancel_and_free, and yet it cannot be done inline due to deadlock issues. We thus modify bpf_timer_cancel_and_free to defer work to the global workqueue, adding a work_struct alongside rcu_head (both used at _different_ points of time, so can share space).

Update existing code comments to reflect the new state of affairs.

Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240709185440.1104957-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10  bpf: Fail bpf_timer_cancel when callback is being cancelled  Kumar Kartikeya Dwivedi
Given a schedule:

  timer1 cb                     timer2 cb

  bpf_timer_cancel(timer2);     bpf_timer_cancel(timer1);

Both bpf_timer_cancel calls would wait for the other callback to finish executing, introducing a lockup.

Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps track of all in-flight cancellation requests for a given BPF timer. Whenever cancelling a BPF timer, we must check if we have outstanding cancellation requests, and if so, we must fail the operation with an error (-EDEADLK) since cancellation is synchronous and waits for the callback to finish executing. This implies that we can enter a deadlock situation involving two or more timer callbacks executing in parallel and attempting to cancel one another.

Note that we avoid incrementing the cancelling counter for the target timer (the one being cancelled) if bpf_timer_cancel is not invoked from a callback, to avoid spurious errors. The whole point of detecting cur->cancelling and returning -EDEADLK is to not enter a busy wait loop (which may or may not lead to a lockup). This does not apply in case the caller is in a non-callback context, the other side can continue to cancel as it sees fit without running into errors.

Background on prior attempts:

Earlier versions of this patch used a bool 'cancelling' bit and used the following pattern under timer->lock to publish cancellation status.

  lock(t->lock);
  t->cancelling = true;
  mb();
  if (cur->cancelling)
          return -EDEADLK;
  unlock(t->lock);
  hrtimer_cancel(t->timer);
  t->cancelling = false;

The store outside the critical section could overwrite a parallel request's t->cancelling assignment to true, which was made to ensure the concurrently executing callback observes its cancellation status.

It would be necessary to clear this cancelling bit once hrtimer_cancel is done, but lack of serialization introduced races. Another option was explored where bpf_timer_start would clear the bit when (re)starting the timer under timer->lock. This would ensure serialized access to the cancelling bit, but may allow it to be cleared before in-flight hrtimer_cancel has finished executing, such that lockups can occur again.

Thus, we choose an atomic counter to keep track of all outstanding cancellation requests and use it to prevent lockups in case callbacks attempt to cancel each other while executing in parallel.

Reported-by: Dohyun Kim <dohyunkim@google.com>
Reported-by: Neel Natu <neelnatu@google.com>
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
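The counter scheme described above can be modelled in user space with C11 atomics. This is a minimal sketch, not the kernel code: `struct timer_sketch` and `timer_cancel_sketch()` are hypothetical names, and the synchronous wait that hrtimer_cancel performs is represented only by a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for bpf_hrtimer: only the in-flight cancel counter. */
struct timer_sketch {
	atomic_int cancelling;
};

/*
 * cur:    the timer whose callback is calling us, or NULL outside a callback.
 * target: the timer to cancel.
 * Returns 0 if it is safe to wait for target's callback, -EDEADLK otherwise.
 */
int timer_cancel_sketch(struct timer_sketch *cur, struct timer_sketch *target)
{
	int ret = 0;

	if (cur != NULL) {
		/* Publish our request first, then look for requests aimed at us. */
		atomic_fetch_add(&target->cancelling, 1);
		if (atomic_load(&cur->cancelling) != 0) {
			/* Someone is waiting for our callback: do not wait for theirs. */
			ret = -EDEADLK;
			goto out;
		}
	}

	/* A real implementation would wait here for target's callback to finish. */

out:
	if (cur != NULL)
		atomic_fetch_sub(&target->cancelling, 1);
	return ret;
}
```

Because each side increments the other's counter before reading its own, at least one of two callbacks cancelling each other sees a non-zero count and backs off. Callers outside a callback (`cur == NULL`) never touch the counter, which matches the note about avoiding spurious errors.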
2024-07-10  bpf: fix order of args in call to bpf_map_kvcalloc  Mohammad Shehar Yaar Tausif
The original function call passed size of smap->bucket before the number of buckets which raises the error 'calloc-transposed-args' on compilation.

Vlastimil Babka added:

The order of parameters can be traced back all the way to 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") across several refactorings, and that's why the commit is used as a Fixes: tag.

In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined allocation helpers to account at the call site") however exposed the order of args in a way that gcc-14 has enough visibility to start warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is then a macro alias for kvcalloc instead of a static inline wrapper.

To sum up the warning happens when the following conditions are all met:

- gcc-14 is used (didn't see it with gcc-13)
- commit 2c321f3f70bc is present
- CONFIG_MEMCG is not enabled in .config
- CONFIG_WERROR turns this from a compiler warning to error

Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Mohammad Shehar Yaar Tausif <sheharyaar48@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
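The argument order at issue is the usual calloc convention: element count first, element size second. A user-space sketch with the standard `calloc()` follows; `struct bucket_sketch` and `alloc_buckets()` are hypothetical and only illustrate the call shape.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bucket type, for illustration only. */
struct bucket_sketch {
	void *head;
	int lock;
};

/* Allocate `nbuckets` zeroed buckets: count first, element size second. */
struct bucket_sketch *alloc_buckets(size_t nbuckets)
{
	return calloc(nbuckets, sizeof(struct bucket_sketch));
}
```

Swapping the two arguments requests the same number of bytes, which is why the transposed call went unnoticed for years; it is the compiler diagnostic, not runtime behaviour, that exposes it.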
2024-07-10  regmap: Implement regmap_multi_reg_read()  Mark Brown
Merge series from Guenter Roeck <linux@roeck-us.net>: regmap_multi_reg_read() is similar to regmap_bulk_read() but reads from an array of non-sequential registers. It is helpful if multiple non-sequential registers need to be read in a single operation which would otherwise have to be mutex protected. The name of the new function was chosen to match the existing function regmap_multi_reg_write().
2024-07-10  Merge tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  Linus Torvalds
Pull misc fixes from Andrew Morton:
 "21 hotfixes, 15 of which are cc:stable.

  No identifiable theme here - all are singleton patches, 19 are for MM"

* tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
  filemap: replace pte_offset_map() with pte_offset_map_nolock()
  arch/xtensa: always_inline get_current() and current_thread_info()
  sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
  MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
  mm: fix crashes from deferred split racing folio migration
  lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
  mm: gup: stop abusing try_grab_folio
  nilfs2: fix kernel bug on rename operation of broken directory
  mm/hugetlb_vmemmap: fix race with speculative PFN walkers
  cachestat: do not flush stats in recency check
  mm/shmem: disable PMD-sized page cache if needed
  mm/filemap: skip to create PMD-sized page cache if needed
  mm/readahead: limit page cache size in page_cache_ra_order()
  mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
  mm/damon/core: merge regions aggressively when max_nr_regions is unmet
  Fix userfaultfd_api to return EINVAL as expected
  mm: vmalloc: check if a hash-index is in cpu_possible_mask
  mm: prevent derefencing NULL ptr in pfn_section_valid()
  ...
2024-07-10  PCI: vmd: Create domain symlink before pci_bus_add_devices()  Jiwei Sun
The vmd driver creates a "domain" symlink in sysfs for each VMD bridge. Previously this symlink was created after pci_bus_add_devices() added devices below the VMD bridge and emitted udev events to announce them to userspace.

This led to a race between userspace consumers of the udev events and the kernel creation of the symlink. One such consumer is mdadm, which assembles block devices into a RAID array, and for devices below a VMD bridge, mdadm depends on the "domain" symlink.

If mdadm loses the race, it may be unable to assemble a RAID array, which may cause a boot failure or other issues, with complaints like this:

  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
  (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.

This symptom prevents the OS from booting successfully.

After a NVMe disk is probed/added by the nvme driver, udevd invokes mdadm to detect if there is a mdraid associated with this NVMe disk, and mdadm determines if a NVMe device is connected to a particular VMD domain by checking the "domain" symlink. For example:

  Thread A                      Thread B                Thread mdadm
  vmd_enable_domain
    pci_bus_add_devices
      __driver_probe_device
       ...
       work_on_cpu
         schedule_work_on
         : wakeup Thread B
                                nvme_probe
                                : wakeup scan_work
                                  to scan nvme disk
                                  and add nvme disk
                                  then wakeup udevd
                                                        : udevd executes
                                                          mdadm command
       flush_work                                       main
       : wait for nvme_probe done                        ...
      __driver_probe_device                              find_driver_devices
      : probe next nvme device                           : 1) Detect domain symlink
      ...                                                  2) Find domain symlink
      ...                                                     from vmd sysfs
      ...                                                  3) Domain symlink not
      ...                                                     created yet; failed
    sysfs_create_link
    : create domain symlink

Create the VMD "domain" symlink before invoking pci_bus_add_devices() to avoid this race.

Suggested-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/linux-pci/20240605124844.24293-1-sjiwei@163.com
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
2024-07-10  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi  Linus Torvalds
Pull SCSI fixes from James Bottomley:
 "One core change that moves a disk start message to a location where it will only be printed once instead of twice plus a couple of error handling race fixes in the ufs driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Do not repeat the starting disk message
  scsi: ufs: core: Fix ufshcd_abort_one racing issue
  scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
2024-07-10  clk: sophgo: Avoid -Wsometimes-uninitialized in sg2042_clk_pll_set_rate()  Nathan Chancellor
Clang warns (or errors with CONFIG_WERROR=y):

  drivers/clk/sophgo/clk-sg2042-pll.c:396:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
    396 |         if (sg2042_pll_enable(pll, 0)) {
        |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/clk/sophgo/clk-sg2042-pll.c:418:9: note: uninitialized use occurs here
    418 |         return ret;
        |                ^~~
  drivers/clk/sophgo/clk-sg2042-pll.c:396:2: note: remove the 'if' if its condition is always false
    396 |         if (sg2042_pll_enable(pll, 0)) {
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    397 |                 pr_warn("Can't disable pll(%s), status error\n", pll->hw.init->name);
        |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    398 |                 goto out;
        |                 ~~~~~~~~~
    399 |         }
        |         ~
  drivers/clk/sophgo/clk-sg2042-pll.c:393:9: note: initialize the variable 'ret' to silence this warning
    393 |         int ret;
        |                ^
        |                 = 0
  1 error generated.

sg2042_pll_enable() only ever returns zero, so this situation cannot happen, but clang does not perform interprocedural analysis, so it cannot know this to avoid the warning. Make it clearer to the compiler by making sg2042_pll_enable() void and eliminate the error handling in sg2042_clk_pll_set_rate(), which clears up the warning, as ret will always be initialized.

Fixes: 48cf7e01386e ("clk: sophgo: Add SG2042 clock driver")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240710-clk-sg2042-fix-sometimes-uninitialized-pll_set_rate-v1-1-538fa82dd539@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
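The shape of the fix can be sketched in plain C. Everything here is hypothetical (names, parameters, the rate check); it only shows why turning the helper into a void function removes the path on which `ret` was returned without being assigned.

```c
#include <assert.h>

/* Hypothetical helper: it cannot fail, so like the fixed driver it returns void. */
void pll_enable_sketch(int *enabled, int on)
{
	*enabled = on;
}

/* Hypothetical rate setter: with no early 'goto out', ret is set on every path. */
int pll_set_rate_sketch(int *enabled, unsigned long *cur_rate, unsigned long rate)
{
	int ret;

	pll_enable_sketch(enabled, 0);   /* disable before reprogramming */
	if (rate == 0) {
		ret = -1;                /* reject an invalid rate */
	} else {
		*cur_rate = rate;
		ret = 0;
	}
	pll_enable_sketch(enabled, 1);   /* re-enable afterwards */
	return ret;
}
```

Removing a branch that can never be taken is preferable to initializing `ret` to 0, since the latter would hide a genuinely missing assignment added later.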
2024-07-10  clk/sophgo: Using BUG() instead of unreachable() in mmux_get_parent_id()  Li Qiang
In general it's a good idea to avoid using bare unreachable() because it introduces undefined behavior in compiled code. Here it also caused a compilation warning, so use BUG() instead of unreachable() to resolve it.

Fixes the following warning:

  drivers/clk/sophgo/clk-cv18xx-ip.o: warning: objtool: mmux_round_rate() falls through to next function bypass_div_round_rate()

Fixes: 80fd61ec46124 ("clk: sophgo: Add clock support for CV1800 SoC")
Signed-off-by: Li Qiang <liqiang01@kylinos.cn>
Link: https://lore.kernel.org/r/c8e66d51f880127549e2a3e623be6787f62b310d.1720506143.git.liqiang01@kylinos.cn
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
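A user-space analogue of the idea, assuming `abort()` as a stand-in for the kernel's BUG(); the lookup function and its table are hypothetical. The point is that the "cannot happen" path ends in a real trap rather than in code the compiler is free to omit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mux lookup: return the index of `parent` in `table`. */
int mux_parent_id_sketch(const int *table, int n, int parent)
{
	for (int i = 0; i < n; i++)
		if (table[i] == parent)
			return i;

	/*
	 * A bare __builtin_unreachable() here would be undefined behavior if
	 * ever reached, and the compiler may emit nothing after the loop, so
	 * execution could run into whatever code follows. abort() terminates
	 * the program loudly instead.
	 */
	abort();
}
```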
2024-07-10  i2c: rcar: clear NO_RXDMA flag after resetting  Wolfram Sang
We should allow RXDMA only if the reset was really successful, so clear the flag after the reset call.

Fixes: 0e864b552b23 ("i2c: rcar: reset controller is mandatory for Gen3+")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-10  smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()  Zqiang
For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for the debug check in __init_work() to work correctly.

But this lacks the counterpart to remove the tracked object from debug objects again, which will cause a debug object warning once the stack is freed.

Add the missing destroy_work_on_stack() invocation to cure that.

[ tglx: Massaged changelog ]

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240704065213.13559-1-qiang.zhang1211@gmail.com
2024-07-10  riscv: Improve sbi_ecall() code generation by reordering arguments  Alexandre Ghiti
The sbi_ecall() function arguments are not in the same order as the ecall arguments, so we end up re-ordering the registers before the ecall which is useless and costly.

So simply reorder the arguments in the same way as expected by ecall. Instead of reordering directly the arguments of sbi_ecall(), use a proxy macro since the current ordering is more natural.

Before:

Dump of assembler code for function sbi_ecall:
   0xffffffff800085e0 <+0>:	add	sp,sp,-32
   0xffffffff800085e2 <+2>:	sd	s0,24(sp)
   0xffffffff800085e4 <+4>:	mv	t1,a0
   0xffffffff800085e6 <+6>:	add	s0,sp,32
   0xffffffff800085e8 <+8>:	mv	t3,a1
   0xffffffff800085ea <+10>:	mv	a0,a2
   0xffffffff800085ec <+12>:	mv	a1,a3
   0xffffffff800085ee <+14>:	mv	a2,a4
   0xffffffff800085f0 <+16>:	mv	a3,a5
   0xffffffff800085f2 <+18>:	mv	a4,a6
   0xffffffff800085f4 <+20>:	mv	a5,a7
   0xffffffff800085f6 <+22>:	mv	a6,t3
   0xffffffff800085f8 <+24>:	mv	a7,t1
   0xffffffff800085fa <+26>:	ecall
   0xffffffff800085fe <+30>:	ld	s0,24(sp)
   0xffffffff80008600 <+32>:	add	sp,sp,32
   0xffffffff80008602 <+34>:	ret

After:

Dump of assembler code for function __sbi_ecall:
   0xffffffff8000b6b2 <+0>:	add	sp,sp,-32
   0xffffffff8000b6b4 <+2>:	sd	s0,24(sp)
   0xffffffff8000b6b6 <+4>:	add	s0,sp,32
   0xffffffff8000b6b8 <+6>:	ecall
   0xffffffff8000b6bc <+10>:	ld	s0,24(sp)
   0xffffffff8000b6be <+12>:	add	sp,sp,32
   0xffffffff8000b6c0 <+14>:	ret

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20240322112629.68170-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
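The proxy-macro idea can be modelled without RISC-V hardware. In this sketch all names are hypothetical, and the callee merely records what it received instead of issuing an ecall. Callers keep the natural (ext, fid, args...) order, while the function underneath takes the arguments in the order the registers a0..a7 expect them.

```c
#include <assert.h>

/* Records what the callee saw; stands in for the real ecall instruction. */
struct ecall_seen {
	unsigned long a[6];
	int fid;
	int ext;
};

struct ecall_seen last_ecall;

/* Callee in register order: a0..a5 first, then fid (a6), then ext (a7). */
void sbi_ecall_regorder(unsigned long a0, unsigned long a1, unsigned long a2,
			unsigned long a3, unsigned long a4, unsigned long a5,
			int fid, int ext)
{
	last_ecall.a[0] = a0;
	last_ecall.a[1] = a1;
	last_ecall.a[2] = a2;
	last_ecall.a[3] = a3;
	last_ecall.a[4] = a4;
	last_ecall.a[5] = a5;
	last_ecall.fid = fid;
	last_ecall.ext = ext;
}

/* Proxy macro: callers keep writing (ext, fid, a0..a5). */
#define sbi_ecall_sketch(ext, fid, a0, a1, a2, a3, a4, a5) \
	sbi_ecall_regorder(a0, a1, a2, a3, a4, a5, fid, ext)
```

Since the reordering happens at macro expansion, each argument is already in its final register at the call site, which is what removes the chain of `mv` instructions seen in the "Before" dump.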
2024-07-10  riscv: Add tracepoints for SBI calls and returns  Samuel Holland
These are useful for measuring the latency of SBI calls. The SBI HSM extension is excluded because those functions are called from contexts such as cpuidle where instrumentation is not allowed.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240321230131.1838105-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10  Merge tag 'sunxi-dt-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/dt  Arnd Bergmann
Allwinner SoC device tree changes for 6.11 part 2

One additional peripheral enabled for the H616.
- H616 crypto engine added

* tag 'sunxi-dt-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: h616: add crypto engine node

Link: https://lore.kernel.org/r/Zo7O73Afx7lZcBRi@wens.tw
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10Merge tag 'sunxi-drivers-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/driversArnd Bergmann
Allwinner SoC driver changes for 6.11 part 2 One additional minor cleanup - Const-ify |struct regmap_config| in SRAM driver - Const-ify |struct regmap_bus| in Allwinner RSB bus driver * tag 'sunxi-drivers-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: bus: sunxi-rsb: Constify struct regmap_bus soc: sunxi: sram: Constify struct regmap_config Link: https://lore.kernel.org/r/Zo7T4YsfamN0PbYK@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10riscv: Optimize crc32 with Zbc extensionXiao Wang
As suggested by the B-ext spec, the Zbc (carry-less multiplication) instructions can be used to accelerate CRC calculations. Currently, crc32 is the most widely used CRC function inside the kernel, so this patch focuses on optimizing just the crc32 APIs. Compared with the current table-lookup based optimization, the Zbc based optimization can also achieve a large stride in the CRC calculation loop; in addition, it avoids the memory access latency of the table-lookup based implementation and reduces the memory footprint. If the Zbc feature is not supported in a runtime environment, the table-lookup based implementation serves as the fallback via the alternative mechanism. By inspecting the vmlinux built by gcc v12.2.0 with the default optimization level (-O2), we can see the following instruction count change for each 8-byte stride in the CRC32 loop: rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13) rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16) The compile target CPU is little endian, so extra effort is needed for byte swapping in the crc32_be API; thus, the instruction count change is not as significant as in the *_le cases. This patch was tested on a QEMU VM with the kernel CRC32 selftest for both rv64 and rv32. Running the CRC32 selftest on real hardware (SpacemiT K1) with the Zbc extension shows 65% and 125% performance improvements on crc32_test() and crc32c_test(), respectively. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
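For readers unfamiliar with the building blocks, the following user-space sketch shows what a carry-less multiply computes and gives a bit-at-a-time CRC32 reference to compare optimized versions against. It is not the code from this patch: clmul_model() and crc32_le_bitwise() are hypothetical names, the multiply is a software model of the low 64 bits of the product, and the CRC routine is assumed to follow the same seed-in/value-out convention as crc32_le(), with any inversion left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Software model of a carry-less multiply: XOR together one shifted copy
 * of a for every set bit of b. Only the low 64 bits of the product are
 * kept. This is polynomial multiplication over GF(2).
 */
static uint64_t clmul_model(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;
	return r;
}

/*
 * Bit-at-a-time little-endian CRC32 using the reflected polynomial
 * 0xEDB88320. No table lookups, one bit per inner iteration.
 */
static uint32_t crc32_le_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
	assert(p != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}
```

A CRC is the remainder of a polynomial division over GF(2), which is why a polynomial multiply instruction lets an implementation fold several bytes per step instead of looking each byte up in a table.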
2024-07-10drm/amdgpu: reject gang submit on reserved VMIDsChristian König
A gang submit won't work if the VMID is reserved and we can't flush out VM changes from multiple engines at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 320debca1ba3a81c87247eac84eff976ead09ee0)
2024-07-10clk: mxs: Use clamp() in clk_ref_round_rate() and clk_ref_set_rate()Thorsten Blum
Use clamp() instead of duplicating its implementation. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Link: https://lore.kernel.org/r/20240710143309.706135-2-thorsten.blum@toblux.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
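As a rough illustration of the kind of change this commit describes, the sketch below puts an open-coded range check next to a clamp-style macro. demo_clamp() is a simplified, hypothetical stand-in for the kernel's clamp(), which additionally type-checks its arguments; the function names are also illustrative and are not taken from the driver.

```c
#include <assert.h>

/*
 * Simplified stand-in for clamp(). Unlike the kernel macro it may
 * evaluate its arguments more than once, so pass side-effect-free
 * expressions only.
 */
#define demo_clamp(val, lo, hi) \
	((val) < (lo) ? (lo) : ((val) > (hi) ? (hi) : (val)))

/* Open-coded version: the pattern a clamp() call replaces. */
static unsigned long limit_open_coded(unsigned long v, unsigned long lo,
				      unsigned long hi)
{
	assert(lo <= hi);
	if (v < lo)
		v = lo;
	else if (v > hi)
		v = hi;
	return v;
}

/* Same behaviour expressed with the clamp-style macro. */
static unsigned long limit_with_clamp(unsigned long v, unsigned long lo,
				      unsigned long hi)
{
	assert(lo <= hi);
	return demo_clamp(v, lo, hi);
}
```

Both functions return the same value for every input; the second one states the intent in a single expression.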
2024-07-10clk: sunxi-ng r40: Constify struct regmap_configJavier Carrasco
`sun8i_r40_ccu_regmap_config` is not modified and can be declared as const to move its data to a read-only section. Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20240703-clk-const-regmap-v1-9-7d15a0671d6f@gmail.com Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-10Merge branch 'BPF selftests misc fixes'Martin KaFai Lau
Geliang Tang says: ==================== v2: - only check the first "link" (link_nl) in test_mixed_links(). - Drop patch 2 in v1. This patchset fixes a segfault and a bpf object leak in test_progs. It is a resend patch 1 out of "skip ENOTSUPP BPF selftests" set as Eduard suggested. Together with another fix for xdp_adjust_tail. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-07-10selftests/bpf: Close obj in error path in xdp_adjust_tailGeliang Tang
If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj" opened before this should be closed. So use "goto out" to close it instead of using "return" here. Fixes: 110221081aac ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
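The error-path pattern used here can be sketched with a small user-space analogue. None of the names below come from libbpf or the selftest: demo_open(), demo_load() and demo_close() are hypothetical helpers, and demo_live_objects is a counter added only to make a leak observable.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_objects;	/* objects opened but not yet closed */

struct demo_obj {
	int loaded;
};

static struct demo_obj *demo_open(void)
{
	struct demo_obj *obj = calloc(1, sizeof(*obj));

	if (obj)
		demo_live_objects++;
	return obj;
}

static int demo_load(struct demo_obj *obj, int should_fail)
{
	if (should_fail)
		return -1;
	obj->loaded = 1;
	return 0;
}

static void demo_close(struct demo_obj *obj)
{
	assert(obj != NULL && demo_live_objects > 0);
	demo_live_objects--;
	free(obj);
}

static int demo_test(int load_should_fail)
{
	struct demo_obj *obj = demo_open();
	int err;

	if (!obj)
		return -1;

	err = demo_load(obj, load_should_fail);
	if (err)
		goto out;	/* a bare "return" here would leak obj */

	/* ... the rest of the test would run here ... */
out:
	demo_close(obj);	/* single exit point releases the object */
	return err;
}
```

With a single "out" label, every path that runs after a successful open also runs the close, whether or not the load succeeded.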
2024-07-10selftests/bpf: Null checks for links in bpf_tcp_caGeliang Tang
When running the bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a LoongArch platform, some "Segmentation fault" errors occur:

'''
test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec
test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/1 bpf_tcp_ca/dctcp:FAIL
test_cubic:PASS:bpf_cubic__open_and_load 0 nsec
test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/2 bpf_tcp_ca/cubic:FAIL
test_dctcp_fallback:PASS:dctcp_skel 0 nsec
test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec
test_dctcp_fallback:FAIL:dctcp link unexpected error: -524
#29/4 bpf_tcp_ca/dctcp_fallback:FAIL
test_write_sk_pacing:PASS:open_and_load 0 nsec
test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524
#29/6 bpf_tcp_ca/write_sk_pacing:FAIL
test_update_ca:PASS:open 0 nsec
test_update_ca:FAIL:attach_struct_ops unexpected error: -524
settcpca:FAIL:setsockopt unexpected setsockopt: \
 actual -1 == expected -1
(network_helpers.c:99: errno: No such file or directory) \
 Failed to call post_socket_cb
start_test:FAIL:start_server_str unexpected start_server_str: \
 actual -1 == expected -1
test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \
 actual 0 <= expected 0
#29/9 bpf_tcp_ca/update_ca:FAIL
#29 bpf_tcp_ca:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x28)[0x5555567ed91c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0]
./test_progs(bpf_link__update_map+0x80)[0x555556824a78]
./test_progs(+0x94d68)[0x5555564c4d68]
./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88]
./test_progs(+0x3bde54)[0x5555567ede54]
./test_progs(main+0x61c)[0x5555567efd54]
/usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208]
/usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c]
./test_progs(_start+0x48)[0x55555646bca8]
Segmentation fault
'''

Because the BPF trampoline is not yet implemented on LoongArch, the "link" returned by bpf_map__attach_struct_ops() is NULL. test_progs crashes when this NULL link is passed to bpf_link__update_map().
This patch adds NULL checks for all links in bpf_tcp_ca to fix these errors. If "link" is NULL, goto the newly added label "out" to destroy the skel. v2: - use "goto out" instead of "return" as Eduard suggested. Fixes: 06da9f3bd641 ("selftests/bpf: Test switching TCP Congestion Control algorithms.") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
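The shape of the fix can be sketched as follows. This is a hypothetical user-space analogue, not the selftest code: demo_attach() stands in for an attach call that returns NULL when the feature is unsupported, demo_update() for a function that must not receive a NULL link, and demo_destroy() for the skeleton teardown.

```c
#include <assert.h>
#include <stddef.h>

struct demo_link {
	int updates;
};

struct demo_skel {
	struct demo_link link;
	int destroyed;
};

/* Returns NULL when attaching is not supported on the platform. */
static struct demo_link *demo_attach(struct demo_skel *skel, int supported)
{
	return supported ? &skel->link : NULL;
}

/* Dereferences the link, so callers must never pass NULL. */
static void demo_update(struct demo_link *link)
{
	assert(link != NULL);
	link->updates++;
}

static void demo_destroy(struct demo_skel *skel)
{
	skel->destroyed = 1;
}

static int demo_test_update(struct demo_skel *skel, int supported)
{
	struct demo_link *link;
	int err = -1;

	link = demo_attach(skel, supported);
	if (!link)
		goto out;	/* skip the update, but still clean up */

	demo_update(link);
	err = 0;
out:
	demo_destroy(skel);
	return err;
}
```

The test still reports a failure when the attach is unsupported, but it no longer dereferences a NULL pointer, and the skeleton is destroyed on both paths.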
2024-07-10mm: zswap: fix zswap_never_enabled() for CONFIG_ZSWAP==NBarry Song
If CONFIG_ZSWAP is set to N, it means zswap cannot be enabled. zswap_never_enabled() should return true. The only effect of this issue is that with Barry's latest large folio swapin patches for zram ("mm: support mTHP swap-in for zRAM-like swapfile"), we will always fall back to order-0 swapin, even mistakenly when !CONFIG_ZSWAP. Basically this bug makes Barry's in-progress patches not work at all. The API was created to inform the mm core that zswap has never been enabled, allowing the mm core to perform mTHP swap-in. This is a transitional solution until zswap supports mTHP. If zswap has been enabled, performing mTHP swap-in will result in corrupted data. You may find the answer in the mTHP swap-in series: https://lore.kernel.org/linux-mm/CAJD7tkZ4FQr6HZpduOdvmqgg_-whuZYE-Bz5O2t6yzw6Yg+v1A@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240629232231.42394-1-21cnbao@gmail.com Fixes: 0300e17d67c3 ("mm: zswap: add zswap_never_enabled()") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Chris Li <chrisl@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
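The intended behaviour of the compiled-out stub can be sketched like this. The sketch is a user-space illustration under stated assumptions: DEMO_ZSWAP is a hypothetical stand-in for CONFIG_ZSWAP, and demo_swapin_order() is an invented caller that models the "fall back to order-0" decision described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Leave DEMO_ZSWAP undefined to model a build with CONFIG_ZSWAP=n. */
#ifdef DEMO_ZSWAP
static bool demo_zswap_ever_enabled;

static bool demo_zswap_never_enabled(void)
{
	return !demo_zswap_ever_enabled;
}
#else
/* Feature compiled out: it can never have been enabled. */
static bool demo_zswap_never_enabled(void)
{
	return true;
}
#endif

/* Hypothetical caller: large swap-in is allowed only if zswap was never on. */
static int demo_swapin_order(int wanted_order)
{
	assert(wanted_order >= 0);
	return demo_zswap_never_enabled() ? wanted_order : 0;
}
```

If the stub returned false instead, demo_swapin_order() would always yield 0 in builds without the feature, which is the symptom this commit describes.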
2024-07-10mm/vmscan: drop checking if _deferred_list is empty before using TTU_SYNCBarry Song
The optimization of list_empty(&folio->_deferred_list) aimed to prevent increasing the PTL duration when a large folio is partially unmapped, for example, from subpage 0 to subpage (nr - 2). But Ryan's commit 5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()") actually splits this kind of large folio. This makes the "optimization" useless. Additionally, the list_empty() technically required a data_race() annotation. Link: https://lkml.kernel.org/r/20240629234155.53524-1-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10mm/page_alloc: remove prefetchw() on freeing page to buddy systemWei Yang
The prefetchw() was introduced by an ancient patch[1]. The change log says: The basic idea is to free higher order pages instead of going through every single one. Also, some unnecessary atomic operations are done away with and replaced with non-atomic equivalents, and prefetching is done where it helps the most. For a more in-depth discusion of this patch, please see the linux-ia64 archives (topic is "free bootmem feedback patch"). So several changes were made to improve bootmem freeing, the most basic of which is freeing higher order pages. And as Matthew says, "Itanium CPUs of this era had no prefetchers." I ran 10 rounds of bootup tests before and after this change; the data does not show that prefetchw() helps speed up bootmem freeing. The total bootmem freeing time over the 10 rounds is even 5.2% lower after the prefetchw() removal than before. [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/ Link: https://lkml.kernel.org/r/20240702020931.7061-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10kernel/fork.c: put set_max_threads()/task_struct_whitelist() in __init sectionWei Yang
The functions set_max_threads() and task_struct_whitelist() are only used by fork_init() during bootup. Let's add __init tag to them. Link: https://lkml.kernel.org/r/20240701013410.17260-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10kernel/fork.c: get totalram_pages from memblock to calculate max_threadsWei Yang
Since we plan to move the accounting into __free_pages_core(), totalram_pages may not represent the total usable pages on system at this point when defer_init is enabled. Instead we can get the total usable pages from memblock directly. Link: https://lkml.kernel.org/r/20240701013410.17260-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10mm: remove CONFIG_MEMCG_KMEMJohannes Weiner
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab tracking is enabled. It has been default-enabled and equivalent to CONFIG_MEMCG for almost a decade. We've only grown more kernel memory accounting sites since, and there is no imaginable cgroup usecase going forward that wants to track user pages but not the multitude of user-drivable kernel allocations. Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10mm: memcg: add cache line padding to mem_cgroup_per_nodeRoman Gushchin
Memcg v1-specific fields serve as a buffer between the read-mostly and the frequently updated parts of the mem_cgroup_per_node structure. If CONFIG_MEMCG_V1 is not set and these fields are not present, explicit cacheline padding is needed. Link: https://lkml.kernel.org/r/20240701185932.704807-2-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
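The effect of such padding can be shown with a small C11 sketch. It does not reproduce the kernel structure or the kernel's own padding helpers: struct demo_per_node and its fields are hypothetical, and a 64-byte cache line is an assumption that does not hold on every CPU.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHELINE 64	/* assumed cache line size in bytes */

struct demo_per_node {
	/* Read-mostly fields: set up once, then mainly read. */
	unsigned long limit;
	void *parent;

	/*
	 * Frequently updated field. Aligning it to a cache line boundary
	 * makes the compiler insert padding after the read-mostly fields,
	 * so writes here do not invalidate the line that holds them.
	 */
	alignas(DEMO_CACHELINE) unsigned long counter;
};

/* The frequently updated part must start on its own cache line. */
static_assert(offsetof(struct demo_per_node, counter) % DEMO_CACHELINE == 0,
	      "counter is not cache line aligned");
```

Without the alignment request, counter would directly follow parent and share a cache line with the read-mostly fields, which is the false-sharing situation the padding is meant to avoid.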
2024-07-10mm: memcg: drop obsolete cache line padding in struct mem_cgroupRoman Gushchin
After the grouping of the cgroup v1-related fields and the corresponding reorganization of struct mem_cgroup, the existing cache line padding no longer makes much sense. Let's drop it for now and add it back in new places if necessary. Link: https://lkml.kernel.org/r/20240701185932.704807-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10Docs/mm/damon/index: add links to admin-guide docSeongJae Park
Readers of DAMON subsystem documents index would want to further learn how they can use DAMON from the user-space. Add the link to the admin guide. Link: https://lkml.kernel.org/r/20240701192706.51415-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10Docs/mm/damon/index: add links to designSeongJae Park
DAMON subsystem documents index page provides a short intro of DAMON core concepts. Add links to sections of the design document to let users easily browse to the details. Link: https://lkml.kernel.org/r/20240701192706.51415-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10Docs/mm/damon/design: add links to sections of DAMON sysfs interface usage docSeongJae Park
Readers of the design document would wonder how they can configure and use specific DAMON features. Add links to sections of DAMON sysfs interface usage document that provides the answers for easier browsing. Link: https://lkml.kernel.org/r/20240701192706.51415-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10Docs/mm/damon/design: remove 'Programmable Modules' section in favor of 'Modules' sectionSeongJae Park
'Programmable Modules' section provides high level descriptions of the DAMON API-based kernel modules layer. But 'Modules' section, which is at the end of the document, provides every detail about the layer including that of 'Programmable Modules' section. Since the brief summary of the layers at the beginning of the document has a link to the 'Modules' section, browsing to the section is not that difficult. Remove 'Programmable Modules' section in favor of 'Modules' section and reducing duplicates. Link: https://lkml.kernel.org/r/20240701192706.51415-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10Docs/mm/damon/design: move 'Configurable Operations Set' section into 'Operations Set Layer' sectionSeongJae Park
'Configurable Operations Set' section is for providing a description of the pluggability of the operations set layer. Just after that, 'Operations Set Layer' section, which is dedicated for the entire things of the layer, follows. The layout is odd, and some descriptions are duplicated. Move 'Configurable Operations Set' section into 'Operations Set Layer' and re-write some of the detailed descriptions. Link: https://lkml.kernel.org/r/20240701192706.51415-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-10Docs/mm/damon/design: add links from overall architecture to sections of detailsSeongJae Park
DAMON design document briefly explains the overall layers architecture first, and then provides detailed explanations of each layer with dedicated sections. Letting readers go directly to the detailed sections for specific layers could help easy browsing of the not-very-short document. Add links from the overall summary to the sections of details. Link: https://lkml.kernel.org/r/20240701192706.51415-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>