linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-09-24	ACPI: scan: Support multiple DMA windows with different offsets	Jianmin Lv
	In DT systems configurations, of_dma_get_range() returns struct bus_dma_region DMA regions; they are used to set-up devices DMA windows with different offset available for translation between DMA address and CPU address. In ACPI systems configuration, acpi_dma_get_range() does not return DMA regions yet and that precludes setting up the dev->dma_range_map pointer and therefore DMA regions with multiple offsets. Update acpi_dma_get_range() to return struct bus_dma_region DMA regions like of_dma_get_range() does. After updating acpi_dma_get_range(), acpi_arch_dma_setup() is changed for ARM64, where the original dma_addr and size are removed as these arguments are now redundant, and pass 0 and U64_MAX for dma_base and size of arch_setup_dma_ops; this is a simplification consistent with what other ACPI architectures also pass to iommu_setup_dma_ops(). Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-24	Merge tag 'tty-6.0-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small, and late, serial driver fixes for 6.0-rc7 to resolve some reported problems. Included in here are: - tegra icount accounting fixes, including a framework function that other drivers will be converted over to using in 6.1-rc1. - fsl_lpuart reset bugfix - 8250 omap 485 bugfix - sifive serial clock bugfix The last three patches have not shown up in linux-next due to them being added to my tree only 2 days ago, but they are tiny and self-contained and the developers say they resolve issues that they have with 6.0-rc. The other three have been in linux-next for a while with no reported issues" * tag 'tty-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: sifive: enable clocks for UART when probed serial: 8250: omap: Use serial8250_em485_supported serial: fsl_lpuart: Reset prior to registration serial: tegra-tcu: Use uart_xmit_advance(), fixes icount.tx accounting serial: tegra: Use uart_xmit_advance(), fixes icount.tx accounting serial: Create uart_xmit_advance()
2022-09-24	device property: Add const qualifier to device_get_match_data() parameter	Andy Shevchenko
	Add const qualifier to the device_get_match_data() parameter. Some of the future users may utilize this function without forcing the type. All the same, dev_fwnode() may be used with a const qualifier. Reported-by: kernel test robot <lkp@intel.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220922135410.49694-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24	mtd: allow getting MTD device associated with a specific DT node	Rafał Miłecki
	MTD subsystem API allows interacting with MTD devices (e.g. reading, writing, handling bad blocks). So far a random driver could get MTD device only by its name (get_mtd_device_nm()). This change allows getting them also by a DT node. This API is required for drivers handling DT defined MTD partitions in a specific way (e.g. U-Boot (sub)partition with environment variables). Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220916122100.170016-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24	Merge 1707c39ae309 ("Merge tag 'driver-core-6.0-rc7' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core") driver-core-next This merges the driver core changes in 6.0-rc7 into driver-core-next as they are needed here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-23	Merge branch 'for-6.0-fixes' into for-6.1	Tejun Heo
	for-6.0 has the following fix for cgroup_get_from_id(). 836ac87d ("cgroup: fix cgroup_get_from_id") which conflicts with the following two commits in for-6.1. 4534dee9 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id") fa7e439c ("cgroup: Homogenize cgroup_get_from_id() return value") While the resolution is straightforward, the code ends up pretty ugly afterwards. Let's pull for-6.0-fixes into for-6.1 so that the code can be fixed up there. Signed-off-by: Tejun Heo <tj@kernel.org>
2022-09-23	ASoC: ssm2518: drop support for platform data	Dmitry Torokhov
	There are currently no users of this driver's platform data in the mainline kernel, so let's drop it. Newer devices should use DT, ACPI, or static software properties to describe the hardware. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://lore.kernel.org/r/20220920025804.1788667-1-dmitry.torokhov@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-23	Merge tag 'driver-core-6.0-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two tiny driver core fixes for 6.0-rc7 that resolve some oft-reported problems. The first is a revert of the "fw_devlink.strict=1" default option that we keep trying to enable, but we keep finding platforms that this just breaks everything on. So again, we need it reverted and hopefully it can be worked on in future releases. The second is a sysfs file-size bugfix that resolves an issue that many people are starting to hit as the fix it is fixing also was backported to stable kernels. The util-linux developers are starting to get bugreports about sysfs files that contain no data because of this problem, and this fix which has been in linux-next in the bitfield tree for a long time, resolves it. I'm submitting it here as it needs to be merged for 6.0-final, not for 6.1-rc1. Both of these have been in linux-next with no reported issues, only reports were that these fixed problems" * tag 'driver-core-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES Revert "driver core: Set fw_devlink.strict=1 by default"
2022-09-23	Merge tag 'qcom-drivers-for-6.1' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers Qualcomm driver updates for 6.1 The icc-bwmon driver is expected to support measuring LLCC/DDR bandwidth on SDM845 and SC7280. The LLCC driver is extended to provide per-platform register mappings to the LLCC EDAC driver. The QMI encoder/decoder is updated to allow the passed qmi_elem_info to be const. Support for SDM845 is added to the sleep stats driver. Power-domains for the SM6375 platform is added to RPMPD and the platform is added to socinfo, together with the PM6125 pmic id. A couple of of_node reference issues are corrected in the smem state and smsm drivers. The Qualcomm SCM driver binding is converted to YAML. * tag 'qcom-drivers-for-6.1' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (29 commits) soc: qcom: rpmpd: Add SM6375 support dt-bindings: power: rpmpd: Add SM6375 power domains firmware: qcom: scm: remove unused __qcom_scm_init declaration dt-bindings: power: qcom,rpmpd: drop non-working codeaurora.org emails soc: qcom: icc-bwmon: force clear counter/irq registers soc: qcom: icc-bwmon: add support for sc7280 LLCC BWMON dt-bindings: interconnect: qcom,msm8998-bwmon: Add support for sc7280 BWMONs soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver soc: qcom: llcc: Rename reg_offset structs to reflect LLCC version soc: qcom: qmi: use const for struct qmi_elem_info soc: qcom: icc-bwmon: remove redundant ret variable dt-bindings: soc: qcom: stats: Document SDM845 compatible soc: qcom: stats: Add SDM845 stats config and compatible dt-bindings: firmware: document Qualcomm SM6115 SCM soc: qcom: Make QCOM_RPMPD depend on OF dt-bindings: firmware: convert Qualcomm SCM binding to the yaml soc: qcom: socinfo: Add PM6125 ID soc: qcom: socinfo: Add an ID for SM6375 soc: qcom: smem_state: Add refcounting for the 'state->of_node' soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe() ... Link: https://lore.kernel.org/r/20220921155753.1316308-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-23	Merge tag 'v6.0-next-soc' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/drivers pmic-wrapper: - add support for mt8188 SVS: - several driver cleanups power-domain: - several cleanups of the dt-bindings and driver mutex: - add support to mt6795 disp mutex - add support for mt8186 mdp3 mutex * tag 'v6.0-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: soc: mediatek: Add mmsys func to adapt to dpi output for MT8186 soc: mediatek: mutex: Add support for MT6795 Helio X10 display mutex dt-bindings: soc: mediatek: Add display mutex support for MT6795 soc: mediatek: mutex: Add mt8186 mutex mod settings for mdp3 dt-bindings: soc: mediatek: Add mdp3 mutex support for mt8186 soc: mediatek: pm-domains: Simplify some error message soc: mediatek: mtk-svs: Explicitly include bitfield header soc: mediatek: mtk-svs: Use bitfield access macros where possible soc: mediatek: mtk-svs: Commonize t-calibration-data fuse array read dt-bindings: power: mediatek: Update maintainer list dt-bindings: power: mediatek: Support naming power controller node with unit address dt-bindings: power: mediatek: Refine multiple level power domain nodes soc: mediatek: mtk-svs: Use devm variant for dev_pm_opp_of_add_table() soc: mediatek: mtk-svs: Drop of_match_ptr() for of_match_table soc: mediatek: mtk-svs: Remove hardcoded irqflags soc: mediatek: mtk-svs: Switch to platform_get_irq() dt-bindings: soc: mediatek: pwrap: add compatible for mt8188 soc: mediatek: Let PMIC Wrapper and SCPSYS depend on OF Link: https://lore.kernel.org/r/498fe3e5-a237-121a-d500-fbb0994906cb@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-23	Merge tag 'sunxi-drivers-for-6.1-1' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/drivers sunxi SRAM driver changes: - minor code refactor - support for Allwinner D1 * tag 'sunxi-drivers-for-6.1-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: soc: sunxi: sram: Add support for the D1 system control soc: sunxi: sram: Export the LDO control register soc: sunxi: sram: Save a pointer to the OF match data soc: sunxi: sram: Return void from the release function soc: sunxi: sram: Fix debugfs info for A64 SRAM C soc: sunxi: sram: Fix probe function ordering issues soc: sunxi: sram: Prevent the driver from being unbound soc: sunxi: sram: Actually claim SRAM regions Link: https://lore.kernel.org/r/YyeOthH4y8wy8A8R@kista.localdomain Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-23	x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes	James Morse
	resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-22-james.morse@arm.com
2022-09-23	x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data	James Morse
	resctrl_rmid_realloc_threshold can be set by user-space. The maximum value is specified by the architecture. Currently max_threshold_occ_write() reads the maximum value from boot_cpu_data.x86_cache_size, which is not portable to another architecture. Add resctrl_rmid_realloc_limit to describe the maximum size in bytes that user-space can set the threshold to. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-21-james.morse@arm.com
2022-09-23	x86/resctrl: Rename and change the units of resctrl_cqm_threshold	James Morse
	resctrl_cqm_threshold is stored in a hardware specific chunk size, but exposed to user-space as bytes. This means the filesystem parts of resctrl need to know how the hardware counts, to convert the user provided byte value to chunks. The interface between the architecture's resctrl code and the filesystem ought to treat everything as bytes. Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read() still returns its value in chunks, so this needs converting to bytes. As all the users have been touched, rename the variable to resctrl_rmid_realloc_threshold, which describes what the value is for. Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power of 2, so the existing code introduces a rounding error from resctrl's theoretical fraction of the cache usage. This behaviour is kept as it ensures the user visible value matches the value read from hardware when the rmid will be reallocated. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-20-james.morse@arm.com
2022-09-23	x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()	James Morse
	resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a hardware register. Currently the function returns the MBM values in chunks directly from hardware. To convert this to bytes, some correction and overflow calculations are needed. These depend on the resource and domain structures. Overflow detection requires the old chunks value. None of this is available to resctrl_arch_rmid_read(). MPAM requires the resource and domain structures to find the MMIO device that holds the registers. Pass the resource and domain to resctrl_arch_rmid_read(). This makes rmid_dirty() too big. Instead merge it with its only caller, and the name is kept as a local variable. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-17-james.morse@arm.com
2022-09-23	x86/resctrl: Abstract __rmid_read()	James Morse
	__rmid_read() selects the specified eventid and returns the counter value from the MSR. The error handling is architecture specific, and handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). Error handling should be handled by architecture specific code, as a different architecture may have different requirements. MPAM's counters can report that they are 'not ready', requiring a second read after a short delay. This should be hidden from resctrl. Make __rmid_read() the architecture specific function for reading a counter. Rename it resctrl_arch_rmid_read() and move the error handling into it. A read from a counter that hardware supports but resctrl does not now returns -EINVAL instead of -EIO from the default case in __mon_event_count(). It isn't possible for user-space to see this change as resctrl doesn't expose counters it doesn't support. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-16-james.morse@arm.com
2022-09-23	net: phylink: Adjust advertisement based on rate matching	Sean Anderson
	This adds support for adjusting the advertisement for pause-based rate matching. This may result in a lossy link, since the final link settings are not adjusted. Asymmetric pause support is necessary. It would be possible for a MAC supporting only symmetric pause to use pause-based rate adaptation, but only if pause reception was enabled as well. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23	net: phylink: Adjust link settings based on rate matching	Sean Anderson
	If the phy is configured to use pause-based rate matching, ensure that the link is full duplex with pause frame reception enabled. As suggested, if pause-based rate matching is enabled by the phy, then pause reception is unconditionally enabled. The interface duplex is determined based on the rate matching type. When rate matching is enabled, so is the speed. We assume the maximum interface speed is used. This is only relevant for MLO_AN_PHY. For MLO_AN_INBAND, the MAC/PCS's view of the interface speed will be used. Although there are no RATE_ADAPT_CRS phys in-tree, it has been added for comparison (and the implementation is quite simple). Co-developed-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23	net: phy: Add support for rate matching	Sean Anderson
	This adds support for rate matching (also known as rate adaptation) to the phy subsystem. The general idea is that the phy interface runs at one speed, and the MAC throttles the rate at which it sends packets to the link speed. There's a good overview of several techniques for achieving this at [1]. This patch adds support for three: pause-frame based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such as in 10GBASE-W). This patch makes a few assumptions and a few non assumptions about the types of rate matching available. First, it assumes that different phys may use different forms of rate matching. Second, it assumes that phys can use rate matching for any of their supported link speeds (e.g. if a phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T). Third, it does not assume that all interface modes will use the same form of rate matching. Fourth, it does not assume that all phy devices will support rate matching (even if some do). Relaxing or strengthening these (non-)assumptions could result in a different API. For example, if all interface modes were assumed to use the same form of rate matching, then a bitmask of interface modes supportting rate matching would suffice. For some better visibility into the process, the current rate matching mode is exposed as part of the ethtool ksettings. For the moment, only read access is supported. I'm not sure what userspace might want to configure yet (disable it altogether, disable just one mode, specify the mode to use, etc.). For the moment, since only pause-based rate adaptation support is added in the next few commits, rate matching can be disabled altogether by adjusting the advertisement. 802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and "rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls this feature "rate adaptation". I chose "rate matching" because it is shorter, and because Russell doesn't think "adaptation" is correct in this context. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23	net: phylink: Generate caps and convert to linkmodes separately	Sean Anderson
	If we call phylink_caps_to_linkmodes directly from phylink_get_linkmodes, it is difficult to re-use this functionality in MAC drivers. This is because MAC drivers must then work with an ethtool linkmode bitmap, instead of with mac capabilities. Instead, let the caller of phylink_get_linkmodes do the conversion. To reflect this change, rename the function to phylink_get_capabilities. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23	net: phylink: Export phylink_caps_to_linkmodes	Sean Anderson
	This function is convenient for MAC drivers. They can use it to add or remove particular link modes based on capabilities (such as if half duplex is not supported for a particular interface mode). Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23	net: phylink: Document MAC_(A)SYM_PAUSE	Sean Anderson
	This documents the possible MLO_PAUSE_* settings which can result from different combinations of MAC_(A)SYM_PAUSE. Special note is paid to settings which can result from user configuration (MLO_PAUSE_AN). The autonegotiation results are more-or-less a direct consequence of IEEE 802.3 Table 28B-2. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23	x86/resctrl: Allow per-rmid arch private storage to be reset	James Morse
	To abstract the rmid counters into a helper that returns the number of bytes counted, architecture specific per-rmid state is needed. It needs to be possible to reset this hidden state, as the values may outlive the life of an rmid, or the mount time of the filesystem. mon_event_read() is called with first = true when an rmid is first allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from __mon_event_count()'s rr->first check. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-15-james.morse@arm.com
2022-09-23	mm/slub: enable debugging memory wasting of kmalloc	Feng Tang
	kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL\|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-22	net/mlx5e: Support MACsec offload extended packet number (EPN)	Emeel Hakim
	MACsec EPN splits the packet number (PN) into two 32-bits fields, epn_lsb (32 least significant bits (LSBs) of PN) and epn_msb (32 most significant bits (MSBs) of PN). Epn_msb bits are managed by SW and for that HW is required to send an object change event of type EPN event notifying the SW to update the epn_msb in addition, once epn_msb is updated SW update HW with the new epn_msb value for HW to perform replay protection. To prevent HW from stopping while handling the event, SW manages another bit for HW called epn_overlap, HW uses the latter to get an indication regarding how to read the epn_msb value correctly while still receiving packets. Add epn event handling that updates the epn_overlap and epn_msb for every 2^31 packets according to the following logic: if epn_lsb crosses 2^31 (half sequence number wraparound) upon HW relevant event, SW updates the esn_overlap value to OLD (value = 1). When the epn_lsb crosses 2^32 (full sequence number wraparound) upon HW relevant event, SW updates the esn_overlap to NEW (value = 0) and increment the esn_msb. When using MACsec EPN a salt and short secure channel id (ssci) needs to be provided by the user, when offloading EPN need to pass this salt and ssci to the HW to be used in the initial vector (IV) calculations. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22	net/mlx5: Add ifc bits for MACsec extended packet number (EPN) and replay ↵	Emeel Hakim
	protection Add ifc bits related to advanced steering operations (ASO) and general object modify for macsec to use as part of offloading EPN and replay protection features. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22	net/mlx5: Fix fields name prefix in MACsec	Emeel Hakim
	Fix ifc fields name to be consistent with the device spec document. Fixes: 8385c51ff5bc ("net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	drivers/net/ethernet/freescale/fec.h 7b15515fc1ca ("Revert "fec: Restart PPS after link state change"") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/ drivers/pinctrl/pinctrl-ocelot.c c297561bc98a ("pinctrl: ocelot: Fix interrupt controller") 181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/ tools/testing/selftests/drivers/net/bonding/Makefile bbb774d921e2 ("net: Add tests for bonding and team address list management") 152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/ drivers/net/can/usb/gs_usb.c 5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22	Merge tag 'soc-fixes-6.0-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Another set of fixes for fixes for the soc tree: - A fix for the interrupt number on at91/lan966 ethernet PHYs - A second round of fixes for NXP i.MX series, including a couple of build issues, and board specific DT corrections on TQMa8MPQL, imx8mp-venice-gw74xx and imx8mm-verdin for reliability and partially broken functionality - Several fixes for Rockchip SoCs, addressing a USB issue on BPI-R2-Pro, wakeup on Gru-Bob and reliability of high-speed SD cards, among other minor issues - A fix for a long-running naming mistake that prevented the moxart mmc driver from working at all - Multiple Arm SCMI firmware fixes for hardening some corner cases" * tag 'soc-fixes-6.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits) arm64: dts: imx8mp-venice-gw74xx: fix port/phy validation ARM: dts: lan966x: Fix the interrupt number for internal PHYs arm64: dts: imx8mp-venice-gw74xx: fix ksz9477 cpu port arm64: dts: imx8mp-venice-gw74xx: fix CAN STBY polarity dt-bindings: memory-controllers: fsl,imx8m-ddrc: drop Leonard Crestez arm64: dts: tqma8mqml: Include phy-imx8-pcie.h header arm64: defconfig: enable ARCH_NXP arm64: dts: imx8mp-tqma8mpql-mba8mpxl: add missing pinctrl for RTC alarm ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer arm64: dts: imx8mm-verdin: extend pmic voltages arm64: dts: rockchip: Remove 'enable-active-low' from rk3566-quartz64-a arm64: dts: rockchip: Remove 'enable-active-low' from rk3399-puma arm64: dts: rockchip: fix property for usb2 phy supply on rk3568-evb1-v10 arm64: dts: rockchip: fix property for usb2 phy supply on rock-3a arm64: dts: imx8ulp: add #reset-cells for pcc arm64: dts: tqma8mpxl-ba8mpxl: Fix button GPIOs arm64: dts: imx8mn: remove GPU power domain reset arm64: dts: rockchip: Set RK3399-Gru PCLK_EDP to 24 MHz arm64: dts: imx8mm: Reverse CPLD_Dn GPIO label mapping on MX8Menlo arm64: dts: rockchip: fix upper usb port on BPI-R2-Pro ...
2022-09-22	x86/resctrl: Allow update_mba_bw() to update controls directly	James Morse
	update_mba_bw() calculates a new control value for the MBA resource based on the user provided mbps_val and the current measured bandwidth. Some control values need remapping by delay_bw_map(). It does this by calling wrmsrl() directly. This needs splitting up to be done by an architecture specific helper, so that the remainder can eventually be moved to /fs/. Add resctrl_arch_update_one() to apply one configuration value to the provided resource and domain. This avoids the staging and cross-calling that is only needed with changes made by user-space. delay_bw_map() moves to be part of the arch code, to maintain the 'percentage control' view of MBA resources in resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-12-james.morse@arm.com
2022-09-22	x86/resctrl: Create mba_sc configuration in the rdt_domain	James Morse
	To support resctrl's MBA software controller, the architecture must provide a second configuration array to hold the mbps_val[] from user-space. This complicates the interface between the architecture specific code and the filesystem portions of resctrl that will move to /fs/, to allow multiple architectures to support resctrl. Make the filesystem parts of resctrl create an array for the mba_sc values. The software controller can be changed to use this, allowing the architecture code to only consider the values configured in hardware. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-9-james.morse@arm.com
2022-09-22	serial: 8250: Toggle IER bits on only after irq has been set up	Ilpo Järvinen
	Invoking TIOCVHANGUP on 8250_mid port on Ice Lake-D and then reopening the port triggers these faults during serial8250_do_startup(): DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0 [fault reason 0x05] PTE Write access is not set If the IRQ hasn't been set up yet, the UART will have zeroes in its MSI address/data registers. Disabling the IRQ at the interrupt controller won't stop the UART from performing a DMA write to the address programmed in its MSI address register (zero) when it wants to signal an interrupt. The UARTs (in Ice Lake-D) implement PCI 2.1 style MSI without masking capability, so there is no way to mask the interrupt at the source PCI function level, except disabling the MSI capability entirely, but that would cause it to fall back to INTx# assertion, and the PCI specification prohibits disabling the MSI capability as a way to mask a function's interrupt service request. The MSI address register is zeroed by the hangup as the irq is freed. The interrupt is signalled during serial8250_do_startup() performing a THRE test that temporarily toggles THRI in IER. The THRE test currently occurs before UART's irq (and MSI address) is properly set up. Refactor serial8250_do_startup() such that irq is set up before the THRE test. The current irq setup code is intermixed with the timer setup code. As THRE test must be performed prior to the timer setup, extract it into own function and call it only after the THRE test. The ->setup_timer() needs to be part of the struct uart_8250_ops in order to not create circular dependency between 8250 and 8250_base modules. Fixes: 40b36daad0ac ("[PATCH] 8250 UART backup timer") Reported-by: Lennert Buytenhek <buytenh@arista.com> Tested-by: Lennert Buytenhek <buytenh@arista.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220922070005.2965-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-22	serial: 8250: Switch UART port flags to using BIT_ULL	Maciej W. Rozycki
	Use BIT_ULL rather than encoding bits explicitly where applicable with UART port flags. This makes a (__force upf_t) cast redundant, but keep it for visual consistency with the flags defined in terms of userspace macros. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209210007030.41633@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-22	serial: 8250: Let drivers request full 16550A feature probing	Maciej W. Rozycki
	A SERIAL_8250_16550A_VARIANTS configuration option has been recently defined that lets one request the 8250 driver not to probe for 16550A device features so as to reduce the driver's device startup time in virtual machines. Some actual hardware devices require these features to have been fully determined however for their driver to work correctly, so define a flag to let drivers request full 16550A feature probing on a device-by-device basis if required regardless of the SERIAL_8250_16550A_VARIANTS option setting chosen. Fixes: dc56ecb81a0a ("serial: 8250: Support disabling mdelay-filled probes of 16550A variants") Cc: stable@vger.kernel.org # v5.6+ Reported-by: Anders Blomdell <anders.blomdell@control.lth.se> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209202357520.41633@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-22	tty: remove TTY_DRIVER_MAGIC	наб
	According to Greg, in the context of magic numbers as defined in magic-number.rst, "the tty layer should not need this and I'll gladly take patches" Acked-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Ref: https://lore.kernel.org/linux-doc/YyMlovoskUcHLEb7@kroah.com/ Link: https://lore.kernel.org/r/723478a270a3858f27843cbec621df4d5d44efcc.1663288066.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-22	tty: remove TTY_MAGIC	наб
	According to Greg, in the context of magic numbers as defined in magic-number.rst, "the tty layer should not need this and I'll gladly take patches" Acked-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Ref: https://lore.kernel.org/linux-doc/YyMlovoskUcHLEb7@kroah.com/ Link: https://lore.kernel.org/r/476d024cd6b04160a5de381ea2b9856b60088cbd.1663288066.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-22	x86/resctrl: Add domain offline callback for resctrl work	James Morse
	Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to free the memory. Move the monitor subdir removal and the cancelling of the mbm/limbo works into a new resctrl_offline_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-6-james.morse@arm.com
2022-09-22	x86/resctrl: Add domain online callback for resctrl work	James Morse
	Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to allocate the memory. Move domain_setup_mon_state(), the monitor subdir creation call and the mbm/limbo workers into a new resctrl_online_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-4-james.morse@arm.com
2022-09-22	net: ethernet: mtk_eth_wed: add axi bus support	Lorenzo Bianconi
	Other than pcie bus, introduce support for axi bus to mtk wed driver. Axi bus is used to connect mt7986-wmac soc chip available on mt7986 device. Tested-by: Daniel Golle <daniel@makrotopia.org> Co-developed-by: Bo Jiao <Bo.Jiao@mediatek.com> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com> Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22	net: ethernet: mtk_eth_wed: add wed support for mt7986 chipset	Lorenzo Bianconi
	Introduce Wireless Etherne Dispatcher support on transmission side for mt7986 chipset Tested-by: Daniel Golle <daniel@makrotopia.org> Co-developed-by: Bo Jiao <Bo.Jiao@mediatek.com> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com> Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22	x86/resctrl: Merge mon_capable and mon_enabled	James Morse
	mon_enabled and mon_capable are always set as a pair by rdt_get_mon_l3_config(). There is no point having two values. Merge them together. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-3-james.morse@arm.com
2022-09-22	x86/resctrl: Kill off alloc_enabled	James Morse
	rdt_resources_all[] used to have extra entries for L2CODE/L2DATA. These were hidden from resctrl by the alloc_enabled value. Now that the L2/L2CODE/L2DATA resources have been merged together, alloc_enabled doesn't mean anything, it always has the same value as alloc_capable which indicates allocation is supported by this resource. Remove alloc_enabled and its helpers. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-2-james.morse@arm.com
2022-09-22	counter: Realign counter_comp comment block to 80 characters	William Breathitt Gray
	The member documentation comment lines for struct counter_comp extend past the 80-characters column boundary due to extra identation at the start of each section. This patch realigns the comment block within the 80-characters boundary by removing these superfluous indents. Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20220902120839.4260-1-william.gray@linaro.org/ Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Link: https://lore.kernel.org/r/8294b04153c33602e9c3dd21ac90c1e99bd0fdaf.1663844776.git.william.gray@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-22	drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES	Phil Auld
	As PAGE_SIZE is unsigned long, -1 > PAGE_SIZE when NR_CPUS <= 3. This leads to very large file sizes: topology$ ls -l total 0 -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 core_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 core_cpus_list -r--r--r-- 1 root root 4096 Sep 5 10:58 core_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 core_siblings -r--r--r-- 1 root root 4096 Sep 5 11:59 core_siblings_list -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 die_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 die_cpus_list -r--r--r-- 1 root root 4096 Sep 5 11:59 die_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 package_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 package_cpus_list -r--r--r-- 1 root root 4096 Sep 5 10:58 physical_package_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 thread_siblings -r--r--r-- 1 root root 4096 Sep 5 11:59 thread_siblings_list Adjust the inequality to catch the case when NR_CPUS is configured to a small value. Fixes: 7ee951acd31a ("drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist") Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Yury Norov <yury.norov@gmail.com> Cc: stable@vger.kernel.org Cc: feng xiangjun <fengxj325@gmail.com> Reported-by: feng xiangjun <fengxj325@gmail.com> Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20220906203542.1796629-1-pauld@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-21	fscrypt: work on block_devices instead of request_queues	Christoph Hellwig
	request_queues are a block layer implementation detail that should not leak into file systems. Change the fscrypt inline crypto code to retrieve block devices instead of request_queues from the file system. As part of that, clean up the interaction with multi-device file systems by returning both the number of devices and the actual device array in a single method call. Signed-off-by: Christoph Hellwig <hch@lst.de> [ebiggers: bug fixes and minor tweaks] Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-4-ebiggers@kernel.org
2022-09-21	fscrypt: stop using keyrings subsystem for fscrypt_master_key	Eric Biggers
	The approach of fs/crypto/ internally managing the fscrypt_master_key structs as the payloads of "struct key" objects contained in a "struct key" keyring has outlived its usefulness. The original idea was to simplify the code by reusing code from the keyrings subsystem. However, several issues have arisen that can't easily be resolved: - When a master key struct is destroyed, blk_crypto_evict_key() must be called on any per-mode keys embedded in it. (This started being the case when inline encryption support was added.) Yet, the keyrings subsystem can arbitrarily delay the destruction of keys, even past the time the filesystem was unmounted. Therefore, currently there is no easy way to call blk_crypto_evict_key() when a master key is destroyed. Currently, this is worked around by holding an extra reference to the filesystem's request_queue(s). But it was overlooked that the request_queue reference is not guaranteed to pin the corresponding blk_crypto_profile too; for device-mapper devices that support inline crypto, it doesn't. This can cause a use-after-free. - When the last inode that was using an incompletely-removed master key is evicted, the master key removal is completed by removing the key struct from the keyring. Currently this is done via key_invalidate(). Yet, key_invalidate() takes the key semaphore. This can deadlock when called from the shrinker, since in fscrypt_ioctl_add_key(), memory is allocated with GFP_KERNEL under the same semaphore. - More generally, the fact that the keyrings subsystem can arbitrarily delay the destruction of keys (via garbage collection delay, or via random processes getting temporary key references) is undesirable, as it means we can't strictly guarantee that all secrets are ever wiped. - Doing the master key lookups via the keyrings subsystem results in the key_permission LSM hook being called. fscrypt doesn't want this, as all access control for encrypted files is designed to happen via the files themselves, like any other files. The workaround which SELinux users are using is to change their SELinux policy to grant key search access to all domains. This works, but it is an odd extra step that shouldn't really have to be done. The fix for all these issues is to change the implementation to what I should have done originally: don't use the keyrings subsystem to keep track of the filesystem's fscrypt_master_key structs. Instead, just store them in a regular kernel data structure, and rework the reference counting, locking, and lifetime accordingly. Retain support for RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free() with fscrypt_sb_delete(), which releases the keys synchronously and runs a bit earlier during unmount, so that block devices are still available. A side effect of this patch is that neither the master keys themselves nor the filesystem keyrings will be listed in /proc/keys anymore. ("Master key users" and the master key users keyrings will still be listed.) However, this was mostly an implementation detail, and it was intended just for debugging purposes. I don't know of anyone using it. This patch does not change how "master key users" (->mk_users) works; that still uses the keyrings subsystem. That is still needed for key quotas, and changing that isn't necessary to solve the issues listed above. If we decide to change that too, it would be a separate patch. I've marked this as fixing the original commit that added the fscrypt keyring, but as noted above the most important issue that this patch fixes wasn't introduced until the addition of inline encryption support. Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org
2022-09-21	bpf: Prevent bpf program recursion for raw tracepoint probes	Jiri Olsa
	We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: Stanislav Fomichev <sdf@google.com> Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21	bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs	Roberto Sassu
	Add the bpf_lookup_user_key(), bpf_lookup_system_key() and bpf_key_put() kfuncs, to respectively search a key with a given key handle serial number and flags, obtain a key from a pre-determined ID defined in include/linux/verification.h, and cleanup. Introduce system_keyring_id_check() to validate the keyring ID parameter of bpf_lookup_system_key(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220920075951.929132-8-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21	KEYS: Move KEY_LOOKUP_ to include/linux/key.h and define KEY_LOOKUP_ALL	Roberto Sassu
	In preparation for the patch that introduces the bpf_lookup_user_key() eBPF kfunc, move KEY_LOOKUP_ definitions to include/linux/key.h, to be able to validate the kfunc parameters. Add them to enum key_lookup_flag, so that all the current ones and the ones defined in the future are automatically exported through BTF and available to eBPF programs. Also, add KEY_LOOKUP_ALL to the enum, with the logical OR of currently defined flags as value, to facilitate checking whether a variable contains only those flags. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20220920075951.929132-7-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21	bpf: Export bpf_dynptr_get_size()	Roberto Sassu
	Export bpf_dynptr_get_size(), so that kernel code dealing with eBPF dynamic pointers can obtain the real size of data carried by this data structure. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220920075951.929132-6-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>