summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2023-02-04Merge tag 'rtc-6.2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC fixes from Alexandre Belloni: "Here are a few fixes for 6.2. The EFI one is the most important as it allows some RTCs to actually work. The other two are warnings that are worth fixing. - efi: make WAKEUP services optional - sunplus: fix format string warning" * tag 'rtc-6.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: sunplus: fix format string for printing resource dt-bindings: rtc: qcom-pm8xxx: allow 'wakeup-source' property rtc: efi: Enable SET/GET WAKEUP services as optional
2023-02-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Yet another fix for non-CPU accesses to the memory backing the VGICv3 subsystem - A set of fixes for the setlftest checking for the S1PTW behaviour after the fix that went in ealier in the cycle" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: aarch64: Test read-only PT memory regions KVM: selftests: aarch64: Fix check of dirty log PT write KVM: selftests: aarch64: Do not default to dirty PTE pages on all S1PTWs KVM: selftests: aarch64: Relax userfaultfd read vs. write checks KVM: arm64: Allow no running vcpu on saving vgic3 pending table KVM: arm64: Allow no running vcpu on restoring vgic3 LPI pending status KVM: arm64: Add helper vgic_write_guest_lock()
2023-02-04Merge branch '20230201041853.1934355-1-quic_bjorande@quicinc.com' into ↵Bjorn Andersson
drivers-for-6.3
2023-02-04soc: qcom: pmic_glink: Introduce base PMIC GLINK driverBjorn Andersson
The PMIC GLINK service runs on one of the co-processors of some modern Qualcomm platforms and implements USB-C and battery managements. It uses a message based protocol over GLINK for communication with the OS, hence the name. The driver implemented provides the rpmsg device for communication and uses auxiliary bus to spawn off individual devices in respective subsystem. The auxiliary devices are spawned off from a platform_device, so that the drm_bridge is available early, to allow the DisplayPort driver to probe even before the remoteproc has spun up. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Tested-by: Konrad Dybcio <konrad.dybcio@linaro.org> # SM8350 PDX215 Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-MTP & SM8450-HDK Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230201041853.1934355-3-quic_bjorande@quicinc.com
2023-02-04Merge tag 'kvmarm-fixes-6.2-3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.2, take #3 - Yet another fix for non-CPU accesses to the memory backing the VGICv3 subsystem - A set of fixes for the setlftest checking for the S1PTW behaviour after the fix that went in ealier in the cycle
2023-02-04mfd: intel_soc_pmic_chtwc: Add Lenovo Yoga Tab 3 X90F to intel_cht_wc_modelsHans de Goede
The drivers for various CHT Whiskey Cove PMIC child-devices need to know the model, since they have model specific behavior. The DMI match table for this is shared between the child-device-drivers inside the MFD driver. Add the Lenovo Yoga Tab 3 X90F, which is a previously unknown tablet model with a CHT Whiskey Cove PMIC, to the intel_cht_wc_models enum and to the DMI match table. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20230126153823.22146-2-hdegoede@redhat.com
2023-02-04net/mlx5: Add firmware support for MTUTC scaled_ppm frequency adjustmentsRahul Rameshbabu
When device is capable of handling scaled ppm values for adjusting frequency, conversion to ppb will not be done by the driver. Instead, the scaled ppm value will be passed directly to the device for the frequency adjustment operation. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04ALSA: fireface: add field for the number of messages copied to user spaceTakashi Sakamoto
Current structure includes no field to express the number of messages copied to user space, thus user space application needs to information out of the structure to parse the content of structure. This commit adds a field to express the number of messages copied to user space since It is more preferable to use self-contained structure. Kees Cook proposed an idea of annotation for bound of flexible arrays in his future improvement for flexible-length array in kernel. The additional field for message count is suitable to the idea as well. Reference: https://people.kernel.org/kees/bounded-flexible-arrays-in-c Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20230202133708.163936-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-04efi: Discover BTI support in runtime services regionsArd Biesheuvel
Add the generic plumbing to detect whether or not the runtime code regions were constructed with BTI/IBT landing pads by the firmware, permitting the OS to enable enforcement when mapping these regions into the OS's address space. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org>
2023-02-03bpf: fix typo in header for bpf_perf_prog_read_valueFlorian Lehner
Fix a simple typo in the documentation for bpf_perf_prog_read_value. Signed-off-by: Florian Lehner <dev@der-flo.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230203121439.25884-1-dev@der-flo.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-03raw: use net_hash_mix() in hash functionEric Dumazet
Some applications seem to rely on RAW sockets. If they use private netns, we can avoid piling all RAW sockets bound to a given protocol into a single bucket. Also place (struct raw_hashinfo).lock into its own cache line to limit false sharing. Alternative would be to have per-netns hashtables, but this seems too expensive for most netns where RAW sockets are not used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-03mm: extend max struct page size for kmsanArnd Bergmann
After x86 enabled support for KMSAN, it has become possible to have larger 'struct page' than was expected when commit 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures") was merged: include/linux/mm.h:156:10: warning: no case matching constant switch condition '96' switch (sizeof(struct page)) { Extend the maximum accordingly. Link: https://lkml.kernel.org/r/20230130130739.563628-1-arnd@kernel.org Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures") Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86") Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03efi/cper, cxl: Remove cxl_err.hDan Williams
While going to create include/linux/cxl.h for some cross-subsystem CXL definitions I noticed that include/linux/cxl_err.h was already present. That header has no reason to be global, and it duplicates the RAS Capability Structure definitions in drivers/cxl/cxl.h. A follow-on patch can consider unifying the CXL native error tracing with the CPER error printing. Also fixed up the spec reference as the latest released spec is v3.0. Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-03cpufreq: amd-pstate: implement amd pstate cpu online and offline callbackPerry Yuan
Adds online and offline driver callback support to allow cpu cores go offline and help to restore the previous working states when core goes back online later for EPP driver mode. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <Mario.Limonciello@amd.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Tested-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-03cpufreq: amd-pstate: implement Pstate EPP support for the AMD processorsPerry Yuan
Add EPP driver support for AMD SoCs which support a dedicated MSR for CPPC. EPP is used by the DPM controller to configure the frequency that a core operates at during short periods of activity. The SoC EPP targets are configured on a scale from 0 to 255 where 0 represents maximum performance and 255 represents maximum efficiency. The amd-pstate driver exports profile string names to userspace that are tied to specific EPP values. The balance_performance string (0x80) provides the best balance for efficiency versus power on most systems, but users can choose other strings to meet their needs as well. $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences default performance balance_performance balance_power power $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference balance_performance To enable the driver,it needs to add `amd_pstate=active` to kernel command line and kernel will load the active mode epp driver Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <Mario.Limonciello@amd.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Tested-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-03cpufreq: amd-pstate: optimize driver working mode selection in ↵Wyes Karny
amd_pstate_param() The amd-pstate driver may support multiple working modes. Introduce a variable to keep track of which mode is currently enabled. Here we use cppc_state var to indicate which mode is enabled. This change will help to simplify the the amd_pstate_param() to choose which mode used for the following driver registration. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-03ACPI: CPPC: Add AMD pstate energy performance preference cppc controlPerry Yuan
Add support for setting and querying EPP preferences to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values. Downstream drivers that want to use the new symbols cppc_get_epp_caps and cppc_set_epp_perf for querying and setting EPP preferences will need to call cppc_set_epp_perf to enable the EPP function firstly. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Tested-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-03Merge branch 'vfio-no-iommu' into iommufd.git for-nextJason Gunthorpe
Shared branch with VFIO for the no-iommu support. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-03vfio: Support VFIO_NOIOMMU with iommufdJason Gunthorpe
Add a small amount of emulation to vfio_compat to accept the SET_IOMMU to VFIO_NOIOMMU_IOMMU and have vfio just ignore iommufd if it is working on a no-iommu enabled device. Move the enable_unsafe_noiommu_mode module out of container.c into vfio_main.c so that it is always available even if VFIO_CONTAINER=n. This passes Alex's mini-test: https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c Link: https://lore.kernel.org/r/0-v3-480cd64a16f7+1ad0-iommufd_noiommu_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-03Merge tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A safeguard to prevent the kernel client from further damaging the filesystem after running into a case of an invalid snap trace. The root cause of this metadata corruption is still being investigated but it appears to be stemming from the MDS. As such, this is the best we can do for now" * tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-client: ceph: blocklist the kclient when receiving corrupted snap trace ceph: move mount state enum to super.h
2023-02-03Merge tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "25 hotfixes, mainly for MM. 13 are cc:stable" * tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kconfig.debug: fix the help description in SCHED_DEBUG mm/swapfile: add cond_resched() in get_swap_pages() mm: use stack_depot_early_init for kmemleak Squashfs: fix handling and sanity checking of xattr_ids count sh: define RUNTIME_DISCARD_EXIT highmem: round down the address passed to kunmap_flush_on_unmap() migrate: hugetlb: check for hugetlb shared PMD in node migration mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Revert "mm: kmemleak: alloc gray object for reserved region with direct map" freevxfs: Kconfig: fix spelling maple_tree: should get pivots boundary by type .mailmap: update e-mail address for Eugen Hristev mm, mremap: fix mremap() expanding for vma's with vm_ops->close() squashfs: harden sanity check in squashfs_read_xattr_id_table ia64: fix build error due to switch case label appearing next to declaration mm: multi-gen LRU: fix crash during cgroup migration Revert "mm: add nodes= arg to memory.reclaim" zsmalloc: fix a race with deferred_handles storing ...
2023-02-03efi: Drop minimum EFI version check at bootArd Biesheuvel
We currently pass a minimum major version to the generic EFI helper that checks the system table magic and version, and refuse to boot if the value is lower. The motivation for this check is unknown, and even the code that uses major version 2 as the minimum (ARM, arm64 and RISC-V) should make it past this check without problems, and boot to a point where we have access to a console or some other means to inform the user that the firmware's major revision number made us unhappy. (Revision 2.0 of the UEFI specification was released in January 2006, whereas ARM, arm64 and RISC-V support where added in 2009, 2013 and 2017, respectively, so checking for major version 2 or higher is completely arbitrary) So just drop the check. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-02-03ALSA: hda: Fix the control element identification for multiple codecsJaroslav Kysela
Some motherboards have multiple HDA codecs connected to the serial bus. The current code may create multiple mixer controls with the almost identical identification. The current code use id.device field from the control element structure to store the codec address to avoid such clashes for multiple codecs. Unfortunately, the user space do not handle this correctly. For mixer controls, only name and index are used for the identifiers. This patch fixes this problem to compose the index using the codec address as an offset in case, when the control already exists. It is really unlikely that one codec will create 10 similar controls. This patch adds new kernel module parameter 'ctl_dev_id' to allow select the old behaviour, too. The CONFIG_SND_HDA_CTL_DEV_ID Kconfig option sets the default value. BugLink: https://github.com/alsa-project/alsa-lib/issues/294 BugLink: https://github.com/alsa-project/alsa-lib/issues/205 Fixes: 54d174031576 ("[ALSA] hda-codec - Fix connection list parsing") Fixes: 1afe206ab699 ("ALSA: hda - Try to find an empty control index when it's occupied") Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230202092013.4066998-1-perex@perex.cz Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-03block: add a bvec_set_virt helperChristoph Hellwig
A small wrapper around bvec_set_page for callers that have a virtual address. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03block: add a bvec_set_folio helperChristoph Hellwig
A smaller wrapper around bvec_set_page that takes a folio instead. There are only two potential users for this in the tree, but the number will grow in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03block: factor out a bvec_set_page helperChristoph Hellwig
Add a helper to initialize a bvec based of a page pointer. This will help removing various open code bvec initializations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-cgroup: move the cgroup information to struct gendiskChristoph Hellwig
cgroup information only makes sense on a live gendisk that allows file system I/O (which includes the raw block device). So move over the cgroup related members. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-cgroup: store a gendisk to throttle in struct task_structChristoph Hellwig
Switch from a request_queue pointer and reference to a gendisk once for the throttle information in struct task_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Link: https://lore.kernel.org/r/20230203150400.3199230-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against ↵Matt Evans
zero-extended 'old' value __generic_cmpxchg_local takes unsigned long old/new arguments which might end up being up-cast from smaller signed types (which will sign-extend). The loaded compare value must be compared against a truncated smaller type, so down-cast appropriately for each size. The issue is apparent on 64-bit machines with code, such as atomic_dec_unless_positive(), that sign-extends from int. 64-bit machines generally don't use the generic cmpxchg but development/early ports might make use of it, so make it correct. Signed-off-by: Matt Evans <mev@rivosinc.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-03Merge tag 'v6.2-next-soc' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into soc/drivers Introduce MediaTek regulator coupler driver to ensure that the SRAM voltage in par with the GPU voltage. This allows for a stable use of the GPU. mtk-mutex: - add support for MT8188 vdosys0 path - allow it to be build as module - add support for MT8195 vdosys1 path mmsys: - add MT8188 vdosys0 path - allow to be build as a module - add MT8195 vdosys1 path - add support for CMDQ - allow for up to 64 reset bits - add supprot for the MT8195 vppsys[0,1] pathes pm-domains: - keep power for the MT8186 ADSP on by default - add support for MT8188 - add support for buck isolation needed in specific pm-domains for MT8188 and MT8192 mtk-svs: - enable IRQ later to allow using kexec - several improvments on the code base - fix modalias pmic wrapper: - convert binding to yaml. As this is thightly coupled to the MT6357 PMIC, I took patches regarding it as well. * tag 'v6.2-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: (41 commits) soc: mediatek: mtk-svs: add missing MODULE_DEVICE_TABLE soc: mediatek: mtk-devapc: Switch to devm_clk_get_enabled() soc: mtk-svs: mt8183: refactor o_slope calculation soc: mediatek: mtk-svs: delete superfluous platform data entries soc: mediatek: mtk-svs: move svs_platform_probe into probe soc: mediatek: mtk-svs: improve readability of platform_probe soc: mediatek: mtk-svs: clean up platform probing soc: mediatek: mtk-svs: keep svs alive if CONFIG_DEBUG_FS not supported soc: mediatek: mtk-svs: Use pm_runtime_resume_and_get() in svs_init01() soc: mediatek: mtk-svs: reset svs when svs_resume() fail soc: mediatek: mtk-svs: restore default voltages when svs_init02() fail soc: mediatek: mmsys: add support for MT8195 VPPSYS dt-bindings: arm: mediatek: mmsys: Add support for MT8195 VPPSYS soc: mediatek: Introduce mediatek-regulator-coupler driver soc: mediatek: mtk-svs: Enable the IRQ later soc: mediatek: add mtk-mutex support for mt8195 vdosys1 soc: mediatek: add mtk-mutex component - dp_intf1 soc: mediatek: mmsys: add reset control for MT8195 vdosys1 soc: mediatek: mmsys: add mmsys for support 64 reset bits soc: mediatek: add cmdq support of mtk-mmsys config API for mt8195 vdosys1 ... Link: https://lore.kernel.org/r/396d51fc-81f3-4a2b-d7a7-b966bfe3002a@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-03ASoC: amd: update ps platform acp header fileVijendar Mukunda
Rename Audio buffer and soundwire manager instance registers. Remove scratch registers as these registers can be accessed using ACP_SCRATCH_REG_0 register relative offset. Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Link: https://lore.kernel.org/r/20230201165626.3169041-1-Vijendar.Mukunda@amd.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-02-03livepatch,x86: Clear relocation targets on a module removalSong Liu
Josh reported a bug: When the object to be patched is a module, and that module is rmmod'ed and reloaded, it fails to load with: module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8) livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd' The livepatch module has a relocation which references a symbol in the _previous_ loading of nfsd. When apply_relocate_add() tries to replace the old relocation with a new one, it sees that the previous one is nonzero and it errors out. He also proposed three different solutions. We could remove the error check in apply_relocate_add() introduced by commit eda9cec4c9a1 ("x86/module: Detect and skip invalid relocations"). However the check is useful for detecting corrupted modules. We could also deny the patched modules to be removed. If it proved to be a major drawback for users, we could still implement a different approach. The solution would also complicate the existing code a lot. We thus decided to reverse the relocation patching (clear all relocation targets on x86_64). The solution is not universal and is too much arch-specific, but it may prove to be simpler in the end. Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Originally-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Song Liu <song@kernel.org> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230125185401.279042-2-song@kernel.org
2023-02-03iommu/vt-d: Support cpumask for IOMMU perfmonKan Liang
The perf subsystem assumes that all counters are by default per-CPU. So the user space tool reads a counter from each CPU. However, the IOMMU counters are system-wide and can be read from any CPU. Here we use a CPU mask to restrict counting to one CPU to handle the issue. (with CPU hotplug notifier to choose a different CPU if the chosen one is taken off-line). The CPU is exposed to /sys/bus/event_source/devices/dmar*/cpumask for the user space perf tool. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-6-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Support size of the register set in DRHDKan Liang
A new field, which indicates the size of the remapping hardware register set for this remapping unit, is introduced in the DMA-remapping hardware unit definition (DRHD) structure with the VT-d Spec 4.0. With this information, SW doesn't need to 'guess' the size of the register set anymore. Update the struct acpi_dmar_hardware_unit to reflect the field. Store the size of the register set in struct dmar_drhd_unit for each dmar device. The 'size' information is ResvZ for the old BIOS and platforms. Fall back to the old guessing method. There is nothing changed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-2-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Remove include/linux/intel-svm.hLu Baolu
There's no need to have a public header for Intel SVA implementation. The device driver should interact with Intel SVA implementation via the IOMMU generic APIs. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230109014955.147068-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03netfilter: flowtable: cache info of last offloadVlad Buslov
Modify flow table offload to cache the last ct info status that was passed to the driver offload callbacks by extending enum nf_flow_flags with new "NF_FLOW_HW_ESTABLISHED" flag. Set the flag if ctinfo was 'established' during last act_ct meta actions fill call. This infrastructure change is necessary to optimize promoting of UDP connections from 'new' to 'established' in following patches in this series. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-03netfilter: flowtable: allow unidirectional rulesVlad Buslov
Modify flow table offload to support unidirectional connections by extending enum nf_flow_flags with new "NF_FLOW_HW_BIDIRECTIONAL" flag. Only offload reply direction when the flag is set. This infrastructure change is necessary to support offloading UDP NEW connections in original direction in following patches in series. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-03media: v4l2-core: Make the v4l2-core code enable/disable the privacy LED if ↵Hans de Goede
present Make v4l2_async_register_subdev_sensor() try to get a privacy LED associated with the sensor and extend the call_s_stream() wrapper to enable/disable the privacy LED if found. This makes the core handle privacy LED control, rather then having to duplicate this code in all the sensor drivers. Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lore.kernel.org/r/20230127203729.10205-2-hdegoede@redhat.com
2023-02-03Merge tag 'ib-leds-led_get-v6.3' into HEADHans de Goede
Immutable branch from LEDs due for the v6.3 merge window
2023-02-02kexec: introduce sysctl parameters kexec_load_limit_*Ricardo Ribalda
kexec allows replacing the current kernel with a different one. This is usually a source of concerns for sysadmins that want to harden a system. Linux already provides a way to disable loading new kexec kernel via kexec_load_disabled, but that control is very coard, it is all or nothing and does not make distinction between a panic kexec and a normal kexec. This patch introduces new sysctl parameters, with finer tuning to specify how many times a kexec kernel can be loaded. The sysadmin can set different limits for kexec panic and kexec reboot kernels. The value can be modified at runtime via sysctl, but only with a stricter value. With these new parameters on place, a system with loadpin and verity enabled, using the following kernel parameters: sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a good warranty that if initrd tries to load a panic kernel, a malitious user will have small chances to replace that kernel with a different one, even if they can trigger timeouts on the disk where the panic kernel lives. Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02kexec: factor out kexec_load_permittedRicardo Ribalda
Both syscalls (kexec and kexec_file) do the same check, let's factor it out. Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02util_macros.h: add missing inclusionAndy Shevchenko
The header is the direct user of definitions from the math.h, include it. Link: https://lkml.kernel.org/r/20230103121937.32085-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02include/linux/percpu_counter.h: race in uniprocessor percpu_counter_add()Manfred Spraul
The percpu interface is supposed to be preempt and irq safe. But: The uniprocessor implementation of percpu_counter_add() is not irq safe: if an interrupt happens during the +=, then the result is undefined. Therefore: switch from preempt_disable() to local_irq_save(). This prevents interrupts from interrupting the +=, and as a side effect prevents preemption. Link: https://lkml.kernel.org/r/20221216150441.200533-2-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: "Sun, Jiebin" <jiebin.sun@intel.com> Cc: <1vier1@web.de> Cc: Alexander Sverdlin <alexander.sverdlin@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02docs: fault-injection: add requirements of error injectable functionsMasami Hiramatsu (Google)
Add a section about the requirements of the error injectable functions and the type of errors. Since this section must be read before using ALLOW_ERROR_INJECTION() macro, that section is referred from the comment of the macro too. Link: https://lkml.kernel.org/r/167081321427.387937.15475445689482551048.stgit@devnote3 Link: https://lore.kernel.org/all/20221211115218.2e6e289bb85f8cf53c11aa97@kernel.org/T/#u Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Florent Revest <revest@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02error-injection: remove EI_ETYPE_NONEMasami Hiramatsu (Google)
Patch series "error-injection: Clarify the requirements of error injectable functions". Patches for clarifying the requirement of error injectable functions and to remove the confusing EI_ETYPE_NONE. This patch (of 2): Since the EI_ETYPE_NONE is confusing type, replace it with appropriate errno. The EI_ETYPE_NONE has been introduced for a dummy (error) value, but it can mislead people that they can use ALLOW_ERROR_INJECTION(func, NONE). So remove it from the EI_ETYPE and use appropriate errno instead. [akpm@linux-foundation.org: include/linux/error-injection.h needs errno.h] Link: https://lkml.kernel.org/r/167081319306.387937.10079195394503045678.stgit@devnote3 Link: https://lkml.kernel.org/r/167081320421.387937.4259807348852421112.stgit@devnote3 Fixes: 663faf9f7bee ("error-injection: Add injectable error types") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Chris Mason <clm@meta.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Florent Revest <revest@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02fs: convert writepage_t callback to pass a folioMatthew Wilcox (Oracle)
Patch series "Convert writepage_t to use a folio". More folioisation. I split out the mpage work from everything else because it completely dominated the patch, but some implementations I just converted outright. This patch (of 2): We always write back an entire folio, but that's currently passed as the head page. Convert all filesystems that use write_cache_pages() to expect a folio instead of a page. Link: https://lkml.kernel.org/r/20230126201255.1681189-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230126201255.1681189-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: add memcpy_from_file_folio()Matthew Wilcox (Oracle)
This is the equivalent of memcpy_from_page(). It differs in that it takes the position in a file instead of offset in a folio, it accepts the total number of bytes to be copied (instead of the number of bytes to be copied from this folio) and it returns how many bytes were copied from the folio, rather than making the caller calculate that and then checking if the caller got it right. [akpm@linux-foundation.org: fix typo in comment] Link: https://lkml.kernel.org/r/20230126201552.1681588-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02block: remove ->rw_pageChristoph Hellwig
The ->rw_page method is a special purpose bypass of the usual bio handling path that is limited to single-page reads and writes and synchronous which causes a lot of extra code in the drivers, callers and the block layer. The only remaining user is the MM swap code. Switch that swap code to simply submit a single-vec on-stack bio an synchronously wait on it based on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that currently implement ->rw_page instead. While this touches one extra cache line and executes extra code, it simplifies the block layer and drivers and ensures that all feastures are properly supported by all drivers, e.g. right now ->rw_page bypassed cgroup writeback entirely. [akpm@linux-foundation.org: fix comment typo, per Dan] Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: memory-failure: add memory failure stats to sysfsJiaqi Yan
Patch series "Introduce per NUMA node memory error statistics", v2. Background ========== In the RFC for Kernel Support of Memory Error Detection [1], one advantage of software-based scanning over hardware patrol scrubber is the ability to make statistics visible to system administrators. The statistics include 2 categories: * Memory error statistics, for example, how many memory error are encountered, how many of them are recovered by the kernel. Note these memory errors are non-fatal to kernel: during the machine check exception (MCE) handling kernel already classified MCE's severity to be unnecessary to panic (but either action required or optional). * Scanner statistics, for example how many times the scanner have fully scanned a NUMA node, how many errors are first detected by the scanner. The memory error statistics are useful to userspace and actually not specific to scanner detected memory errors, and are the focus of this patchset. Motivation ========== Memory error stats are important to userspace but insufficient in kernel today. Datacenter administrators can better monitor a machine's memory health with the visible stats. For example, while memory errors are inevitable on servers with 10+ TB memory, starting server maintenance when there are only 1~2 recovered memory errors could be overreacting; in cloud production environment maintenance usually means live migrate all the workload running on the server and this usually causes nontrivial disruption to the customer. Providing insight into the scope of memory errors on a system helps to determine the appropriate follow-up action. In addition, the kernel's existing memory error stats need to be standardized so that userspace can reliably count on their usefulness. Today kernel provides following memory error info to userspace, but they are not sufficient or have disadvantages: * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, not per NUMA node stats though * ras:memory_failure_event: only available after explicitly enabled * /dev/mcelog provides many useful info about the MCEs, but doesn't capture how memory_failure recovered memory MCEs * kernel logs: userspace needs to process log text Exposing memory error stats is also a good start for the in-kernel memory error detector. Today the data source of memory error stats are either direct memory error consumption, or hardware patrol scrubber detection (either signaled as UCNA or SRAO). Once in-kernel memory scanner is implemented, it will be the main source as it is usually configured to scan memory DIMMs constantly and faster than hardware patrol scrubber. How Implemented =============== As Naoya pointed out [2], exposing memory error statistics to userspace is useful independent of software or hardware scanner. Therefore we implement the memory error statistics independent of the in-kernel memory error detector. It exposes the following per NUMA node memory error counters: /sys/devices/system/node/node${X}/memory_failure/total /sys/devices/system/node/node${X}/memory_failure/recovered /sys/devices/system/node/node${X}/memory_failure/ignored /sys/devices/system/node/node${X}/memory_failure/failed /sys/devices/system/node/node${X}/memory_failure/delayed These counters describe how many raw pages are poisoned and after the attempted recoveries by the kernel, their resolutions: how many are recovered, ignored, failed, or delayed respectively. This approach can be easier to extend for future use cases than /proc/meminfo, trace event, and log. The following math holds for the statistics: * total = recovered + ignored + failed + delayed These memory error stats are reset during machine boot. The 1st commit introduces these sysfs entries. The 2nd commit populates memory error stats every time memory_failure attempts memory error recovery. The 3rd commit adds documentations for introduced stats. [1] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#mc22959244f5388891c523882e61163c6e4d703af [2] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#m52d8d7a333d8536bd7ce74253298858b1c0c0ac6 This patch (of 3): Today kernel provides following memory error info to userspace, but each has its own disadvantage * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, not per NUMA node stats though * ras:memory_failure_event: only available after explicitly enabled * /dev/mcelog provides many useful info about the MCEs, but doesn't capture how memory_failure recovered memory MCEs * kernel logs: userspace needs to process log text Exposes per NUMA node memory error stats as sysfs entries: /sys/devices/system/node/node${X}/memory_failure/total /sys/devices/system/node/node${X}/memory_failure/recovered /sys/devices/system/node/node${X}/memory_failure/ignored /sys/devices/system/node/node${X}/memory_failure/failed /sys/devices/system/node/node${X}/memory_failure/delayed These counters describe how many raw pages are poisoned and after the attempted recoveries by the kernel, their resolutions: how many are recovered, ignored, failed, or delayed respectively. The following math holds for the statistics: * total = recovered + ignored + failed + delayed Link: https://lkml.kernel.org/r/20230120034622.2698268-1-jiaqiyan@google.com Link: https://lkml.kernel.org/r/20230120034622.2698268-2-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02mm: multi-gen LRU: section for memcg LRUT.J. Alumbaugh
Move memcg LRU code into a dedicated section. Improve the design doc to outline its architecture. Link: https://lkml.kernel.org/r/20230118001827.1040870-5-talumbau@google.com Signed-off-by: T.J. Alumbaugh <talumbau@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>