summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-02-16x86/resctrl: Allow RMID allocation to be scoped by CLOSIDJames Morse
MPAMs RMID values are not unique unless the CLOSID is considered as well. alloc_rmid() expects the RMID to be an independent number. Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when allocating. If the CLOSID is not relevant to the index, this ends up comparing the free RMID with itself, and the first free entry will be used. With MPAM the CLOSID is included in the index, so this becomes a walk of the free RMID entries, until one that matches the supplied CLOSID is found. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-8-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Access per-rmid structures by indexJames Morse
x86 systems identify traffic using the CLOSID and RMID. The CLOSID is used to lookup the control policy, the RMID is used for monitoring. For x86 these are independent numbers. Arm's MPAM has equivalent features PARTID and PMG, where the PARTID is used to lookup the control policy. The PMG in contrast is a small number of bits that are used to subdivide PARTID when monitoring. The cache-occupancy monitors require the PARTID to be specified when monitoring. This means MPAM's PMG field is not unique. There are multiple PMG-0, one per allocated CLOSID/PARTID. If PMG is treated as equivalent to RMID, it cannot be allocated as an independent number. Bitmaps like rmid_busy_llc need to be sized by the number of unique entries for this resource. Treat the combined CLOSID and RMID as an index, and provide architecture helpers to pack and unpack an index. This makes the MPAM values unique. The domain's rmid_busy_llc and rmid_ptrs[] are then sized by index, as are domain mbm_local[] and mbm_total[]. x86 can ignore the CLOSID field when packing and unpacking an index, and report as many indexes as RMID. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-7-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Track the closid with the rmidJames Morse
x86's RMID are independent of the CLOSID. An RMID can be allocated, used and freed without considering the CLOSID. MPAM's equivalent feature is PMG, which is not an independent number, it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of 'RMID' can be allocated for a single CLOSID. i.e. if there is 1 bit of PMG space, then each CLOSID can have two monitor groups. To allow resctrl to disambiguate RMID values for different CLOSID, everything in resctrl that keeps an RMID value needs to know the CLOSID too. This will always be ignored on x86. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Xin Hao <xhao@linux.alibaba.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-6-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Move RMID allocation out of mkdir_rdt_prepare()James Morse
RMIDs are allocated for each monitor or control group directory, because each of these needs its own RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID. MPAM's equivalent of RMID is not an independent number, so can't be allocated until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas another may succeed depending on how many monitor groups a control group has. The RMID allocation needs to move to be after the CLOSID has been allocated. Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller, after the mkdir_rdt_prepare() call. This allows the RMID allocator to know the CLOSID. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-5-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Create helper for RMID allocation and mondata dir creationJames Morse
When monitoring is supported, each monitor and control group is allocated an RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID. MPAM's equivalent of RMID are not an independent number, so can't be allocated until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas another may succeed depending on how many monitor groups a control group has. The RMID allocation needs to move to be after the CLOSID has been allocated. Move the RMID allocation and mondata dir creation to a helper. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-4-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Free rmid_ptrs from resctrl_exit()James Morse
rmid_ptrs[] is allocated from dom_data_init() but never free()d. While the exit text ends up in the linker script's DISCARD section, the direction of travel is for resctrl to be/have loadable modules. Add resctrl_put_mon_l3_config() to cleanup any memory allocated by rdt_get_mon_l3_config(). There is no reason to backport this to a stable kernel. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-3-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdefJames Morse
tick_nohz_full_mask lists the CPUs that are nohz_full. This is only needed when CONFIG_NO_HZ_FULL is defined. tick_nohz_full_cpu() allows a specific CPU to be tested against the mask, and evaluates to false when CONFIG_NO_HZ_FULL is not defined. The resctrl code needs to pick a CPU to run some work on, a new helper prefers housekeeping CPUs by examining the tick_nohz_full_mask. Hiding the declaration behind #ifdef CONFIG_NO_HZ_FULL forces all the users to be behind an #ifdef too. Move the tick_nohz_full_mask declaration, this lets callers drop the #ifdef, and guard access to tick_nohz_full_mask with IS_ENABLED() or something like tick_nohz_full_cpu(). The definition does not need to be moved as any callers should be removed at compile time unless CONFIG_NO_HZ_FULL is defined. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> # for resctrl dependency Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-2-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16drm/nouveau/mmu/r535: uninitialized variable in r535_bar_new_()Dan Carpenter
If gf100_bar_new_() fails then "bar" is not initialized. Fixes: 5bf0257136a2 ("drm/nouveau/mmu/r535: initial support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/dab21df7-4d90-4479-97d8-97e5d228c714@moroto.mountain
2024-02-16nouveau: fix function cast warningsArnd Bergmann
clang-16 warns about casting between incompatible function types: drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c:161:10: error: cast from 'void (*)(const struct firmware *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 161 | .fini = (void(*)(void *))release_firmware, This one was done to use the generic shadow_fw_release() function as a callback for struct nvbios_source. Change it to use the same prototype as the other five instances, with a trivial helper function that actually calls release_firmware. Fixes: 70c0f263cc2e ("drm/nouveau/bios: pull in basic vbios subdev, more to come later") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240213095753.455062-1-arnd@kernel.org
2024-02-16Merge tag 'zonefs-6.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: - Fix direct write error handling to avoid a race between failed IO completion and the submission path itself which can result in an invalid file size exposed to the user after the failed IO. * tag 'zonefs-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Improve error handling
2024-02-16Merge tag 'kvmarm-fixes-6.8-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.8, take #2 - Avoid dropping the page refcount twice when freeing an unlinked page-table subtree.
2024-02-16Merge tag 'kvmarm-fixes-6.8-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.8, take #1 - Don't source the VFIO Kconfig twice - Fix protected-mode locking order between kvm and vcpus
2024-02-16Merge tag 'sound-6.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of device-specific fixes. It became a bit bigger than wished, but all look reasonably small and safe to apply. - A few Cirrus Logic CS35L56 and CS42L43 driver fixes - ASoC SOF fixes and workarounds - Various ASoC Intel fixes - Lots of HD-, USB-audio and AMD ACP quirks" * tag 'sound-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits) ALSA: usb-audio: More relaxed check of MIDI jack names ALSA: hda/realtek: fix mute/micmute LED For HP mt645 ALSA: hda/realtek: cs35l41: Fix order and duplicates in quirks table ALSA: hda/realtek: cs35l41: Fix device ID / model name ALSA: hda/realtek: cs35l41: Add internal speaker support for ASUS UM3402 with missing DSD ASoC: cs35l56: Workaround for ACPI with broken spk-id-gpios property ALSA: hda: Add Lenovo Legion 7i gen7 sound quirk ASoC: SOF: IPC3: fix message bounds on ipc ops ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend ASoC: q6dsp: fix event handler prototype ASoC: SOF: Intel: pci-lnl: Change the topology path to intel/sof-ipc4-tplg ASoC: SOF: Intel: pci-tgl: Change the default paths and firmware names ASoC: amd: yc: Fix non-functional mic on Lenovo 82UU ASoC: rt5645: Add DMI quirk for inverted jack-detect on MeeGoPad T8 ASoC: rt5645: Make LattePanda board DMI match more precise ASoC: SOF: amd: Fix locking in ACP IRQ handler ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work() ASoC: Intel: cht_bsw_rt5645: Cleanup codec_name handling ASoC: Intel: Boards: Fix NULL pointer deref in BYT/CHT boards ASoC: cs35l56: Remove default from IRQ1_CFG register ...
2024-02-16Merge tag 'gpio-fixes-for-v6.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - add missing stubs for functions that are not built with GPIOLIB disabled * tag 'gpio-fixes-for-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: add gpio_device_get_label() stub for !GPIOLIB gpiolib: add gpio_device_get_base() stub for !GPIOLIB gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
2024-02-16workqueue, irq_work: Build fix for !CONFIG_IRQ_WORKTejun Heo
2f34d7337d98 ("workqueue: Fix queue_work_on() with BH workqueues") added irq_work usage to workqueue; however, it turns out irq_work is actually optional and the change breaks build on configuration which doesn't have CONFIG_IRQ_WORK enabled. Fix build by making workqueue use irq_work only when CONFIG_SMP and enabling CONFIG_IRQ_WORK when CONFIG_SMP is set. It's reasonable to argue that it may be better to just always enable it. However, this still saves a small bit of memory for tiny UP configs and also the least amount of change, so, for now, let's keep it conditional. Verified to do the right thing for x86_64 allnoconfig and defconfig, and aarch64 allnoconfig, allnoconfig + prink disable (SMP but nothing selects IRQ_WORK) and a modified aarch64 Kconfig where !SMP and nothing selects IRQ_WORK. v2: `depends on SMP` leads to Kconfig warnings when CONFIG_IRQ_WORK is selected by something else when !CONFIG_SMP. Use `def_bool y if SMP` instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: 2f34d7337d98 ("workqueue: Fix queue_work_on() with BH workqueues") Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2024-02-16Merge tag 'drm-fixes-2024-02-16' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular weekly fixes, nothing too major, mostly amdgpu, then i915, xe, msm and nouveau with some scattered bits elsewhere. crtc: - fix uninit variable prime: - support > 4GB page arrays buddy: - fix error handling in allocations i915: - fix blankscreen on JSL chromebooks - stable fix to limit DP sst link rates xe: - Fix an out-of-bounds shift. - Fix the display code thinking xe uses shmem - Fix a warning about index out-of-bound - Fix a clang-16 compilation warning amdgpu: - PSR fixes - Suspend/resume fixes - Link training fix - Aspect ratio fix - DCN 3.5 fixes - VCN 4.x fix - GFX 11 fix - Misc display fixes - Misc small fixes amdkfd: - Cache size reporting fix - SIMD distribution fix msm: - GPU: - dmabuf vmap fix - a610 UBWC corruption fix (incorrect hbb) - revert a commit that was making GPU recovery unreliable - tlb invalidation fix ivpu: - suspend/resume fix nouveau: - fix scheduler cleanup path - fix pointless scheduler creation - fix kvalloc argument order rockchip: - vop2 locking fix" * tag 'drm-fixes-2024-02-16' of git://anongit.freedesktop.org/drm/drm: (38 commits) drm/amdgpu: Fix implicit assumtion in gfx11 debug flags drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards drm/amd/display: Increase ips2_eval delay for DCN35 drm/amdgpu/display: Initialize gamma correction mode variable in dcn30_get_gamcor_current() drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution drm/amd/display: fixed integer types and null check locations drm/amd/display: Fix array-index-out-of-bounds in dcn35_clkmgr drm/amd/display: Preserve original aspect ratio in create stream drm/amd/display: Fix possible NULL dereference on device remove/driver unload Revert "drm/amd/display: increased min_dcfclk_mhz and min_fclk_mhz" drm/amd/display: Add align done check Revert "drm/amd: flush any delayed gfxoff on suspend entry" drm/amd: Stop evicting resources on APUs in suspend drm/amd/display: Fix possible buffer overflow in 'find_dcfclk_for_voltage()' drm/amd/display: Fix possible use of uninitialized 'max_chunks_fbc_mode' in 'calculate_bandwidth()' drm/amd/display: Initialize 'wait_time_microsec' variable in link_dp_training_dpia.c drm/amd/display: Fix && vs || typos drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3 drm/amdgpu: make damage clips support configurable drm/msm: Wire up tlb ops ...
2024-02-16Merge tag 'lsm-pr-20240215' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm fix from Paul Moore: "One small LSM patch to fix a potential integer overflow in the newly added lsm_set_self_attr() syscall" * tag 'lsm-pr-20240215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: fix integer overflow in lsm_set_self_attr() syscall
2024-02-16x86/cpu/topology: Get rid of cpuinfo::x86_max_coresThomas Gleixner
Now that __num_cores_per_package and __num_threads_per_package are available, cpuinfo::x86_max_cores and the related math all over the place can be replaced with the ready to consume data. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210253.176147806@linutronix.de
2024-02-16integrity: eliminate unnecessary "Problem loading X.509 certificate" msgCoiby Xu
Currently when the kernel fails to add a cert to the .machine keyring, it will throw an error immediately in the function integrity_add_key. Since the kernel will try adding to the .platform keyring next or throw an error (in the caller of integrity_add_key i.e. add_to_machine_keyring), so there is no need to throw an error immediately in integrity_add_key. Reported-by: itrymybest80@protonmail.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2239331 Fixes: d19967764ba8 ("integrity: Introduce a Linux keyring called machine") Reviewed-by: Eric Snowberg <eric.snowberg@oracle.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2024-02-16dmaengine: fsl-edma: correct max_segment_size settingFrank Li
Correcting the previous setting of 0x3fff to the actual value of 0x7fff. Introduced new macro 'EDMA_TCD_ITER_MASK' for improved code clarity and utilization of FIELD_GET to obtain the accurate maximum value. Cc: stable@vger.kernel.org Fixes: e06748539432 ("dmaengine: fsl-edma: support edma memcpy") Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240207194733.2112870-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-02-16dmaengine: idxd: Remove shadow Event Log head stored in idxdFenghua Yu
head is defined in idxd->evl as a shadow of head in the EVLSTATUS register. There are two issues related to the shadow head: 1. Mismatch between the shadow head and the state of the EVLSTATUS register: If Event Log is supported, upon completion of the Enable Device command, the Event Log head in the variable idxd->evl->head should be cleared to match the state of the EVLSTATUS register. But the variable is not reset currently, leading mismatch between the variable and the register state. The mismatch causes incorrect processing of Event Log entries. 2. Unnecessary shadow head definition: The shadow head is unnecessary as head can be read directly from the EVLSTATUS register. Reading head from the register incurs no additional cost because event log head and tail are always read together and tail is already read directly from the register as required by hardware. Remove the shadow Event Log head stored in idxd->evl to address the mentioned issues. Fixes: 244da66cda35 ("dmaengine: idxd: setup event log configuration") Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240215024931.1739621-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-02-16x86/CPU/AMD: Do the common init on future Zens tooBorislav Petkov (AMD)
There's no need to enable the common Zen init stuff for each new family - just do it by default on everything >= 0x17 family. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240201161024.30839-1-bp@alien8.de
2024-02-16drm/buddy: Modify duplicate list_splice_tail callArunpravin Paneer Selvam
Remove the duplicate list_splice_tail call when the total_allocated < size condition is true. Cc: <stable@vger.kernel.org> # 6.7+ Fixes: 8746c6c9dfa3 ("drm/buddy: Fix alloc_range() error handling code") Reported-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240216100048.4101-1-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-02-16phy: qcom-qmp-usb: fix v3 offsets dataDmitry Baryshkov
The MSM8996 platform has registers setup different to the rest of QMP v3 USB platforms. It has PCS region at 0x600 and no PCS_MISC region, while other platforms have PCS region at 0x800 and PCS_MISC at 0x600. This results in the malfunctioning USB host on some of the platforms. The commit f74c35b630d4 ("phy: qcom-qmp-usb: fix register offsets for ipq8074/ipq6018") fixed the issue for IPQ platforms, but missed the SDM845 which has the same register layout. To simplify future platform addition and to make the driver more future proof, rename qmp_usb_offsets_v3 to qmp_usb_offsets_v3_msm8996 (to mark its peculiarity), rename qmp_usb_offsets_ipq8074 to qmp_usb_offsets_v3 and use it for SDM845 platform. Fixes: 2be22aae6b18 ("phy: qcom-qmp-usb: populate offsets configuration") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240213133824.2218916-1-dmitry.baryshkov@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-02-16arm64: tegra: Set the correct PHY mode for MGBEThierry Reding
The PHY is configured in 10GBASE-R, so make sure to reflect that in DT. Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2024-02-16net/sched: act_mirred: don't override retval if we already lost the skbJakub Kicinski
If we're redirecting the skb, and haven't called tcf_mirred_forward(), yet, we need to tell the core to drop the skb by setting the retcode to SHOT. If we have called tcf_mirred_forward(), however, the skb is out of our hands and returning SHOT will lead to UaF. Move the retval override to the error path which actually need it. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Fixes: e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16net/sched: act_mirred: use the backlog for mirred ingressJakub Kicinski
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog for nested calls to mirred ingress") hangs our testing VMs every 10 or so runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by lockdep. The problem as previously described by Davide (see Link) is that if we reverse flow of traffic with the redirect (egress -> ingress) we may reach the same socket which generated the packet. And we may still be holding its socket lock. The common solution to such deadlocks is to put the packet in the Rx backlog, rather than run the Rx path inline. Do that for all egress -> ingress reversals, not just once we started to nest mirred calls. In the past there was a concern that the backlog indirection will lead to loss of error reporting / less accurate stats. But the current workaround does not seem to address the issue. Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions") Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Suggested-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16net: ethernet: adi: requires PHYLIB supportRandy Dunlap
This driver uses functions that are supplied by the Kconfig symbol PHYLIB, so select it to ensure that they are built as needed. When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build (linker) errors that are resolved by this Kconfig change: ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open': drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs': drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device' ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy': drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect' ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc': include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus': drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop': drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy': drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link': drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status' ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl': drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl' ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings' ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings' Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402070626.eZsfVHG5-lkp@intel.com/ Cc: Lennart Franzen <lennart@lfdomain.com> Cc: Alexandru Tachici <alexandru.tachici@analog.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished().Kuniyuki Iwashima
syzkaller reported a warning [0] in inet_csk_destroy_sock() with no repro. WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash); However, the syzkaller's log hinted that connect() failed just before the warning due to FAULT_INJECTION. [1] When connect() is called for an unbound socket, we search for an available ephemeral port. If a bhash bucket exists for the port, we call __inet_check_established() or __inet6_check_established() to check if the bucket is reusable. If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num. Later, we look up the corresponding bhash2 bucket and try to allocate it if it does not exist. Although it rarely occurs in real use, if the allocation fails, we must revert the changes by check_established(). Otherwise, an unconnected socket could illegally occupy an ehash entry. Note that we do not put tw back into ehash because sk might have already responded to a packet for tw and it would be better to free tw earlier under such memory presure. [0]: WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) Modules linked in: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05 RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40 RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8 RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0 R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000 FS: 00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ? inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) dccp_close (net/dccp/proto.c:1078) inet_release (net/ipv4/af_inet.c:434) __sock_release (net/socket.c:660) sock_close (net/socket.c:1423) __fput (fs/file_table.c:377) __fput_sync (fs/file_table.c:462) __x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f03e53852bb Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44 RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000 R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170 </TASK> [1]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f88f5 #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153) should_failslab (mm/slub.c:3748) kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867) inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135) __inet_hash_connect (net/ipv4/inet_hashtables.c:1100) dccp_v4_connect (net/dccp/ipv4.c:116) __inet_stream_connect (net/ipv4/af_inet.c:676) inet_stream_connect (net/ipv4/af_inet.c:747) __sys_connect_file (net/socket.c:2048 (discriminator 2)) __sys_connect (net/socket.c:2065) __x64_sys_connect (net/socket.c:2072) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f03e5284e5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000 </TASK> Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16Merge branch 'bridge-mdb-events'David S. Miller
Tobias Waldekranz says: ==================== net: bridge: switchdev: Ensure MDB events are delivered exactly once When a device is attached to a bridge, drivers will request a replay of objects that were created before the device joined the bridge, that are still of interest to the joining port. Typical examples include FDB entries and MDB memberships on other ports ("foreign interfaces") or on the bridge itself. Conversely when a device is detached, the bridge will synthesize deletion events for all those objects that are still live, but no longer applicable to the device in question. This series eliminates two races related to the synching and unsynching phases of a bridge's MDB with a joining or leaving device, that would cause notifications of such objects to be either delivered twice (1/2), or not at all (2/2). A similar race to the one solved by 1/2 still remains for the FDB. This is much harder to solve, due to the lockless operation of the FDB's rhashtable, and is therefore knowingly left out of this series. v1 -> v2: - Squash the previously separate addition of switchdev_port_obj_act_is_deferred into first consumer. - Use ether_addr_equal to compare MAC addresses. - Document switchdev_port_obj_act_is_deferred (renamed from switchdev_port_obj_is_deferred in v1, to indicate that we also match on the action). - Delay allocations of MDB objects until we know they're needed. - Use non-RCU version of the hash list iterator, now that the MDB is not scanned while holding the RCU read lock. - Add Fixes tag to commit message v2 -> v3: - Fix unlocking in error paths - Access RCU protected port list via mlock_dereference, since MDB is guaranteed to remain constant for the duration of the scan. v3 -> v4: - Limit the search for exiting deferred events in 1/2 to only apply to additions, since the problem does not exist in the deletion case. - Add 2/2, to plug a related race when unoffloading an indirectly associated device. v4 -> v5: - Fix grammatical errors in kerneldoc of switchdev_port_obj_act_is_deferred ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16net: bridge: switchdev: Ensure deferred event delivery on unoffloadTobias Waldekranz
When unoffloading a device, it is important to ensure that all relevant deferred events are delivered to it before it disassociates itself from the bridge. Before this change, this was true for the normal case when a device maps 1:1 to a net_bridge_port, i.e. br0 / swp0 When swp0 leaves br0, the call to switchdev_deferred_process() in del_nbp() makes sure to process any outstanding events while the device is still associated with the bridge. In the case when the association is indirect though, i.e. when the device is attached to the bridge via an intermediate device, like a LAG... br0 / lag0 / swp0 ...then detaching swp0 from lag0 does not cause any net_bridge_port to be deleted, so there was no guarantee that all events had been processed before the device disassociated itself from the bridge. Fix this by always synchronously processing all deferred events before signaling completion of unoffloading back to the driver. Fixes: 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16net: bridge: switchdev: Skip MDB replays of deferred events on offloadTobias Waldekranz
Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, either from the IGMP/MLD snooping logic or from user configuration. While new memberships are immediately visible to walkers of br->mdb_list, the notification of their existence to switchdev event subscribers is deferred until a later point in time. So if a replay list was generated during a time that overlapped with such a window, it would also contain a replay of the not-yet-delivered event. The driver would thus receive two copies of what the bridge internally considered to be one single event. On destruction of the bridge, only a single membership deletion event was therefore sent. As a consequence of this, drivers which reference count memberships (at least DSA), would be left with orphan groups in their hardware database when the bridge was destroyed. This is only an issue when replaying additions. While deletion events may still be pending on the deferred queue, they will already have been removed from br->mdb_list, so no duplicates can be generated in that scenario. To a user this meant that old group memberships, from a bridge in which a port was previously attached, could be reanimated (in hardware) when the port joined a new bridge, without the new bridge's knowledge. For example, on an mv88e6xxx system, create a snooping bridge and immediately add a port to it: root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ > ip link set dev x3 up master br0 And then destroy the bridge: root@infix-06-0b-00:~$ ip link del dev br0 root@infix-06-0b-00:~$ mvls atu ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a DEV:0 Marvell 88E6393X 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a root@infix-06-0b-00:~$ The two IPv6 groups remain in the hardware database because the port (x3) is notified of the host's membership twice: once via the original event and once via a replay. Since only a single delete notification is sent, the count remains at 1 when the bridge is destroyed. Then add the same port (or another port belonging to the same hardware domain) to a new bridge, this time with snooping disabled: root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ > ip link set dev x3 up master br1 All multicast, including the two IPv6 groups from br0, should now be flooded, according to the policy of br1. But instead the old memberships are still active in the hardware database, causing the switch to only forward traffic to those groups towards the CPU (port 0). Eliminate the race in two steps: 1. Grab the write-side lock of the MDB while generating the replay list. This prevents new memberships from showing up while we are generating the replay list. But it leaves the scenario in which a deferred event was already generated, but not delivered, before we grabbed the lock. Therefore: 2. Make sure that no deferred version of a replay event is already enqueued to the switchdev deferred queue, before adding it to the replay list, when replaying additions. Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16net/iucv: fix the allocation size of iucv_path_table arrayAlexander Gordeev
iucv_path_table is a dynamically allocated array of pointers to struct iucv_path items. Yet, its size is calculated as if it was an array of struct iucv_path items. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16thunderbolt: Fix NULL pointer dereference in tb_port_update_credits()Mika Westerberg
Olliver reported that his system crashes when plugging in Thunderbolt 1 device: BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:tb_port_do_update_credits+0x1b/0x130 [thunderbolt] Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? exc_page_fault+0x7f/0x180 ? asm_exc_page_fault+0x26/0x30 ? tb_port_do_update_credits+0x1b/0x130 ? tb_switch_update_link_attributes+0x83/0xd0 tb_switch_add+0x7a2/0xfe0 tb_scan_port+0x236/0x6f0 tb_handle_hotplug+0x6db/0x900 process_one_work+0x171/0x340 worker_thread+0x27b/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This is due the fact that some Thunderbolt 1 devices only have one lane adapter. Fix this by checking for the lane 1 before we read its credits. Reported-by: Olliver Schinagl <oliver@schinagl.nl> Closes: https://lore.kernel.org/linux-usb/c24c7882-6254-4e68-8f22-f3e8f65dc84f@schinagl.nl/ Fixes: 81af2952e606 ("thunderbolt: Add support for asymmetric link") Cc: stable@vger.kernel.org Cc: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2024-02-16Merge tag 'drm-msm-fixes-2024-02-15' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v6.8-rc5 GPU: - dmabuf vmap fix - a610 UBWC corruption fix (incorrect hbb) - revert a commit that was making GPU recovery unreliable - tlb invalidation fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGszDSiw66+a=ttBr-hat+zrcBtfc_cZ4LQqXu89DJ0UeQ@mail.gmail.com
2024-02-16Merge tag 'amd-drm-fixes-6.8-2024-02-15-2' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.8-2024-02-15-2: amdgpu: - PSR fixes - Suspend/resume fixes - Link training fix - Aspect ratio fix - DCN 3.5 fixes - VCN 4.x fix - GFX 11 fix - Misc display fixes - Misc small fixes amdkfd: - Cache size reporting fix - SIMD distribution fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240215192452.11805-1-alexander.deucher@amd.com
2024-02-16Merge tag 'drm-xe-fixes-2024-02-15' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix an out-of-bounds shift. - Fix the display code thinking xe uses shmem - Fix a warning about index out-of-bound - Fix a clang-16 compilation warning Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zc4GpcrbFVqdK9Ws@fedora
2024-02-16Merge tag 'drm-intel-fixes-2024-02-15' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Fix for #10172: Blank screen on JSL Chromebooks. Stable fix to limit DP SST link rate to <=8.1Gbps. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zc37W27F5OvoeSkG@jlahtine-mobl.ger.corp.intel.com
2024-02-16Merge tag 'drm-misc-fixes-2024-02-15' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A suspend/resume error fix for ivpu, a couple of scheduler fixes for nouveau, a patch to support large page arrays in prime, a uninitialized variable fix in crtc, a locking fix in rockchip/vop2 and a buddy allocator error reporting fix. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/b4ffqzigtfh6cgzdpwuk6jlrv3dnk4hu6etiizgvibysqgtl2p@42n2gdfdd5eu
2024-02-15smb: Fix regression in writes when non-standard maximum write size negotiatedSteve French
The conversion to netfs in the 6.3 kernel caused a regression when maximum write size is set by the server to an unexpected value which is not a multiple of 4096 (similarly if the user overrides the maximum write size by setting mount parm "wsize", but sets it to a value that is not a multiple of 4096). When negotiated write size is not a multiple of 4096 the netfs code can skip the end of the final page when doing large sequential writes, causing data corruption. This section of code is being rewritten/removed due to a large netfs change, but until that point (ie for the 6.3 kernel until now) we can not support non-standard maximum write sizes. Add a warning if a user specifies a wsize on mount that is not a multiple of 4096 (and round down), also add a change where we round down the maximum write size if the server negotiates a value that is not a multiple of 4096 (we also have to check to make sure that we do not round it down to zero). Reported-by: R. Diez" <rdiez-2006@rd10.de> Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") Suggested-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Acked-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: stable@vger.kernel.org # v6.3+ Cc: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-15Merge branch 'fix-the-read-of-vsyscall-page-through-bpf'Alexei Starovoitov
Hou Tao says: ==================== Fix the read of vsyscall page through bpf From: Hou Tao <houtao1@huawei.com> Hi, As reported by syzboot [1] and [2], when trying to read vsyscall page by using bpf_probe_read_kernel() or bpf_probe_read(), oops may happen. Thomas Gleixner had proposed a test patch [3], but it seems that no formal patch is posted after about one month [4], so I post it instead and add an Originally-by tag in patch #2. Patch #1 makes is_vsyscall_vaddr() being a common helper. Patch #2 fixes the problem by disallowing vsyscall page read for copy_from_kernel_nofault(). Patch #3 adds one test case to ensure the read of vsyscall page through bpf is rejected. Please see individual patches for more details. Comments are always welcome. [1]: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com/ [2]: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com/ [3]: https://lore.kernel.org/bpf/87r0jwquhv.ffs@tglx/ [4]: https://lore.kernel.org/bpf/e24b125c-8ff4-9031-6c53-67ff2e01f316@huaweicloud.com/ Change Log: v3: * rephrase commit message for patch #1 & #2 (Sohil) * reword comments in copy_from_kernel_nofault_allowed() (Sohil) * add Rvb tag for patch #1 and Acked-by tag for patch #3 (Sohil, Yonghong) v2: https://lore.kernel.org/bpf/20240126115423.3943360-1-houtao@huaweicloud.com/ * move is_vsyscall_vaddr to asm/vsyscall.h instead (Sohil) * elaborate on the reason for disallowing of vsyscall page read in copy_from_kernel_nofault_allowed() (Sohil) * update the commit message of patch #2 to more clearly explain how the oops occurs. (Sohil) * update the commit message of patch #3 to explain the expected return values of various bpf helpers (Yonghong) v1: https://lore.kernel.org/bpf/20240119073019.1528573-1-houtao@huaweicloud.com/ ==================== Link: https://lore.kernel.org/r/20240202103935.3154011-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-15selftest/bpf: Test the read of vsyscall page under x86-64Hou Tao
Under x86-64, when using bpf_probe_read_kernel{_str}() or bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops, so add one test case to ensure that the problem is fixed. Beside those four bpf helpers mentioned above, testing the read of vsyscall page by using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well. The test case passes the address of vsyscall page to these six helpers and checks whether the returned values are expected: 1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the expected return value is -ERANGE as shown below: bpf_probe_read_kernel_common copy_from_kernel_nofault // false, return -ERANGE copy_from_kernel_nofault_allowed 2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT as show below: bpf_probe_read_user_common copy_from_user_nofault // false, return -EFAULT __access_ok 3) For bpf_copy_from_user(), the expected return value is -EFAULT: // return -EFAULT bpf_copy_from_user copy_from_user _copy_from_user // return false access_ok 4) For bpf_copy_from_user_task(), the expected return value is -EFAULT: // return -EFAULT bpf_copy_from_user_task access_process_vm // return 0 vma_lookup() // return 0 expand_stack() The occurrence of oops depends on the availability of CPU SMAP [1] feature and there are three possible configurations of vsyscall page in the boot cmd-line: vsyscall={xonly|none|emulate}, so there are a total of six possible combinations. Under all these combinations, the test case runs successfully. [1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20240202103935.3154011-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-15x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()Hou Tao
When trying to use copy_from_kernel_nofault() to read vsyscall page through a bpf program, the following oops was reported: BUG: unable to handle page fault for address: ffffffffff600000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... RIP: 0010:copy_from_kernel_nofault+0x6f/0x110 ...... Call Trace: <TASK> ? copy_from_kernel_nofault+0x6f/0x110 bpf_probe_read_kernel+0x1d/0x50 bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d trace_call_bpf+0xc5/0x1c0 perf_call_bpf_enter.isra.0+0x69/0xb0 perf_syscall_enter+0x13e/0x200 syscall_trace_enter+0x188/0x1c0 do_syscall_64+0xb5/0xe0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 </TASK> ...... ---[ end trace 0000000000000000 ]--- The oops is triggered when: 1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall page and invokes copy_from_kernel_nofault() which in turn calls __get_user_asm(). 2) Because the vsyscall page address is not readable from kernel space, a page fault exception is triggered accordingly. 3) handle_page_fault() considers the vsyscall page address as a user space address instead of a kernel space address. This results in the fix-up setup by bpf not being applied and a page_fault_oops() is invoked due to SMAP. Considering handle_page_fault() has already considered the vsyscall page address as a userspace address, fix the problem by disallowing vsyscall page read for copy_from_kernel_nofault(). Originally-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-15x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.hHou Tao
Move is_vsyscall_vaddr() into asm/vsyscall.h to make it available for copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c. Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20240202103935.3154011-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-16zonefs: Improve error handlingDamien Le Moal
Write error handling is racy and can sometime lead to the error recovery path wrongly changing the inode size of a sequential zone file to an incorrect value which results in garbage data being readable at the end of a file. There are 2 problems: 1) zonefs_file_dio_write() updates a zone file write pointer offset after issuing a direct IO with iomap_dio_rw(). This update is done only if the IO succeed for synchronous direct writes. However, for asynchronous direct writes, the update is done without waiting for the IO completion so that the next asynchronous IO can be immediately issued. However, if an asynchronous IO completes with a failure right before the i_truncate_mutex lock protecting the update, the update may change the value of the inode write pointer offset that was corrected by the error path (zonefs_io_error() function). 2) zonefs_io_error() is called when a read or write error occurs. This function executes a report zone operation using the callback function zonefs_io_error_cb(), which does all the error recovery handling based on the current zone condition, write pointer position and according to the mount options being used. However, depending on the zoned device being used, a report zone callback may be executed in a context that is different from the context of __zonefs_io_error(). As a result, zonefs_io_error_cb() may be executed without the inode truncate mutex lock held, which can lead to invalid error processing. Fix both problems as follows: - Problem 1: Perform the inode write pointer offset update before a direct write is issued with iomap_dio_rw(). This is safe to do as partial direct writes are not supported (IOMAP_DIO_PARTIAL is not set) and any failed IO will trigger the execution of zonefs_io_error() which will correct the inode write pointer offset to reflect the current state of the one on the device. - Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error() and call this function directly from __zonefs_io_error() after obtaining the zone information using blkdev_report_zones() with a simple callback function that copies to a local stack variable the struct blk_zone obtained from the device. This ensures that error handling is performed holding the inode truncate mutex. This change also simplifies error handling for conventional zone files by bypassing the execution of report zones entirely. This is safe to do because the condition of conventional zones cannot be read-only or offline and conventional zone files are always fully mapped with a constant file size. Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2024-02-15io_uring/napi: enable even with a timeout of 0Jens Axboe
1 usec is not as short as it used to be, and it makes sense to allow 0 for a busy poll timeout - this means just do one loop to check if we have anything available. Add a separate ->napi_enabled to check if napi has been enabled or not. While at it, move the writing of the ctx napi values after we've copied the old values back to userspace. This ensures that if the call fails, we'll be in the same state as we were before, rather than some indeterminate state. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-15md: Don't suspend the array for interrupted reshapeYu Kuai
md_start_sync() will suspend the array if there are spares that can be added or removed from conf, however, if reshape is still in progress, this won't happen at all or data will be corrupted(remove_and_add_spares won't be called from md_choose_sync_action for reshape), hence there is no need to suspend the array if reshape is not done yet. Meanwhile, there is a potential deadlock for raid456: 1) reshape is interrupted; 2) set one of the disk WantReplacement, and add a new disk to the array, however, recovery won't start until the reshape is finished; 3) then issue an IO across reshpae position, this IO will wait for reshape to make progress; 4) continue to reshape, then md_start_sync() found there is a spare disk that can be added to conf, mddev_suspend() is called; Step 4 and step 3 is waiting for each other, deadlock triggered. Noted this problem is found by code review, and it's not reporduced yet. Fix this porblem by don't suspend the array for interrupted reshape, this is safe because conf won't be changed until reshape is done. Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240201092559.910982-6-yukuai1@huaweicloud.com
2024-02-15md: Don't register sync_thread for reshape directlyYu Kuai
Currently, if reshape is interrupted, then reassemble the array will register sync_thread directly from pers->run(), in this case 'MD_RECOVERY_RUNNING' is set directly, however, there is no guarantee that md_do_sync() will be executed, hence stop_sync_thread() will hang because 'MD_RECOVERY_RUNNING' can't be cleared. Last patch make sure that md_do_sync() will set MD_RECOVERY_DONE, however, following hang can still be triggered by dm-raid test shell/lvconvert-raid-reshape.sh occasionally: [root@fedora ~]# cat /proc/1982/stack [<0>] stop_sync_thread+0x1ab/0x270 [md_mod] [<0>] md_frozen_sync_thread+0x5c/0xa0 [md_mod] [<0>] raid_presuspend+0x1e/0x70 [dm_raid] [<0>] dm_table_presuspend_targets+0x40/0xb0 [dm_mod] [<0>] __dm_destroy+0x2a5/0x310 [dm_mod] [<0>] dm_destroy+0x16/0x30 [dm_mod] [<0>] dev_remove+0x165/0x290 [dm_mod] [<0>] ctl_ioctl+0x4bb/0x7b0 [dm_mod] [<0>] dm_ctl_ioctl+0x11/0x20 [dm_mod] [<0>] vfs_ioctl+0x21/0x60 [<0>] __x64_sys_ioctl+0xb9/0xe0 [<0>] do_syscall_64+0xc6/0x230 [<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74 Meanwhile mddev->recovery is: MD_RECOVERY_RUNNING | MD_RECOVERY_INTR | MD_RECOVERY_RESHAPE | MD_RECOVERY_FROZEN Fix this problem by remove the code to register sync_thread directly from raid10 and raid5. And let md_check_recovery() to register sync_thread. Fixes: f67055780caa ("[PATCH] md: Checkpoint and allow restart of raid5 reshape") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240201092559.910982-5-yukuai1@huaweicloud.com
2024-02-15md: Make sure md_do_sync() will set MD_RECOVERY_DONEYu Kuai
stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must set MD_RECOVERY_DONE, so that follow up md_check_recovery() will unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up stop_sync_thread(). If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4 ("md: fix stopping sync thread"), dm-raid switch from md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread from md_stop() and md_stop_writes(), causing the test shell/lvconvert-raid-reshape.sh hang. We shouldn't switch back to md_reap_sync_thread() because it's problematic in the first place. Fix the problem by making sure md_do_sync() will set MD_RECOVERY_DONE. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/ Fixes: d5d885fd514f ("md: introduce new personality funciton start()") Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240201092559.910982-4-yukuai1@huaweicloud.com
2024-02-15md: Don't ignore read-only array in md_check_recovery()Yu Kuai
Usually if the array is not read-write, md_check_recovery() won't register new sync_thread in the first place. And if the array is read-write and sync_thread is registered, md_set_readonly() will unregister sync_thread before setting the array read-only. md/raid follow this behavior hence there is no problem. After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following hang can be triggered by test shell/integrity-caching.sh: 1) array is read-only. dm-raid update super block: rs_update_sbs ro = mddev->ro mddev->ro = 0 -> set array read-write md_update_sb 2) register new sync thread concurrently. 3) dm-raid set array back to read-only: rs_update_sbs mddev->ro = ro 4) stop the array: raid_dtr md_stop stop_sync_thread set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_wakeup_thread_directly(mddev->sync_thread); wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) 5) sync thread done: md_do_sync set_bit(MD_RECOVERY_DONE, &mddev->recovery); md_wakeup_thread(mddev->thread); 6) daemon thread can't unregister sync thread: md_check_recovery if (!md_is_rdwr(mddev) && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) return; -> -> MD_RECOVERY_RUNNING can't be cleared, hence step 4 hang; The root cause is that dm-raid manipulate 'mddev->ro' by itself, however, dm-raid really should stop sync thread before setting the array read-only. Unfortunately, I need to read more code before I can refacter the handler of 'mddev->ro' in dm-raid, hence let's fix the problem the easy way for now to prevent dm-raid regression. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Closes: https://lore.kernel.org/all/9801e40-8ac7-e225-6a71-309dcf9dc9aa@redhat.com/ Fixes: ecbfb9f118bc ("dm raid: add raid level takeover support") Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240201092559.910982-3-yukuai1@huaweicloud.com