summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-09net: bgmac: fix BCM5358 support by setting correct flagsRafał Miłecki
Code blocks handling BCMA_CHIP_ID_BCM5357 and BCMA_CHIP_ID_BCM53572 were incorrectly unified. Chip package values are not unique and cannot be checked independently. They are meaningful only in a context of a given chip. Packages BCM5358 and BCM47188 share the same value but then belong to different chips. Code unification resulted in treating BCM5358 as BCM47188 and broke its initialization. Link: https://github.com/openwrt/openwrt/issues/8278 Fixes: cb1b0f90acfe ("net: ethernet: bgmac: unify code of the same family") Cc: Jon Mason <jdmason@kudzu.us> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230208091637.16291-1-zajec5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge tag 'drm-fixes-2023-02-10' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes. The amdgpu had a few small fixes to display flicker on certain configurations, however it was found the the flicker was lessened but there were other unintended consequences, so for now they've been reverted and replaced with an option for users to test with so future fixes can be developed. Otherwise apart from the usual bunch of i915 and amdgpu, there's a client, virtio-gpu and an nvidiafb fix that reorders its loading to avoid failure. client: - refcount fix amdgpu: - a bunch of attempted flicker fixes that regressed turned into a user workaround option for now - Properly fix S/G display with AGP aperture enabled - Fix cursor offset with 180 rotation - SMU13 fixes - Use TGID for GPUVM traces - Fix oops on in fence error path - Don't run IB tests on hw rings when sw rings are in use - memory leak fix i915: - Display watermark fix - fbdev fix for PSR, FBC, DRRS - Move fd_install after last use of fence - Initialize the obj flags for shmem objects - Fix VBT DSI DVO port handling virtio-gpu: - fence fix nvidiafb: - regression fix for driver load when no hw supported" * tag 'drm-fixes-2023-02-10' of git://anongit.freedesktop.org/drm/drm: (27 commits) Revert "drm/amd/display: disable S/G display on DCN 3.1.5" Revert "drm/amd/display: disable S/G display on DCN 2.1.0" Revert "drm/amd/display: disable S/G display on DCN 3.1.2/3" drm/amdgpu: add S/G display parameter drm/amdgpu/smu: skip pptable init under sriov amd/amdgpu: remove test ib on hw ring drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes drm/amdgpu: Add unique_id support for GC 11.0.1/2 drm/amd/pm: bump SMU 13.0.7 driver_if header version drm/amd/pm: bump SMU 13.0.0 driver_if header version drm/amd/pm: add SMU 13.0.7 missing GetPptLimit message mapping drm/amd/display: fix cursor offset on rotation 180 drm/amd/amdgpu: enable athub cg 11.0.3 Revert "drm/amd/display: disable S/G display on DCN 3.1.4" drm/amd/display: properly handling AGP aperture in vm setup drm/amd/display: disable S/G display on DCN 3.1.2/3 drm/amd/display: disable S/G display on DCN 2.1.0 drm/i915: Fix VBT DSI DVO port handling drm/client: fix circular reference counting issue ...
2023-02-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "The usual collection of small driver bug fixes: - Fix error unwind bugs in hfi1, irdma rtrs - Old bug with IPoIB children interfaces possibly using the wrong number of queues - Really old bug in usnic calling iommu_map in an atomic context - Recent regression from the DMABUF locking rework - Missing user data validation in MANA" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/rtrs: Don't call kobject_del for srv_path->kobj RDMA/mana_ib: Prevent array underflow in mana_ib_create_qp_raw() IB/hfi1: Assign npages earlier RDMA/umem: Use dma-buf locked API to solve deadlock RDMA/usnic: use iommu_map_atomic() under spin_lock() RDMA/irdma: Fix potential NULL-ptr-dereference IB/IPoIB: Fix legacy IPoIB due to wrong number of queues IB/hfi1: Restore allocated resources on failed copyout
2023-02-09s390/dasd: Fix potential memleak in dasd_eckd_init()Qiheng Lin
`dasd_reserve_req` is allocated before `dasd_vol_info_req`, and it also needs to be freed before the error returns, just like the other cases in this function. Fixes: 9e12e54c7a8f ("s390/dasd: Handle out-of-space constraint") Signed-off-by: Qiheng Lin <linqiheng@huawei.com> Link: https://lore.kernel.org/r/20221208133809.16796-1-linqiheng@huawei.com Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20230210000253.1644903-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09s390/dasd: sort out physical vs virtual pointers usageAlexander Gordeev
This does not fix a real bug, since virtual addresses are currently indentical to physical ones. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20230210000253.1644903-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09block: Remove the ALLOC_CACHE_SLACK constantBart Van Assche
Commit b99182c501c3 ("bio: add pcpu caching for non-polling bio_put") removed the code that uses this constant. Hence also remove the constant itself. Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230209230135.3475829-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09of: reserved_mem: Have kmemleak ignore dynamically allocated reserved memIsaac J. Manjarres
Patch series "Fix kmemleak crashes when scanning CMA regions", v2. When trying to boot a device with an ARM64 kernel with the following config options enabled: CONFIG_DEBUG_PAGEALLOC=y CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y CONFIG_DEBUG_KMEMLEAK=y a crash is encountered when kmemleak starts to scan the list of gray or allocated objects that it maintains. Upon closer inspection, it was observed that these page-faults always occurred when kmemleak attempted to scan a CMA region. At the moment, kmemleak is made aware of CMA regions that are specified through the devicetree to be dynamically allocated within a range of addresses. However, kmemleak should not need to scan CMA regions or any reserved memory region, as those regions can be used for DMA transfers between drivers and peripherals, and thus wouldn't contain anything useful for kmemleak. Additionally, since CMA regions are unmapped from the kernel's address space when they are freed to the buddy allocator at boot when CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access those memory regions, as that will trigger a crash. Thus, kmemleak should ignore all dynamically allocated reserved memory regions. This patch (of 1): Currently, kmemleak ignores dynamically allocated reserved memory regions that don't have a kernel mapping. However, regions that do retain a kernel mapping (e.g. CMA regions) do get scanned by kmemleak. This is not ideal for two reasons: 1 kmemleak works by scanning memory regions for pointers to allocated objects to determine if those objects have been leaked or not. However, reserved memory regions can be used between drivers and peripherals for DMA transfers, and thus, would not contain pointers to allocated objects, making it unnecessary for kmemleak to scan these reserved memory regions. 2 When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the CMA reserved memory regions are unmapped from the kernel's address space when they are freed to buddy at boot. These CMA reserved regions are still tracked by kmemleak, however, and when kmemleak attempts to scan them, a crash will happen, as accessing the CMA region will result in a page-fault, since the regions are unmapped. Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved memory regions, instead of those that do not have a kernel mapping associated with them. Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private") Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com> Cc: Nick Kossifidis <mick@ics.forth.gr> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rob Herring <robh@kernel.org> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Saravana Kannan <saravanak@google.com> Cc: <stable@vger.kernel.org> [5.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09scripts/gdb: fix 'lx-current' for x86Jeff Xie
When printing the name of the current process, it will report an error: (gdb) p $lx_current().comm Python Exception <class 'gdb.error'> No symbol "current_task" in current context.: Error occurred in Python: No symbol "current_task" in current context. Because e57ef2ed97c1 ("x86: Put hot per CPU variables into a struct") changed it. Link: https://lkml.kernel.org/r/20230204090139.1789264-1-xiehuan09@gmail.com Fixes: e57ef2ed97c1 ("x86: Put hot per CPU variables into a struct") Signed-off-by: Jeff Xie <xiehuan09@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09lib: parser: optimize match_NUMBER apis to use local arrayLi Lingfeng
Memory will be allocated to store substring_t in match_strdup(), which means the caller of match_strdup() may need to be scheduled out to wait for reclaiming memory. smatch complains that this can cuase sleeping in an atoic context. Using local array to store substring_t to remove the restriction. Link: https://lkml.kernel.org/r/20230120032352.242767-1-lilingfeng3@huawei.com Link: https://lore.kernel.org/all/20221104023938.2346986-5-yukuai1@huaweicloud.com/ Link: https://lkml.kernel.org/r/20230120032352.242767-1-lilingfeng3@huawei.com Fixes: 2c0647988433 ("blk-iocost: don't release 'ioc->lock' while updating params") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reported-by: Yu Kuai <yukuai1@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: BingJing Chang <bingjingc@synology.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Hou Tao <houtao1@huawei.com> Cc: James Smart <james.smart@broadcom.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: yangerkun <yangerkun@huawei.com> Cc: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: shrinkers: fix deadlock in shrinker debugfsQi Zheng
The debugfs_remove_recursive() is invoked by unregister_shrinker(), which is holding the write lock of shrinker_rwsem. It will waits for the handler of debugfs file complete. The handler also needs to hold the read lock of shrinker_rwsem to do something. So it may cause the following deadlock: CPU0 CPU1 debugfs_file_get() shrinker_debugfs_count_show()/shrinker_debugfs_scan_write() unregister_shrinker() --> down_write(&shrinker_rwsem); debugfs_remove_recursive() // wait for (A) --> wait_for_completion(); // wait for (B) --> down_read_killable(&shrinker_rwsem) debugfs_file_put() -- (A) up_write() -- (B) The down_read_killable() can be killed, so that the above deadlock can be recovered. But it still requires an extra kill action, otherwise it will block all subsequent shrinker-related operations, so it's better to fix it. [akpm@linux-foundation.org: fix CONFIG_SHRINKER_DEBUG=n stub] Link: https://lkml.kernel.org/r/20230202105612.64641-1-zhengqi.arch@bytedance.com Fixes: 5035ebc644ae ("mm: shrinkers: introduce debugfs interface for memory shrinkers") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09mm: hwpoison: support recovery from ksm_might_need_to_copy()Kefeng Wang
When the kernel copies a page from ksm_might_need_to_copy(), but runs into an uncorrectable error, it will crash since poisoned page is consumed by kernel, this is similar to the issue recently fixed by Copy-on-write poison recovery. When an error is detected during the page copy, return VM_FAULT_HWPOISON in do_swap_page(), and install a hwpoison entry in unuse_pte() when swapoff, which help us to avoid system crash. Note, memory failure on a KSM page will be skipped, but still call memory_failure_queue() to be consistent with general memory failure process, and we could support KSM page recovery in the feature. [wangkefeng.wang@huawei.com: enhance unuse_pte(), fix issue found by lkp] Link: https://lkml.kernel.org/r/20221213120523.141588-1-wangkefeng.wang@huawei.com [wangkefeng.wang@huawei.com: update changelog, alter ksm_might_need_to_copy(), restore unlikely() in unuse_pte()] Link: https://lkml.kernel.org/r/20230201074433.96641-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20221209072801.193221-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09kasan: fix Oops due to missing calls to kasan_arch_is_ready()Christophe Leroy
On powerpc64, you can build a kernel with KASAN as soon as you build it with RADIX MMU support. However if the CPU doesn't have RADIX MMU, KASAN isn't enabled at init and the following Oops is encountered. [ 0.000000][ T0] KASAN not enabled as it requires radix! [ 4.484295][ T26] BUG: Unable to handle kernel data access at 0xc00e000000804a04 [ 4.485270][ T26] Faulting instruction address: 0xc00000000062ec6c [ 4.485748][ T26] Oops: Kernel access of bad area, sig: 11 [#1] [ 4.485920][ T26] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 4.486259][ T26] Modules linked in: [ 4.486637][ T26] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 6.2.0-rc3-02590-gf8a023b0a805 #249 [ 4.486907][ T26] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,HEAD pSeries [ 4.487445][ T26] Workqueue: eval_map_wq .tracer_init_tracefs_work_func [ 4.488744][ T26] NIP: c00000000062ec6c LR: c00000000062bb84 CTR: c0000000002ebcd0 [ 4.488867][ T26] REGS: c0000000049175c0 TRAP: 0380 Not tainted (6.2.0-rc3-02590-gf8a023b0a805) [ 4.489028][ T26] MSR: 8000000002009032 <SF,VEC,EE,ME,IR,DR,RI> CR: 44002808 XER: 00000000 [ 4.489584][ T26] CFAR: c00000000062bb80 IRQMASK: 0 [ 4.489584][ T26] GPR00: c0000000005624d4 c000000004917860 c000000001cfc000 1800000000804a04 [ 4.489584][ T26] GPR04: c0000000003a2650 0000000000000cc0 c00000000000d3d8 c00000000000d3d8 [ 4.489584][ T26] GPR08: c0000000049175b0 a80e000000000000 0000000000000000 0000000017d78400 [ 4.489584][ T26] GPR12: 0000000044002204 c000000003790000 c00000000435003c c0000000043f1c40 [ 4.489584][ T26] GPR16: c0000000043f1c68 c0000000043501a0 c000000002106138 c0000000043f1c08 [ 4.489584][ T26] GPR20: c0000000043f1c10 c0000000043f1c20 c000000004146c40 c000000002fdb7f8 [ 4.489584][ T26] GPR24: c000000002fdb834 c000000003685e00 c000000004025030 c000000003522e90 [ 4.489584][ T26] GPR28: 0000000000000cc0 c0000000003a2650 c000000004025020 c000000004025020 [ 4.491201][ T26] NIP [c00000000062ec6c] .kasan_byte_accessible+0xc/0x20 [ 4.491430][ T26] LR [c00000000062bb84] .__kasan_check_byte+0x24/0x90 [ 4.491767][ T26] Call Trace: [ 4.491941][ T26] [c000000004917860] [c00000000062ae70] .__kasan_kmalloc+0xc0/0x110 (unreliable) [ 4.492270][ T26] [c0000000049178f0] [c0000000005624d4] .krealloc+0x54/0x1c0 [ 4.492453][ T26] [c000000004917990] [c0000000003a2650] .create_trace_option_files+0x280/0x530 [ 4.492613][ T26] [c000000004917a90] [c000000002050d90] .tracer_init_tracefs_work_func+0x274/0x2c0 [ 4.492771][ T26] [c000000004917b40] [c0000000001f9948] .process_one_work+0x578/0x9f0 [ 4.492927][ T26] [c000000004917c30] [c0000000001f9ebc] .worker_thread+0xfc/0x950 [ 4.493084][ T26] [c000000004917d60] [c00000000020be84] .kthread+0x1a4/0x1b0 [ 4.493232][ T26] [c000000004917e10] [c00000000000d3d8] .ret_from_kernel_thread+0x58/0x60 [ 4.495642][ T26] Code: 60000000 7cc802a6 38a00000 4bfffc78 60000000 7cc802a6 38a00001 4bfffc68 60000000 3d20a80e 7863e8c2 792907c6 <7c6348ae> 20630007 78630fe0 68630001 [ 4.496704][ T26] ---[ end trace 0000000000000000 ]--- The Oops is due to kasan_byte_accessible() not checking the readiness of KASAN. Add missing call to kasan_arch_is_ready() and bail out when not ready. The same problem is observed with ____kasan_kfree_large() so fix it the same. Also, as KASAN is not available and no shadow area is allocated for linear memory mapping, there is no point in allocating shadow mem for vmalloc memory as shown below in /sys/kernel/debug/kernel_page_tables ---[ kasan shadow mem start ]--- 0xc00f000000000000-0xc00f00000006ffff 0x00000000040f0000 448K r w pte valid present dirty accessed 0xc00f000000860000-0xc00f00000086ffff 0x000000000ac10000 64K r w pte valid present dirty accessed 0xc00f3ffffffe0000-0xc00f3fffffffffff 0x0000000004d10000 128K r w pte valid present dirty accessed ---[ kasan shadow mem end ]--- So, also verify KASAN readiness before allocating and poisoning shadow mem for VMAs. Link: https://lkml.kernel.org/r/150768c55722311699fdcf8f5379e8256749f47d.1674716617.git.christophe.leroy@csgroup.eu Fixes: 41b7a347bf14 ("powerpc: Book3S 64-bit outline-only KASAN support") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by: Nathan Lynch <nathanl@linux.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> [5.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10Merge tag 'amd-drm-fixes-6.2-2023-02-09' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.2-2023-02-09: amdgpu: - Add a parameter to disable S/G display - Re-enable S/G display on all DCNs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230209174504.7577-1-alexander.deucher@amd.com
2023-02-10Merge tag 'drm-intel-fixes-2023-02-09' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Display watermark fix (Ville) - fbdev fix for PSR, FBC, DRRS (Jouni) - Move fd_install after last use of fence (Rob) - Initialize the obj flags for shmem objects (Aravind) - Fix VBT DSI DVO port handling (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Y+UZ0rh2YlhTrE4t@intel.com
2023-02-10Merge tag 'drm-misc-fixes-2023-02-09' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A fix for a circular refcounting in drm/client, one for a memory leak in amdgpu and a virtio fence fix when interrupted Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230209083600.7hi6roht6xxgldgz@houat
2023-02-09ACPI: CPPC: Fix some kernel-doc commentsYang Li
Add the description of @pcc_ss_id in pcc_data_alloc(). Add the description of @cpu_num in cppc_get_transition_latency(). clear the below warnings: drivers/acpi/cppc_acpi.c:607: warning: Function parameter or member 'pcc_ss_id' not described in 'pcc_data_alloc' drivers/acpi/cppc_acpi.c:1616: warning: Function parameter or member 'cpu_num' not described in 'cppc_get_transition_latency' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3983 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [ rjw: Dropped redundant empty code lines, minor edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09cpuidle: sysfs: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09thermal: intel: powerclamp: Add two module parametersSrinivas Pandruvada
In some use cases, it is desirable to only inject idle on certain set of CPUs. For example on Alder Lake systems, it is possible that we force idle only on P-Cores for thermal reasons. Also the idle percent can be more than 50% if we only choose partial set of CPUs in the system. Introduce 2 new module parameters for this purpose. They can be only changed when the cooling device is inactive. cpumask (Read/Write): A bit mask of CPUs to inject idle. The format of this bitmask is same as used in other subsystems like in /proc/irq/*/smp_affinity. The mask is comma separated 32 bit groups. Each CPU is one bit. For example for 256 CPU system the full mask is: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff The rightmost mask is for CPU 0-32. max_idle (Read/Write): Maximum injected idle time to the total CPU time ratio in percent range from 1 to 100. Even if the cooling device max_state is always 100 (100%), this parameter allows to add a max idle percent limit. The default is 50, to match the current implementation of powerclamp driver. Also doesn't allow value more than 75, if the cpumask includes every CPU present in the system. Also when the cpumask doesn't include every CPU, there is no use of compensation using package C-state idle counters. Hence don't start package C-state polling thread even for a single package or a single die system in this case. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09Documentation: admin-guide: Move intel_powerclamp documentationSrinivas Pandruvada
Create a folder "thermal" under Documentation/admin-guide and move intel_powerclamp documentation to this folder. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09cpuidle: add ARCH_SUSPEND_POSSIBLE dependenciesArnd Bergmann
Some ARMv4 processors don't support suspend, which leads to a build failure with the tegra and qualcomm cpuidle driver: WARNING: unmet direct dependencies detected for ARM_CPU_SUSPEND Depends on [n]: ARCH_SUSPEND_POSSIBLE [=n] Selected by [y]: - ARM_TEGRA_CPUIDLE [=y] && CPU_IDLE [=y] && (ARM [=y] || ARM64) && (ARCH_TEGRA [=n] || COMPILE_TEST [=y]) && !ARM64 && MMU [=y] arch/arm/kernel/sleep.o: in function `__cpu_suspend': (.text+0x68): undefined reference to `cpu_sa110_suspend_size' (.text+0x68): undefined reference to `cpu_fa526_suspend_size' Add an explicit dependency to make randconfig builds avoid this combination. Fixes: faae6c9f2e68 ("cpuidle: tegra: Enable compile testing") Fixes: a871be6b8eee ("cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver") Link: https://lore.kernel.org/all/20211013160125.772873-1-arnd@kernel.org/ Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09riscv: Fixup race condition on PG_dcache_clean in flush_icache_pteGuo Ren
In commit 588a513d3425 ("arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()"), we found RISC-V has the same issue as the previous arm64. The previous implementation didn't guarantee the correct sequence of operations, which means flush_icache_all() hasn't been called when the PG_dcache_clean was set. That would cause a risk of page synchronization. Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230127035306.1819561-1-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-09thermal: core: Use sysfs_emit_at() instead of scnprintf()ye xingchen
Follow the advice in Documentation/filesystems/sysfs.rst that show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09PM: EM: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09riscv: kprobe: Fixup misaligned load textGuo Ren
The current kprobe would cause a misaligned load for the probe point. This patch fixup it with two half-word loads instead. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/linux-riscv/878rhig9zj.fsf@all.your.base.are.belong.to.us/ Reported-by: Bjorn Topel <bjorn.topel@gmail.com> Reviewed-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20230204063531.740220-1-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-09PM: domains: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09cpufreq: Make kobj_type structure constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09cpufreq: davinci: Fix clk use after freeUwe Kleine-König
The remove function first frees the clks and only then calls cpufreq_unregister_driver(). If one of the cpufreq callbacks is called just before cpufreq_unregister_driver() is run, the freed clks might be used. Fixes: 6601b8030de3 ("davinci: add generic CPUFreq driver for DaVinci") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09cpufreq: amd-pstate: avoid uninitialized variable useArnd Bergmann
The new epp support causes warnings about three separate but related bugs: 1) failing before allocation should just return an error: drivers/cpufreq/amd-pstate.c:951:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!dev) ^~~~ drivers/cpufreq/amd-pstate.c:1018:9: note: uninitialized use occurs here return ret; ^~~ 2) wrong variable to store return code: drivers/cpufreq/amd-pstate.c:963:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (rc) ^~ drivers/cpufreq/amd-pstate.c:1019:9: note: uninitialized use occurs here return ret; ^~~ drivers/cpufreq/amd-pstate.c:963:2: note: remove the 'if' if its condition is always false if (rc) ^~~~~~~ 3) calling amd_pstate_set_epp() in cleanup path after determining that it should not be called: drivers/cpufreq/amd-pstate.c:1055:6: error: variable 'epp' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (cpudata->epp_policy == cpudata->policy) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/cpufreq/amd-pstate.c:1080:30: note: uninitialized use occurs here amd_pstate_set_epp(cpudata, epp); ^~~ All three are trivial to fix, but most likely there are additional bugs in this function when the error handling was not really tested. Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Wyes Karny <wyes.karny@amd.com> Reviewed-by: Yuan Perry <Perry.Yuan@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09cpufreq: Make cpufreq_unregister_driver() return voidUwe Kleine-König
All but a few drivers ignore the return value of cpufreq_unregister_driver(). Those few that don't only call it after cpufreq_register_driver() succeeded, in which case the call doesn't fail. Make the function return no value and add a WARN_ON for the case that the function is called in an invalid situation (i.e. without a previous successful call to cpufreq_register_driver()). Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> # brcmstb-avs-cpufreq.c Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09time/debug: Fix memory leak with using debugfs_lookup()Greg Kroah-Hartman
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230202151214.2306822-1-gregkh@linuxfoundation.org
2023-02-09Revert "s390/mem_detect: do not update output parameters on failure"Heiko Carstens
This reverts commit cbc29f107e51b1cc7d1e7b0bbe0691a1224205f1. Get rid of the following smatch warnings: arch/s390/include/asm/mem_detect.h:86 get_mem_detect_end() error: uninitialized symbol 'end'. arch/s390/include/asm/mem_detect.h:86 get_mem_detect_end() error: uninitialized symbol 'end'. arch/s390/boot/vmem.c:256 setup_vmem() error: uninitialized symbol 'start'. arch/s390/boot/vmem.c:258 setup_vmem() error: uninitialized symbol 'end'. Note that there is no bug in the code. This is purely to silence smatch. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/idle: remove arch_cpu_idle_time() and corresponding codeHeiko Carstens
arch_cpu_idle_time() returns the idle time of any given cpu if it is in idle, or zero if not. All if this is racy and partially incorrect. Time stamps taken with store clock extended and store clock fast from different cpus are compared, while the architecture states that this is nothing which can be relied on (see Principles of Operation; Chapter 4, "Setting and Inspecting the Clock"). A more fundamental problem is that the timestamp when a cpu is leaving idle is taken early in the assembler part of the interrupt handler, and this value is only transferred many cycles later to the cpu's per-cpu idle data structure. This per cpu data structure is read by arch_cpu_idle() to tell for which period of time a remote cpu is idle: if only an idle_enter value is present, the assumed idle time of the cpu is calculated by taking a local timestamp and returning the difference of the local timestamp and the idle_enter value. This is potentially incorrect, since the remote cpu may have already left idle, but the taken timestamp may not have been transferred to the per-cpu data structure. This in turn means that too much idle time may be reported for a cpu, and a subsequent calculation of system idle time may result in a smaller value. Instead of coming up with even more complex code trying to fix this, just remove this code, and only account idle time of a cpu, after idle state is left. Another minor bug is that it is assumed that timestamps are non-zero, which is not necessarily the case for timestamps taken with store clock fast. This however is just a very minor problem, since this can only happen when the epoch increases. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/vx: use simple assignments to access __vector128 membersHeiko Carstens
Use simple assignments to access __vector128 members instead of hard to read casts. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/vx: add 64 and 128 bit members to __vector128 structHeiko Carstens
Add 64 and 128 bit members to __vector128 struct in order to allow reading of the complete value, or the higher or lower part of vector register contents instead of having to use casts. Add an explicit __aligned(4) statement to avoid that the alignment of the structure changes from 4 to 8. This should make sure that no breakage happens because of this change. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09MAINTAINERS: add diag288_wdt driver to s390 maintained filesHeiko Carstens
The diag288_wdt watchdog driver is s390 specific. Document who is responsible for this driver. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09MAINTAINERS: add entry for s390 SCM driverVineeth Vijayan
Storage Class Memory driver support for s390 architecture has been there for a while. The original author of this work, Sebastian Ott has left IBM and I am taking over this module. Adding myself as the upstream maintainer for SCM on s390 architecture. Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/processor: always inline cpu flag helper functionsHeiko Carstens
arch_cpu_idle() is marked noinstr and therefore must only call functions which are also not instrumented. Make sure that cpu flag helper functions are always inlined to avoid that the compiler generates an out-of-line function for e.g. the call within arch_cpu_idle(). Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/idle: mark arch_cpu_idle() noinstrHeiko Carstens
linux-next commit ("cpuidle: tracing: Warn about !rcu_is_watching()") adds a new warning which hits on s390's arch_cpu_idle() function: RCU not on for: arch_cpu_idle+0x0/0x28 WARNING: CPU: 2 PID: 0 at include/linux/trace_recursion.h:162 arch_ftrace_ops_list_func+0x24c/0x258 Modules linked in: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-rc6-next-20230202 #4 Hardware name: IBM 8561 T01 703 (z/VM 7.3.0) Krnl PSW : 0404d00180000000 00000000002b55c0 (arch_ftrace_ops_list_func+0x250/0x258) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: c0000000ffffbfff 0000000080000002 0000000000000026 0000000000000000 0000037ffffe3a28 0000037ffffe3a20 0000000000000000 0000000000000000 0000000000000000 0000000000f4acf6 00000000001044f0 0000037ffffe3cb0 0000000000000000 0000000000000000 00000000002b55bc 0000037ffffe3bb8 Krnl Code: 00000000002b55b0: c02000840051 larl %r2,0000000001335652 00000000002b55b6: c0e5fff512d1 brasl %r14,0000000000157b58 #00000000002b55bc: af000000 mc 0,0 >00000000002b55c0: a7f4ffe7 brc 15,00000000002b558e 00000000002b55c4: 0707 bcr 0,%r7 00000000002b55c6: 0707 bcr 0,%r7 00000000002b55c8: eb6ff0480024 stmg %r6,%r15,72(%r15) 00000000002b55ce: b90400ef lgr %r14,%r15 Call Trace: [<00000000002b55c0>] arch_ftrace_ops_list_func+0x250/0x258 ([<00000000002b55bc>] arch_ftrace_ops_list_func+0x24c/0x258) [<0000000000f5f0fc>] ftrace_common+0x1c/0x20 [<00000000001044f6>] arch_cpu_idle+0x6/0x28 [<0000000000f4acf6>] default_idle_call+0x76/0x128 [<00000000001cc374>] do_idle+0xf4/0x1b0 [<00000000001cc6ce>] cpu_startup_entry+0x36/0x40 [<0000000000119d00>] smp_start_secondary+0x140/0x150 [<0000000000f5d2ae>] restart_int_handler+0x6e/0x90 Mark arch_cpu_idle() noinstr like all other architectures with CONFIG_ARCH_WANTS_NO_INSTR (should) have it to fix this. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09s390/idle: move idle time accounting to account_idle_time_irq()Heiko Carstens
There is no reason to do idle time accounting in arch_cpu_idle(). Do idle time accounting in account_idle_time_irq(), where it belongs to. The accounted values don't change between account_idle_time_irq() and arch_cpu_idle(); so the result is the same. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09Merge branch 'cmpxchg_user_key' into featuresHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-09thermal: intel: powerclamp: Fix duration module parameterSrinivas Pandruvada
After the switch to use the powercap/idle-inject framework in the Intel powerclamp driver, the idle duration unit is microsecond. However, the module parameter for idle duration is in milliseconds, so convert it to microseconds in the "set" callback and back to milliseconds in a new "get" callback. While here, also use mutex protection for setting and getting "duration". The other uses of "duration" are already protected by the mutex. Fixes: 8526eb7fc75a ("thermal: intel: powerclamp: Use powercap idle-inject feature") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09Merge tag 'pm-6.2-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix the incorrect value returned by cpufreq driver's ->get() callback for Qualcomm platforms (Douglas Anderson)" * tag 'pm-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: qcom-hw: Fix cpufreq_driver->get() for non-LMH systems
2023-02-09Merge back cpufreq material for 6.3-rc1.Rafael J. Wysocki
2023-02-09Merge tag 'net-6.2-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can and ipsec subtrees. Current release - regressions: - sched: fix off by one in htb_activate_prios() - eth: mana: fix accessing freed irq affinity_hint - eth: ice: fix out-of-bounds KASAN warning in virtchnl Current release - new code bugs: - eth: mtk_eth_soc: enable special tag when any MAC uses DSA Previous releases - always broken: - core: fix sk->sk_txrehash default - neigh: make sure used and confirmed times are valid - mptcp: be careful on subflow status propagation on errors - xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr() - phylink: move phy_device_free() to correctly release phy device - eth: mlx5: - fix crash unsetting rx-vlan-filter in switchdev mode - fix hang on firmware reset - serialize module cleanup with reload and remove" * tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) selftests: forwarding: lib: quote the sysctl values net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used rds: rds_rm_zerocopy_callback() use list_first_entry() net: txgbe: Update support email address selftests: Fix failing VXLAN VNI filtering test selftests: mptcp: stop tests earlier selftests: mptcp: allow more slack for slow test-case mptcp: be careful on subflow status propagation on errors mptcp: fix locking for in-kernel listener creation mptcp: fix locking for setsockopt corner-case mptcp: do not wait for bare sockets' timeout net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0 nfp: ethtool: fix the bug of setting unsupported port speed txhash: fix sk->sk_txrehash default net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg() net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA net: sched: sch: Fix off by one in htb_activate_prios() igc: Add ndo_tx_timeout support net: mana: Fix accessing freed irq affinity_hint hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC ...
2023-02-09Merge tag 'for-linus-2023020901' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - fix potential infinite loop with a badly crafted HID device (Xin Zhao) - fix regression from 6.1 in USB logitech devices potentially making their mouse wheel not working (Bastien Nocera) - clean up in AMD sensors, which fixes a long time resume bug (Mario Limonciello) - few device small fixes and quirks * tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: Ignore battery for ELAN touchscreen 29DF on HP HID: amd_sfh: if no sensors are enabled, clean up HID: logitech: Disable hi-res scrolling on USB HID: core: Fix deadloop in hid_apply_multiplier. HID: Ignore battery for Elan touchscreen on Asus TP420IA HID: elecom: add support for TrackBall 056E:011C
2023-02-09Merge tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifx fix from Steve French: "Small fix for use after free" * tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix use-after-free in rdata->read_into_pages()
2023-02-09block: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230208-kobj_type-block-v1-1-0b3eafd7d983@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09btrfs: free device in btrfs_close_devices for a single device filesystemAnand Jain
We have this check to make sure we don't accidentally add older devices that may have disappeared and re-appeared with an older generation from being added to an fs_devices (such as a replace source device). This makes sense, we don't want stale disks in our file system. However for single disks this doesn't really make sense. I've seen this in testing, but I was provided a reproducer from a project that builds btrfs images on loopback devices. The loopback device gets cached with the new generation, and then if it is re-used to generate a new file system we'll fail to mount it because the new fs is "older" than what we have in cache. Fix this by freeing the cache when closing the device for a single device filesystem. This will ensure that the mount command passed device path is scanned successfully during the next mount. CC: stable@vger.kernel.org # 5.10+ Reported-by: Daan De Meyer <daandemeyer@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-09btrfs: lock the inode in shared mode before starting fiemapFilipe Manana
Currently fiemap does not take the inode's lock (VFS lock), it only locks a file range in the inode's io tree. This however can lead to a deadlock if we have a concurrent fsync on the file and fiemap code triggers a fault when accessing the user space buffer with fiemap_fill_next_extent(). The deadlock happens on the inode's i_mmap_lock semaphore, which is taken both by fsync and btrfs_page_mkwrite(). This deadlock was recently reported by syzbot and triggers a trace like the following: task:syz-executor361 state:D stack:20264 pid:5668 ppid:5119 flags:0x00004004 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x995/0xe20 kernel/sched/core.c:6606 schedule+0xcb/0x190 kernel/sched/core.c:6682 wait_on_state fs/btrfs/extent-io-tree.c:707 [inline] wait_extent_bit+0x577/0x6f0 fs/btrfs/extent-io-tree.c:751 lock_extent+0x1c2/0x280 fs/btrfs/extent-io-tree.c:1742 find_lock_delalloc_range+0x4e6/0x9c0 fs/btrfs/extent_io.c:488 writepage_delalloc+0x1ef/0x540 fs/btrfs/extent_io.c:1863 __extent_writepage+0x736/0x14e0 fs/btrfs/extent_io.c:2174 extent_write_cache_pages+0x983/0x1220 fs/btrfs/extent_io.c:3091 extent_writepages+0x219/0x540 fs/btrfs/extent_io.c:3211 do_writepages+0x3c3/0x680 mm/page-writeback.c:2581 filemap_fdatawrite_wbc+0x11e/0x170 mm/filemap.c:388 __filemap_fdatawrite_range mm/filemap.c:421 [inline] filemap_fdatawrite_range+0x175/0x200 mm/filemap.c:439 btrfs_fdatawrite_range fs/btrfs/file.c:3850 [inline] start_ordered_ops fs/btrfs/file.c:1737 [inline] btrfs_sync_file+0x4ff/0x1190 fs/btrfs/file.c:1839 generic_write_sync include/linux/fs.h:2885 [inline] btrfs_do_write_iter+0xcd3/0x1280 fs/btrfs/file.c:1684 call_write_iter include/linux/fs.h:2189 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f7d4054e9b9 RSP: 002b:00007f7d404fa2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f7d405d87a0 RCX: 00007f7d4054e9b9 RDX: 0000000000000090 RSI: 0000000020000000 RDI: 0000000000000006 RBP: 00007f7d405a51d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 61635f65646f6e69 R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87a8 </TASK> INFO: task syz-executor361:5697 blocked for more than 145 seconds. Not tainted 6.2.0-rc3-syzkaller-00376-g7c6984405241 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor361 state:D stack:21216 pid:5697 ppid:5119 flags:0x00004004 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x995/0xe20 kernel/sched/core.c:6606 schedule+0xcb/0x190 kernel/sched/core.c:6682 rwsem_down_read_slowpath+0x5f9/0x930 kernel/locking/rwsem.c:1095 __down_read_common+0x54/0x2a0 kernel/locking/rwsem.c:1260 btrfs_page_mkwrite+0x417/0xc80 fs/btrfs/inode.c:8526 do_page_mkwrite+0x19e/0x5e0 mm/memory.c:2947 wp_page_shared+0x15e/0x380 mm/memory.c:3295 handle_pte_fault mm/memory.c:4949 [inline] __handle_mm_fault mm/memory.c:5073 [inline] handle_mm_fault+0x1b79/0x26b0 mm/memory.c:5219 do_user_addr_fault+0x69b/0xcb0 arch/x86/mm/fault.c:1428 handle_page_fault arch/x86/mm/fault.c:1519 [inline] exc_page_fault+0x7a/0x110 arch/x86/mm/fault.c:1575 asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:570 RIP: 0010:copy_user_short_string+0xd/0x40 arch/x86/lib/copy_user_64.S:233 Code: 74 0a 89 (...) RSP: 0018:ffffc9000570f330 EFLAGS: 00050202 RAX: ffffffff843e6601 RBX: 00007fffffffefc8 RCX: 0000000000000007 RDX: 0000000000000000 RSI: ffffc9000570f3e0 RDI: 0000000020000120 RBP: ffffc9000570f490 R08: 0000000000000000 R09: fffff52000ae1e83 R10: fffff52000ae1e83 R11: 1ffff92000ae1e7c R12: 0000000000000038 R13: ffffc9000570f3e0 R14: 0000000020000120 R15: ffffc9000570f3e0 copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline] raw_copy_to_user arch/x86/include/asm/uaccess_64.h:58 [inline] _copy_to_user+0xe9/0x130 lib/usercopy.c:34 copy_to_user include/linux/uaccess.h:169 [inline] fiemap_fill_next_extent+0x22e/0x410 fs/ioctl.c:144 emit_fiemap_extent+0x22d/0x3c0 fs/btrfs/extent_io.c:3458 fiemap_process_hole+0xa00/0xad0 fs/btrfs/extent_io.c:3716 extent_fiemap+0xe27/0x2100 fs/btrfs/extent_io.c:3922 btrfs_fiemap+0x172/0x1e0 fs/btrfs/inode.c:8209 ioctl_fiemap fs/ioctl.c:219 [inline] do_vfs_ioctl+0x185b/0x2980 fs/ioctl.c:810 __do_sys_ioctl fs/ioctl.c:868 [inline] __se_sys_ioctl+0x83/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f7d4054e9b9 RSP: 002b:00007f7d390d92f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f7d405d87b0 RCX: 00007f7d4054e9b9 RDX: 0000000020000100 RSI: 00000000c020660b RDI: 0000000000000005 RBP: 00007f7d405a51d0 R08: 00007f7d390d9700 R09: 0000000000000000 R10: 00007f7d390d9700 R11: 0000000000000246 R12: 61635f65646f6e69 R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87b8 </TASK> What happens is the following: 1) Task A is doing an fsync, enters btrfs_sync_file() and flushes delalloc before locking the inode and the i_mmap_lock semaphore, that is, before calling btrfs_inode_lock(); 2) After task A flushes delalloc and before it calls btrfs_inode_lock(), another task dirties a page; 3) Task B starts a fiemap without FIEMAP_FLAG_SYNC, so the page dirtied at step 2 remains dirty and unflushed. Then when it enters extent_fiemap() and it locks a file range that includes the range of the page dirtied in step 2; 4) Task A calls btrfs_inode_lock() and locks the inode (VFS lock) and the inode's i_mmap_lock semaphore in write mode. Then it tries to flush delalloc by calling start_ordered_ops(), which will block, at find_lock_delalloc_range(), when trying to lock the range of the page dirtied at step 2, since this range was locked by the fiemap task (at step 3); 5) Task B generates a page fault when accessing the user space fiemap buffer with a call to fiemap_fill_next_extent(). The fault handler needs to call btrfs_page_mkwrite() for some other page of our inode, and there we deadlock when trying to lock the inode's i_mmap_lock semaphore in read mode, since the fsync task locked it in write mode (step 4) and the fsync task can not progress because it's waiting to lock a file range that is currently locked by us (the fiemap task, step 3). Fix this by taking the inode's lock (VFS lock) in shared mode when entering fiemap. This effectively serializes fiemap with fsync (except the most expensive part of fsync, the log sync), preventing this deadlock. Reported-by: syzbot+cc35f55c41e34c30dcb5@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/00000000000032dc7305f2a66f46@google.com/ CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-09Revert "drm/amd/display: disable S/G display on DCN 3.1.5"Alex Deucher
This reverts commit 3cc67fe1b3aa1ac4720e002f2aa2d08c9199a584. Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). We have a parameter to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. Having this enabled seems like the lesser of to evils. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>