linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-12-29	buffer: return bool from grow_dev_folio()	Matthew Wilcox (Oracle)
	Patch series "More buffer_head cleanups", v2. The first patch is a left-over from last cycle. The rest fix "obvious" block size > PAGE_SIZE problems. I haven't tested with a large block size setup (but I have done an ext4 xfstests run). This patch (of 7): Rename grow_dev_page() to grow_dev_folio() and make it return a bool. Document what that bool means; it's more subtle than it first appears. Also rename the 'failed' label to 'unlock' beacuse it's not exactly 'failed'. It just hasn't succeeded. Link: https://lkml.kernel.org/r/20231109210608.2252323-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	Merge tag 'gpio-fixes-for-v6.7-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - Andy steps down as GPIO reviewer - Kent becomes a reviewer for GPIO uAPI - add missing intel file to the relevant MAINTAINERS section * tag 'gpio-fixes-for-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: Add a missing file to the INTEL GPIO section MAINTAINERS: Remove Andy from GPIO maintainers MAINTAINERS: split out the uAPI into a new section
2023-12-29	Merge tag 'platform-drivers-x86-v6.7-6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - Intel PMC GBE LTR regression - P2SB / PCI deadlock fix * tag 'platform-drivers-x86-v6.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback platform/x86/intel/pmc: Allow reenabling LTRs platform/x86/intel/pmc: Add suspend callback platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
2023-12-29	Merge tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fixes from Jens Axboe: "Fix for a badly numbered flag, and a regression fix for the badblocks updates from this merge window" * tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux: block: renumber QUEUE_FLAG_HW_WC badblocks: avoid checking invalid range in badblocks_check()
2023-12-29	mailmap: add entries for Mathieu Othacehe	Mathieu Othacehe
	Add my gnu.org mail address. Link: https://lkml.kernel.org/r/20231223144226.25740-1-othacehe@gnu.org Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Matt Ranostay <matt@ranostay.sg> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	MAINTAINERS: change vmware.com addresses to broadcom.com	Zack Rusin
	Update the email addresses for vmwgfx and vmmouse to reflect the fact that VMware is now part of Broadcom. Add a .mailmap entry because the vmware.com address will start bouncing soon. Link: https://lkml.kernel.org/r/20231224052036.603621-1-zack.rusin@broadcom.com Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Ian Forbes <ian.forbes@broadcom.com> Cc: Martin Krastev <martin.krastev@broadcom.com> Cc: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	arch/mm/fault: fix major fault accounting when retrying under per-VMA lock	Suren Baghdasaryan
	A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm/mglru: skip special VMAs in lru_gen_look_around()	Yu Zhao
	Special VMAs like VM_PFNMAP can contain anon pages from COW. There isn't much profit in doing lookaround on them. Besides, they can trigger the pte_special() warning in get_pte_pfn(). Skip them in lru_gen_look_around(). Link: https://lkml.kernel.org/r/20231223045647.1566043-1-yuzhao@google.com Fixes: 018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: syzbot+03fd9b3f71641f0ebf2d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/000000000000f9ff00060d14c256@google.com/ Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin	Naoya Horiguchi
	Miaohe Lin has contributed to hwpoison subsystem as a reviewer for more than 1.5 year, and has made many patch contributions in hwpoison subsystem and the memory management subsystem. So I'd like to pass on the hwpoison maintainership to Miaohe. [nao.horiguchi@gmail.com: update to keep myself as a reviewer] Link: https://lkml.kernel.org/r/20231223031115.GA2883156@u2004 Link: https://lkml.kernel.org/r/20231222024024.1601043-1-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	MAINTAINERS: remove hugetlb maintainer Mike Kravetz	Mike Kravetz
	I am stepping away from my role as hugetlb maintainer. There should be no gap in coverage as Muchun Song is also a hugetlb maintainer. [akpm@linux-foundation.org: update CREDITS] Link: https://lkml.kernel.org/r/20231220220843.73586-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: fix unmap_mapping_range high bits shift bug	Jiajun Xie
	The bug happens when highest bit of holebegin is 1, suppose holebegin is 0x8000000111111000, after shift, hba would be 0xfff8000000111111, then vma_interval_tree_foreach would look it up fail or leads to the wrong result. error call seq e.g.: - mmap(..., offset=0x8000000111111000) \|- syscall(mmap, ... unsigned long, off): \|- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT); here pgoff is correctly shifted to 0x8000000111111, but pass 0x8000000111111000 as holebegin to unmap would then cause terrible result, as shown below: - unmap_mapping_range(..., loff_t const holebegin) \|- pgoff_t hba = holebegin >> PAGE_SHIFT; /* hba = 0xfff8000000111111 unexpectedly / The issue happens in Heterogeneous computing, where the device(e.g. gpu) and host share the same virtual address space. A simple workflow pattern which hit the issue is: / host / 1. userspace first mmap a file backed VA range with specified offset. e.g. (offset=0x800..., mmap return: va_a) 2. write some data to the corresponding sys page e.g. (va_a = 0xAABB) / device / 3. gpu workload touches VA, triggers gpu fault and notify the host. / host / 4. reviced gpu fault notification, then it will: 4.1 unmap host pages and also takes care of cpu tlb (use unmap_mapping_range with offset=0x800...) 4.2 migrate sys page to device 4.3 setup device page table and resolve device fault. / device / 5. gpu workload continued, it accessed va_a and got 0xAABB. 6. gpu workload continued, it wrote 0xBBCC to va_a. / host */ 7. userspace access va_a, as expected, it will: 7.1 trigger cpu vm fault. 7.2 driver handling fault to migrate gpu local page to host. 8. userspace then could correctly get 0xBBCC from va_a 9. done But in step 4.1, if we hit the bug this patch mentioned, then userspace would never trigger cpu fault, and still get the old value: 0xAABB. Making holebegin unsigned first fixes the bug. Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: memcg: fix split queue list crash when large folio migration	Baolin Wang
	When running autonuma with enabling multi-size THP, I encountered the following kernel crash issue: [ 134.290216] list_del corruption. prev->next should be fffff9ad42e1c490, but was dead000000000100. (prev=fffff9ad42399890) [ 134.290877] kernel BUG at lib/list_debug.c:62! [ 134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted: G E 6.7.0-rc4+ #20 [ 134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0 ...... [ 134.294252] Call Trace: [ 134.294362] <TASK> [ 134.294440] ? die+0x33/0x90 [ 134.294561] ? do_trap+0xe0/0x110 ...... [ 134.295681] ? __list_del_entry_valid_or_report+0x97/0xb0 [ 134.295842] folio_undo_large_rmappable+0x99/0x100 [ 134.296003] destroy_large_folio+0x68/0x70 [ 134.296172] migrate_folio_move+0x12e/0x260 [ 134.296264] ? __pfx_remove_migration_pte+0x10/0x10 [ 134.296389] migrate_pages_batch+0x495/0x6b0 [ 134.296523] migrate_pages+0x1d0/0x500 [ 134.296646] ? __pfx_alloc_misplaced_dst_folio+0x10/0x10 [ 134.296799] migrate_misplaced_folio+0x12d/0x2b0 [ 134.296953] do_numa_page+0x1f4/0x570 [ 134.297121] __handle_mm_fault+0x2b0/0x6c0 [ 134.297254] handle_mm_fault+0x107/0x270 [ 134.300897] do_user_addr_fault+0x167/0x680 [ 134.304561] exc_page_fault+0x65/0x140 [ 134.307919] asm_exc_page_fault+0x22/0x30 The reason for the crash is that, the commit 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") removed the charging and uncharging operations of the migration folios and cleared the memcg data of the old folio. During the subsequent release process of the old large folio in destroy_large_folio(), if the large folio needs to be removed from the split queue, an incorrect split queue can be obtained (which is pgdat->deferred_split_queue) because the old folio's memcg is NULL now. This can lead to list operations being performed under the wrong split queue lock protection, resulting in a list crash as above. After the migration, the old folio is going to be freed, so we can remove it from the split queue in mem_cgroup_migrate() a bit earlier before clearing the memcg data to avoid getting incorrect split queue. [akpm@linux-foundation.org: fix comment, per Zi Yan] Link: https://lkml.kernel.org/r/61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: fix arithmetic for max_prop_frac when setting max_ratio	Jingbo Xu
	Since now bdi->max_ratio is part per million, fix the wrong arithmetic for max_prop_frac when setting max_ratio. Otherwise the miscalculated max_prop_frac will affect the incrementing of writeout completion count when max_ratio is not 100%. Link: https://lkml.kernel.org/r/20231219142508.86265-3-jefflexu@linux.alibaba.com Fixes: efc3e6ad53ea ("mm: split off __bdi_set_max_ratio() function") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: fix arithmetic for bdi min_ratio	Jingbo Xu
	Since now bdi->min_ratio is part per million, fix the wrong arithmetic. Otherwise it will fail with -EINVAL when setting a reasonable min_ratio, as it tries to set min_ratio to (min_ratio * BDI_RATIO_SCALE) in percentage unit, which exceeds 100% anyway. # cat /sys/class/bdi/253\:0/min_ratio 0 # cat /sys/class/bdi/253\:0/max_ratio 100 # echo 1 > /sys/class/bdi/253\:0/min_ratio -bash: echo: write error: Invalid argument Link: https://lkml.kernel.org/r/20231219142508.86265-2-jefflexu@linux.alibaba.com Fixes: 8021fb3232f2 ("mm: split off __bdi_set_min_ratio() function") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: align larger anonymous mappings on THP boundaries	Rik van Riel
	Align larger anonymous memory mappings on THP boundaries by going through thp_get_unmapped_area if THPs are enabled for the current process. With this patch, larger anonymous mappings are now THP aligned. When a malloc library allocates a 2MB or larger arena, that arena can now be mapped with THPs right from the start, which can result in better TLB hit rates and execution time. Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com Link: https://lkml.kernel.org/r/20231214223423.1133074-1-yang@os.amperecomputing.com Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christopher Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	ACPI: button: trigger wakeup key events	Ken Xue
	Andorid can wakeup from various wakeup sources, but only several wakeup sources can wake up screen with right events(POWER, WAKEUP) from input device. Regarding pressing acpi power button, it can resume system and ACPI_BITMASK_WAKE_STATUS and ACPI_BITMASK_POWER_BUTTON_STATUS are set in pm1a_sts, but kernel does not report any key event to user space during resuming by default. So, send wakeup key event to user space during resume from power button. Signed-off-by: Ken Xue <Ken.Xue@amd.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	ACPI: resource: Add another DMI match for the TongFang GMxXGxx	Hans de Goede
	The TongFang GMxXGxx, which needs IRQ overriding for the keyboard to work, is also sold as the Eluktronics RP-15 which does not use the standard TongFang GMxXGxx DMI board_name. Add an entry for this laptop to the irq1_edge_low_force_override[] DMI table to make the internal keyboard functional. Reported-by: Luis Acuna <ldacuna@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	cpuidle: haltpoll: Do not enable interrupts when entering idle	Borislav Petkov (AMD)
	The cpuidle drivers' ->enter() methods are supposed to be IRQ invariant: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") bb7b11258561 ("cpuidle: Move IRQ state validation") Do that in the haltpoll driver too. Fixes: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218245 Reported-by: <forza@tnonline.net> Tested-by: <forza@tnonline.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset	Eugen Hristev
	AUD_PAD_TOP widget's correct register is AFE_AUD_PAD_TOP , and not zero. Having a zero as register, it would mean that the `snd_soc_dapm_new_widgets` would try to read the register at offset zero when trying to get the power status of this widget, which is incorrect. Fixes: b65c466220b3 ("ASoC: mediatek: mt8186: support adda in platform driver") Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Link: https://lore.kernel.org/r/20231229114342.195867-1-eugen.hristev@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-12-29	thermal: gov_power_allocator: Support new update callback of weights	Lukasz Luba
	When the thermal instance's weight is updated from the sysfs the governor update_tz() callback is triggered. Implement proper reaction to this event in the IPA, which would save CPU cycles spent in throttle(). This will speed-up the main throttle() IPA function and clean it up a bit. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal/sysfs: Update governors when the 'weight' has changed	Lukasz Luba
	Support governors update when the thermal instance's weight has changed. This allows to adjust internal state for the governor. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Add two empty code lines aroung the locking ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal/sysfs: Update instance->weight under tz lock	Lukasz Luba
	User space can change the weight of a thermal instance via sysfs while the .throttle() callback is running for a governor, because weight_store() does not use the zone lock. The IPA governor uses instance weight values for power calculations and caches the sum of them as total_weight, so it gets confused when one of them changes while its .throttle() callback is running. To prevent that from happening, use thermal zone locking in weight_store(). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Simplify checks for valid power actor	Lukasz Luba
	There is a need to check if the cooling device in the thermal zone supports IPA callback and is set for control trip point. Refactor the code which validates the power actor capabilities and make it more consistent in all places. No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Move memory allocation out of throttle()	Lukasz Luba
	The new thermal callback allows to react to the change of cooling instances in the thermal zone. Move the memory allocation to that new callback and save CPU cycles in the throttle() code path. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Change trace functions	Lukasz Luba
	Change trace event trace_thermal_power_allocator() to not use dynamic array for requested power and granted power for all power actors. Instead, simplify the trace event and print other simple values. Add new trace event to print power actor information of requested power and granted power. That trace event would be called in a loop for each power actor. The trace data would be easier to parse comparing to the dynamic array implementation. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Refactor checks in divvy_up_power()	Lukasz Luba
	Simplify the code and remove one extra 'if' block. No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Refactor check_power_actors()	Lukasz Luba
	In preparation for a subsequent change, rearrange check_power_actors(). No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: core: Add governor callback for thermal zone change	Lukasz Luba
	Add a new callback to the struct thermal_governor. It can be used for updating governors when there is a change in the thermal zone internals, e.g. thermal cooling device is bind to the thermal zone. That makes possible to move some heavy operations like memory allocations related to the number of cooling instances out of the throttle() callback. Both callback code paths (throttle() and update_tz()) are protected with the same thermal zone lock, which guaranties the consistency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	cifs: do not depend on release_iface for maintaining iface_list	Shyam Prasad N
	parse_server_interfaces should be in complete charge of maintaining the iface_list linked list. Today, iface entries are removed from the list only when the last refcount is dropped. i.e. in release_iface. However, this can result in undercounting of refcount if the server stops advertising interfaces (which Azure SMB server does). This change puts parse_server_interfaces in full charge of maintaining the iface_list. So if an empty list is returned by the server, the entries in the list will immediately be removed. This way, a following call to the same function will not find entries in the list. Fixes: aa45dadd34e4 ("cifs: change iface_list from array to sorted linked list") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-29	cifs: cifs_chan_is_iface_active should be called with chan_lock held	Shyam Prasad N
	cifs_chan_is_iface_active checks the channels of a session to see if the associated iface is active. This should always happen with chan_lock held. However, these two callers of this function were missing this locking. This change makes sure the function calls are protected with proper locking. Fixes: b54034a73baf ("cifs: during reconnect, update interface if necessary") Fixes: fa1d0508bdd4 ("cifs: account for primary channel in the interface list") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-29	cifs: after disabling multichannel, mark tcon for reconnect	Shyam Prasad N
	Once the server disables multichannel for an active multichannel session, on the following reconnect, the client would reduce the number of channels to 1. However, it could be the case that the tree connect was active on one of these disabled channels. This results in an unrecoverable state. This change fixes that by making sure that whenever a channel is being terminated, the session and tcon are marked for reconnect too. This could mean a few redundant tree connect calls to the server, but considering that this is not a frequent event, we should be okay. Fixes: ee1d21794e55 ("cifs: handle when server stops supporting multichannel") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-29	ALSA: scarlett2: Convert meter levels from little-endian	Geoffrey D. Bennett
	Add missing conversion from little-endian data to CPU-endian in scarlett2_usb_get_meter_levels(). Fixes: 3473185f31df ("ALSA: scarlett2: Remap Level Meter values") Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/ZYsBIE3DSKdi4YC/@m.b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	tracing: Fix blocked reader of snapshot buffer	Steven Rostedt (Google)
	If an application blocks on the snapshot or snapshot_raw files, expecting to be woken up when a snapshot occurs, it will not happen. Or it may happen with an unexpected result. That result is that the application will be reading the main buffer instead of the snapshot buffer. That is because when the snapshot occurs, the main and snapshot buffers are swapped. But the reader has a descriptor still pointing to the buffer that it originally connected to. This is fine for the main buffer readers, as they may be blocked waiting for a watermark to be hit, and when a snapshot occurs, the data that the main readers want is now on the snapshot buffer. But for waiters of the snapshot buffer, they are waiting for an event to occur that will trigger the snapshot and they can then consume it quickly to save the snapshot before the next snapshot occurs. But to do this, they need to read the new snapshot buffer, not the old one that is now receiving new data. Also, it does not make sense to have a watermark "buffer_percent" on the snapshot buffer, as the snapshot buffer is static and does not receive new data except all at once. Link: https://lore.kernel.org/linux-trace-kernel/20231228095149.77f5b45d@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: debdd57f5145f ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-29	ring-buffer: Fix wake ups when buffer_percent is set to 100	Steven Rostedt (Google)
	The tracefs file "buffer_percent" is to allow user space to set a water-mark on how much of the tracing ring buffer needs to be filled in order to wake up a blocked reader. 0 - is to wait until any data is in the buffer 1 - is to wait for 1% of the sub buffers to be filled 50 - would be half of the sub buffers are filled with data 100 - is not to wake the waiter until the ring buffer is completely full Unfortunately the test for being full was: dirty = ring_buffer_nr_dirty_pages(buffer, cpu); return (dirty * 100) > (full * nr_pages); Where "full" is the value for "buffer_percent". There is two issues with the above when full == 100. 1. dirty * 100 > 100 * nr_pages will never be true That is, the above is basically saying that if the user sets buffer_percent to 100, more pages need to be dirty than exist in the ring buffer! 2. The page that the writer is on is never considered dirty, as dirty pages are only those that are full. When the writer goes to a new sub-buffer, it clears the contents of that sub-buffer. That is, even if the check was ">=" it would still not be equal as the most pages that can be considered "dirty" is nr_pages - 1. To fix this, add one to dirty and use ">=" in the compare. Link: https://lore.kernel.org/linux-trace-kernel/20231226125902.4a057f1d@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-29	platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback	David E. Box
	Commit 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") caused a network performance regression due to the GBE LTR ignore that it added at probe. This was needed in order to allow the SoC to enter the deepest Package C state. To fix the regression and at least support PC10 during suspend, move the LTR ignore from probe to the suspend callback, and enable it again on resume. This solution will allow PC10 during suspend but restrict Package C entry at runtime to no deeper than PC8/9 while a network cable it attach to the PCH LAN. Fixes: 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") Signed-off-by: "David E. Box" <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20231223032548.1680738-6-david.e.box@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-29	platform/x86/intel/pmc: Allow reenabling LTRs	David E. Box
	Commit 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") caused a network performance regression due to the GBE LTR ignore that it added during probe. The fix will move the ignore to occur at suspend-time (so as to not affect suspend power). This will require the ability to enable the LTR again on resume. Modify pmc_core_send_ltr_ignore() to allow enabling an LTR. Fixes: 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") Signed-off-by: "David E. Box" <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20231223032548.1680738-5-david.e.box@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-29	platform/x86/intel/pmc: Add suspend callback	David E. Box
	Add a suspend callback to struct pmc for performing platform specific tasks before device suspend. This is needed in order to perform GBE LTR ignore on certain platforms at suspend-time instead of at probe-time and replace the GBE LTR ignore removal that was done in order to fix a bug introduced by commit 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()"). Fixes: 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") Signed-off-by: "David E. Box" <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20231223032548.1680738-4-david.e.box@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-29	platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe	Shin'ichiro Kawasaki
	p2sb_bar() unhides P2SB device to get resources from the device. It guards the operation by locking pci_rescan_remove_lock so that parallel rescans do not find the P2SB device. However, this lock causes deadlock when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan locks pci_rescan_remove_lock and probes PCI devices. When PCI devices call p2sb_bar() during probe, it locks pci_rescan_remove_lock again. Hence the deadlock. To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar(). Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources() for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(), refer the cache and return to the caller. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Fixes: 9745fb07474f ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support") Cc: stable@vger.kernel.org Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/ Link: https://lore.kernel.org/r/20231229063912.2517922-2-shinichiro.kawasaki@wdc.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-12-29	ALSA: hda/tas2781: remove sound controls in unbind	Gergo Koteles
	Remove sound controls in hda_unbind to make module loadable after module unload. Add a driver specific struct (tas2781_hda) to store the controls. This patch depends on patch: ALSA: hda/tas2781: do not use regcache Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles <soyer@irl.hu> Link: https://lore.kernel.org/r/362aa3e2f81b9259a3e5222f576bec5debfc5e88.1703204848.git.soyer@irl.hu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	ALSA: hda/tas2781: move set_drv_data outside tasdevice_init	Gergo Koteles
	allow driver specific driver data in tas2781-hda-i2c and tas2781-i2c Fixes: ef3bcde75d06 ("ASoC: tas2781: Add tas2781 driver") CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles <soyer@irl.hu> Link: https://lore.kernel.org/r/1398bd8bf3e935b1595a99128320e4a1913e210a.1703204848.git.soyer@irl.hu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	ALSA: hda/tas2781: fix typos in comment	Gergo Koteles
	Correct typos. Signed-off-by: Gergo Koteles <soyer@irl.hu> Link: https://lore.kernel.org/r/ead5609d63e71e8e87c13e1767decca5b272d696.1703203812.git.soyer@irl.hu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	ALSA: hda/tas2781: do not use regcache	Gergo Koteles
	There are two problems with using regcache in this module. The amplifier has 3 addressing levels (BOOK, PAGE, REG). The firmware contains blocks that must be written to BOOK 0x8C. The regcache doesn't know anything about BOOK, so regcache_sync writes invalid values to the actual BOOK. The module handles 2 or more separate amplifiers. The amplifiers have different register values, and the module uses only one regmap/regcache for all the amplifiers. The regcache_sync only writes the last amplifier used. The module successfully restores all the written register values (RC profile, program, configuration, calibration) without regcache. Remove regcache functions and set regmap cache_type to REGCACHE_NONE. Link: https://lore.kernel.org/r/21a183b5a08cb23b193af78d4b1114cc59419272.1701906455.git.soyer@irl.hu/ Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") Acked-by: Mark Brown <broonie@kernel.org> CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles <soyer@irl.hu> Link: https://lore.kernel.org/r/491aeed0e2eecc3b704ec856f815db21bad3ba0e.1703202126.git.soyer@irl.hu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	sched/fair: Fix tg->load when offlining a CPU	Vincent Guittot
	When a CPU is taken offline, the contribution of its cfs_rqs to task_groups' load may remain and will negatively impact the calculation of the share of the online CPUs. To fix this bug, clear the contribution of an offlining CPU to task groups' load and skip its contribution while it is inactive. Here's the reproducer of the anomaly, by Imran Khan: "So far I have encountered only one rather lengthy way of reproducing this issue, which is as follows: 1. Take a KVM guest (booted with 4 CPUs and can be scaled up to 124 CPUs) and create 2 custom cgroups: /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/ cpu/test_group_2 2. Assign a CPU intensive workload to each of these cgroups and start the workload. For my tests I am using following app: int main(int argc, char *argv[]) { unsigned long count, i, val; if (argc != 2) { printf("usage: ./a.out <number of random nums to generate> \n"); return 0; } count = strtoul(argv[1], NULL, 10); printf("Generating %lu random numbers \n", count); for (i = 0; i < count; i++) { val = rand(); val = val % 2; //usleep(1); } printf("Generated %lu random numbers \n", count); return 0; } Also since the system is booted with 4 CPUs, in order to completely load the system I am also launching 4 instances of same test app under: /sys/fs/cgroup/cpu/ 3. We can see that both of the cgroups get similar CPU time: # systemd-cgtop --depth 1 Path Tasks %CPU Memory Input/s Output/s / 659 - 5.5G - - /system.slice - - 5.7G - - /test_group_1 4 - - - - /test_group_2 3 - - - - /user.slice 31 - 56.5M - - Path Tasks %CPU Memory Input/s Output/s / 659 394.6 5.5G - - /test_group_2 3 65.7 - - - /user.slice 29 55.1 48.0M - - /test_group_1 4 47.3 - - - /system.slice - 2.2 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 394.8 5.5G - - /test_group_1 4 62.9 - - - /user.slice 28 44.9 54.2M - - /test_group_2 3 44.7 - - - /system.slice - 0.9 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 394.4 5.5G - - /test_group_2 3 58.8 - - - /test_group_1 4 51.9 - - - /user.slice 30 39.3 59.6M - - /system.slice - 1.9 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 394.7 5.5G - - /test_group_1 4 60.9 - - - /test_group_2 3 57.9 - - - /user.slice 28 43.5 36.9M - - /system.slice - 3.0 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 395.0 5.5G - - /test_group_1 4 66.8 - - - /test_group_2 3 56.3 - - - /user.slice 29 43.1 51.8M - - /system.slice - 0.7 5.7G - - 4. Now move systemd-udevd to one of these test groups, say test_group_1, and perform scale up to 124 CPUs followed by scale down back to 4 CPUs from the host side. 5. Run the same workload i.e 4 instances of CPU hogger under /sys/fs/cgroup/cpu and one instance of CPU hogger each in /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/test_group_2. It can be seen that test_group_1 (the one where systemd-udevd was moved) is getting much less CPU time than the test_group_2, even though at this point of time both of these groups have only CPU hogger running: # systemd-cgtop --depth 1 Path Tasks %CPU Memory Input/s Output/s / 1219 - 5.4G - - /system.slice - - 5.6G - - /test_group_1 4 - - - - /test_group_2 3 - - - - /user.slice 26 - 91.3M - - Path Tasks %CPU Memory Input/s Output/s / 1221 394.3 5.4G - - /test_group_2 3 82.7 - - - /test_group_1 4 14.3 - - - /system.slice - 0.8 5.6G - - /user.slice 26 0.4 91.2M - - Path Tasks %CPU Memory Input/s Output/s / 1221 394.6 5.4G - - /test_group_2 3 67.4 - - - /system.slice - 24.6 5.6G - - /test_group_1 4 12.5 - - - /user.slice 26 0.4 91.2M - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.2 5.4G - - /test_group_2 3 60.9 - - - /system.slice - 27.9 5.6G - - /test_group_1 4 12.2 - - - /user.slice 26 0.4 91.2M - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.2 5.4G - - /test_group_2 3 69.4 - - - /test_group_1 4 13.9 - - - /user.slice 28 1.6 92.0M - - /system.slice - 1.0 5.6G - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.6 5.4G - - /test_group_2 3 59.3 - - - /test_group_1 4 14.1 - - - /user.slice 28 1.3 92.2M - - /system.slice - 0.7 5.6G - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.5 5.4G - - /test_group_2 3 67.2 - - - /test_group_1 4 11.5 - - - /user.slice 28 1.3 92.5M - - /system.slice - 0.6 5.6G - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.1 5.4G - - /test_group_2 3 76.8 - - - /test_group_1 4 12.9 - - - /user.slice 28 1.3 92.8M - - /system.slice - 1.2 5.6G - - From sched_debug data it can be seen that in bad case the load.weight of per-CPU sched entities corresponding to test_group_1 has reduced significantly and also load_avg of test_group_1 remains much higher than that of test_group_2, even though systemd-udevd stopped running long time back and at this point of time both cgroups just have the CPU hogger app as running entity." [ mingo: Added details from the original discussion, plus minor edits to the patch. ] Reported-by: Imran Khan <imran.f.khan@oracle.com> Tested-by: Imran Khan <imran.f.khan@oracle.com> Tested-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Imran Khan <imran.f.khan@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lore.kernel.org/r/20231223111545.62135-1-vincent.guittot@linaro.org
2023-12-29	ptp: ocp: fix bug in unregistering the DPLL subsystem	Sagi Maimon
	When unregistering the DPLL subsystem the priv pointer is different then the one used for registration which cause failure in unregistering. Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	xfs: use the op name in trace_xlog_intent_recovery_failed	Christoph Hellwig
	Instead of tracing the address of the recovery handler, use the name in the defer op, similar to other defer ops related tracepoints. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29	xfs: fix a use after free in xfs_defer_finish_recovery	Christoph Hellwig
	dfp will be freed by ->recover_work and thus the tracepoint in case of an error can lead to a use after free. Store the defer ops in a local variable to avoid that. Fixes: 7f2f7531e0d4 ("xfs: store an ops pointer in struct xfs_defer_pending") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29	xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into ↵	Christoph Hellwig
	asserts Since commit deed9512872d ("xfs: Check for -ENOATTR or -EEXIST"), the high-level attr code does a lookup for any attr we're trying to set, and does the checks to handle the create vs replace cases, which thus never hit the low-level attr code. Turn the checks in xfs_attr_shortform_addname as they must never trip. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29	xfs: remove xfs_attr_sf_hdr_t	Christoph Hellwig
	Remove the last two users of the typedef. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29	xfs: remove struct xfs_attr_shortform	Christoph Hellwig
	sparse complains about struct xfs_attr_shortform because it embeds a structure with a variable sized array in a variable sized array. Given that xfs_attr_shortform is not a very useful structure, and the dir2 equivalent has been removed a long time ago, remove it as well. Provide a xfs_attr_sf_firstentry helper that returns the first xfs_attr_sf_entry behind a xfs_attr_sf_hdr to replace the structure dereference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29	xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue	Christoph Hellwig
	xfs_attr_shortform_getvalue duplicates the logic in xfs_attr_sf_findname. Use the helper instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>