summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2022-07-03mm/damon/schemes: add 'LRU_PRIO' DAMOS actionSeongJae Park
This commit adds a new DAMOS action called 'LRU_PRIO' for the physical address space. The action prioritizes pages in the memory regions of the user-specified target access pattern on their LRU lists. This is hence supposed to be used for frequently accessed (hot) memory regions so that hot pages could be more likely protected under memory pressure. Internally, it simply calls 'mark_page_accessed()'. Link: https://lkml.kernel.org/r/20220613192301.8817-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: shrinkers: provide shrinkers with namesRoman Gushchin
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: shrinkers: introduce debugfs interface for memory shrinkersRoman Gushchin
This commit introduces the /sys/kernel/debug/shrinker debugfs interface which provides an ability to observe the state of individual kernel memory shrinkers. Because the feature adds some memory overhead (which shouldn't be large unless there is a huge amount of registered shrinkers), it's guarded by a config option (enabled by default). This commit introduces the "count" interface for each shrinker registered in the system. The output is in the following format: <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>... <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>... ... To reduce the size of output on machines with many thousands cgroups, if the total number of objects on all nodes is 0, the line is omitted. If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed as cgroup inode id. If the shrinker is not numa-aware, 0's are printed for all nodes except the first one. This commit gives debugfs entries simple numeric names, which are not very convenient. The following commit in the series will provide shrinkers with more meaningful names. [akpm@linux-foundation.org: remove WARN_ON_ONCE(), per Roman] Reported-by: syzbot+300d27c79fe6d4cbcc39@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/20220601032227.4076670-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino()Roman Gushchin
Patch series "mm: introduce shrinker debugfs interface", v5. The only existing debugging mechanism is a couple of tracepoints in do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't covering everything though: shrinkers which report 0 objects will never show up, there is no support for memcg-aware shrinkers. Shrinkers are identified by their scan function, which is not always enough (e.g. hard to guess which super block's shrinker it is having only "super_cache_scan"). To provide a better visibility and debug options for memory shrinkers this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent similar to /sys/kernel/slab. For each shrinker registered in the system a directory is created. As now, the directory will contain only a "scan" file, which allows to get the number of managed objects for each memory cgroup (for memcg-aware shrinkers) and each numa node (for numa-aware shrinkers on a numa machine). Other interfaces might be added in the future. To make debugging more pleasant, the patchset also names all shrinkers, so that debugfs entries can have meaningful names. This patch (of 5): Shrinker debugfs requires a way to represent memory cgroups without using full paths, both for displaying information and getting input from a user. Cgroup inode number is a perfect way, already used by bpf. This commit adds a couple of helper functions which will be used to handle memcg-aware shrinkers. Link: https://lkml.kernel.org/r/20220601032227.4076670-1-roman.gushchin@linux.dev Link: https://lkml.kernel.org/r/20220601032227.4076670-2-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: introduce clear_highpage_kasan_taggedAndrey Konovalov
Add a clear_highpage_kasan_tagged() helper that does clear_highpage() on a page potentially tagged by KASAN. This helper is used by the following patch. Link: https://lkml.kernel.org/r/4471979b46b2c487787ddcd08b9dc5fedd1b6ffd.1654798516.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/damon/{dbgfs,sysfs}: move target_has_pid() from dbgfs to damon.hSeongJae Park
The function for knowing if given monitoring context's targets will have pid or not is defined and used in dbgfs only. However, the logic is also needed for sysfs. This commit moves the code to damon.h and makes both dbgfs and sysfs to use it. Link: https://lkml.kernel.org/r/20220606182310.48781-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/migration: fix potential pte_unmap on an not mapped pteMiaohe Lin
__migration_entry_wait and migration_entry_wait_on_locked assume pte is always mapped from caller. But this is not the case when it's called from migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant that calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue. Link: https://lkml.kernel.org/r/20220530113016.16663-5-linmiaohe@huawei.com Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/migration: return errno when isolate_huge_page failedMiaohe Lin
We might fail to isolate huge page due to e.g. the page is under migration which cleared HPageMigratable. We should return errno in this case rather than always return 1 which could confuse the user, i.e. the caller might think all of the memory is migrated while the hugetlb page is left behind. We make the prototype of isolate_huge_page consistent with isolate_lru_page as suggested by Huang Ying and rename isolate_huge_page to isolate_hugetlb as suggested by Muchun to improve the readability. Link: https://lkml.kernel.org/r/20220530113016.16663-4-linmiaohe@huawei.com Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: Huang Ying <ying.huang@intel.com> Reported-by: kernel test robot <lkp@intel.com> (build error) Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/page_vma_mapped.c: check possible huge PMD map with transhuge_vma_suitable()Yang Shi
IIUC page_vma_mapped_walk() checks if the vma is possibly huge PMD mapped with transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR". Actually pvmw->nr_pages is returned by compound_nr() or folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages >= HPAGE_PMD_NR". And it is guaranteed THP is allocated for valid VMA in the first place. But it may be not PMD mapped if the VMA is file VMA and it is not properly aligned. The transhuge_vma_suitable() is used to do such check, so replace transparent_hugepage_active() to it, which is too heavy and overkilling. Link: https://lkml.kernel.org/r/20220513191705.457775-1-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm: rmap: use the correct parameter name for DEFINE_PAGE_VMA_WALKYang Shi
The parameter used by DEFINE_PAGE_VMA_WALK is _page not page, fix the parameter name. It didn't cause any build error, it is probably because the only caller is write_protect_page() from ksm.c, which pass in page. Link: https://lkml.kernel.org/r/20220512174551.81279-1-shy828301@gmail.com Fixes: 2aff7a4755be ("mm: Convert page_vma_mapped_walk to work on PFNs") Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03Documentation: highmem: use literal block for code example in highmem.h commentBagas Sanjaya
When building htmldocs on Linus's tree, there are inline emphasis warnings on include/linux/highmem.h: Documentation/vm/highmem:166: ./include/linux/highmem.h:154: WARNING: Inline emphasis start-string without end-string. Documentation/vm/highmem:166: ./include/linux/highmem.h:157: WARNING: Inline emphasis start-string without end-string. These warnings above are due to comments in code example at the mentioned lines above are enclosed by double dash (--), which confuses Sphinx as inline markup delimiters instead. Fix these warnings by indenting the code example with literal block indentation and making the comments C comments. Link: https://lkml.kernel.org/r/20220622084546.17745-1-bagasdotme@gmail.com Fixes: 85a85e7601263f ("Documentation/vm: move "Using kmap-atomic" to highmem.h") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03lockref: remove unused 'lockref_get_or_lock()' functionLinus Torvalds
Looking at the conditional lock acquire functions in the kernel due to the new sparse support (see commit 4a557a5d1a61 "sparse: introduce conditional lock acquire function attribute"), it became obvious that the lockref code has a couple of them, but they don't match the usual naming convention for the other ones, and their return value logic is also reversed. In the other very similar places, the naming pattern is '*_and_lock()' (eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the function returns true when the lock is taken. The lockref code is superficially very similar to the refcount code, only with the special "atomic wrt the embedded lock" semantics. But instead of the '*_and_lock()' naming it uses '*_or_lock()'. And instead of returning true in case it took the lock, it returns true if it *didn't* take the lock. Now, arguably the reflock code is quite logical: it really is a "either decrement _or_ lock" kind of situation - and the return value is about whether the operation succeeded without any special care needed. So despite the similarities, the differences do make some sense, and maybe it's not worth trying to unify the different conditional locking primitives in this area. But while looking at this all, it did become obvious that the 'lockref_get_or_lock()' function hasn't actually had any users for almost a decade. The only user it ever had was the shortlived 'd_rcu_to_refcount()' function, and it got removed and replaced with 'lockref_get_not_dead()' back in 2013 in commits 0d98439ea3c6 ("vfs: use lockred 'dead' flag to mark unrecoverably dead dentries") and e5c832d55588 ("vfs: fix dentry RCU to refcounting possibly sleeping dput()") In fact, that single use was removed less than a week after the whole function was introduced in commit b3abd80250c1 ("lockref: add 'lockref_get_or_lock() helper") so this function has been around for a decade, but only had a user for six days. Let's just put this mis-designed and unused function out of its misery. We can think about the naming and semantic oddities of the remaining 'lockref_put_or_lock()' later, but at least that function has users. And while the naming is different and the return value doesn't match, that function matches the whole '{atomic,refcount}_dec_and_test()' pattern much better (ie the magic happens when the count goes down to zero, not when it is incremented from zero). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-03sparse: introduce conditional lock acquire function attributeLinus Torvalds
The kernel tends to try to avoid conditional locking semantics because it makes it harder to think about and statically check locking rules, but we do have a few fundamental locking primitives that take locks conditionally - most obviously the 'trylock' functions. That has always been a problem for 'sparse' checking for locking imbalance, and we've had a special '__cond_lock()' macro that we've used to let sparse know how the locking works: # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) so that you can then use this to tell sparse that (for example) the spinlock trylock macro ends up acquiring the lock when it succeeds, but not when it fails: #define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock)) and then sparse can follow along the locking rules when you have code like if (!spin_trylock(&dentry->d_lock)) return LRU_SKIP; .. sparse sees that the lock is held here.. spin_unlock(&dentry->d_lock); and sparse ends up happy about the lock contexts. However, this '__cond_lock()' use does result in very ugly header files, and requires you to basically wrap the real function with that macro that uses '__cond_lock'. Which has made PeterZ NAK things that try to fix sparse warnings over the years [1]. To solve this, there is now a very experimental patch to sparse that basically does the exact same thing as '__cond_lock()' did, but using a function attribute instead. That seems to make PeterZ happy [2]. Note that this does not replace existing use of '__cond_lock()', but only exposes the new proposed attribute and uses it for the previously unannotated 'refcount_dec_and_lock()' family of functions. For existing sparse installations, this will make no difference (a negative output context was ignored), but if you have the experimental sparse patch it will make sparse now understand code that uses those functions, the same way '__cond_lock()' makes sparse understand the very similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()' annotations. Note that in some cases this will silence existing context imbalance warnings. But in other cases it may end up exposing new sparse warnings for code that sparse just didn't see the locking for at all before. This is a trial, in other words. I'd expect that if it ends up being successful, and new sparse releases end up having this new attribute, we'll migrate the old-style '__cond_lock()' users to use the new-style '__cond_acquires' function attribute. The actual experimental sparse patch was posted in [3]. Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/ [1] Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/ [2] Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3] Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Aring <aahringo@redhat.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-03Merge tag 'linux-can-next-for-5.20-20220703' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-07-03 this is a pull request of 15 patches for net-next/master. The first 2 patches are by Max Staudt and add the can327 serial CAN driver along with a new line discipline ID. The next patch is by me an fixes a typo in the ctucanfd driver. The last 12 patches are by Dario Binacchi and integrate slcan CAN serial driver better into the existing CAN driver API. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-03can: netlink: dump bitrate 0 if can_priv::bittiming.bitrate is -1UDario Binacchi
Upcoming changes on slcan driver will require you to specify a bitrate of value -1 to prevent the open_candev() from failing but at the same time highlighting that it is a fake value. In this case the command `ip --details -s -s link show' would print 4294967295 as the bitrate value. The patch change this value in 0. Link: https://lore.kernel.org/all/20220628163137.413025-5-dario.binacchi@amarulasolutions.com Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Tested-by: Jeroen Hofstee <jhofstee@victronenergy.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-07-02net/mlx5: E-switch, Remove dependency between sriov and eswitch modeChris Mi
Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-02net: add skb_[inner_]tcp_all_headers helpersEric Dumazet
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)" to compute headers length for a TCP packet, but others use more convoluted (but equivalent) ways. Add skb_tcp_all_headers() and skb_inner_tcp_all_headers() helpers to harmonize this a bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-02platform/surface: aggregator: Move device registry helper functions to core ↵Maximilian Luz
module Move helper functions for client device registration to the core module. This simplifies addition of future DT/OF support and also allows us to split out the device hub drivers into their own module. At the same time, also improve device node validation a bit by not silently skipping devices with invalid device UID specifiers. Further, ensure proper lifetime management for the firmware/software nodes associated with the added devices. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20220624205800.1355621-2-luzmaximilian@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-07-02platform/surface: aggregator: Add helper macros for requests with argument ↵Maximilian Luz
and return value Add helper macros for synchronous stack-allocated Surface Aggregator request with both argument and return value, similar to the current argument-only and return-value-only ones. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20220624183642.910893-2-luzmaximilian@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-07-01panic: Taint kernel if tests are runDavid Gow
Most in-kernel tests (such as KUnit tests) are not supposed to run on production systems: they may do deliberately illegal things to trigger errors, and have security implications (for example, KUnit assertions will often deliberately leak kernel addresses). Add a new taint type, TAINT_TEST to signal that a test has been run. This will be printed as 'N' (originally for kuNit, as every other sensible letter was taken.) This should discourage people from running these tests on production systems, and to make it easier to tell if tests have been run accidentally (by loading the wrong configuration, etc.) Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-07-01Merge tag 'pm-5.19-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix some issues in cpufreq drivers and some issues in devfreq: - Fix error code path issues related PROBE_DEFER handling in devfreq (Christian Marangi) - Revert an editing accident in SPDX-License line in the devfreq passive governor (Lukas Bulwahn) - Fix refcount leak in of_get_devfreq_events() in the exynos-ppmu devfreq driver (Miaoqian Lin) - Use HZ_PER_KHZ macro in the passive devfreq governor (Yicong Yang) - Fix missing of_node_put for qoriq and pmac32 driver (Liang He) - Fix issues around throttle interrupt for qcom driver (Stephen Boyd) - Add MT8186 to cpufreq-dt-platdev blocklist (AngeloGioacchino Del Regno) - Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su)" * tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / devfreq: passive: revert an editing accident in SPDX-License line PM / devfreq: Fix kernel warning with cpufreq passive register fail PM / devfreq: Rework freq_table to be local to devfreq struct PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER PM / devfreq: Mute warning on governor PROBE_DEFER PM / devfreq: Fix kernel panic with cpu based scaling to passive gov cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist cpufreq: pmac32-cpufreq: Fix refcount leak bug cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c cpufreq: amd-pstate: Add resume and suspend callbacks
2022-07-01PM: runtime: Redefine pm_runtime_release_supplier()Rafael J. Wysocki
Instead of passing an extra bool argument to pm_runtime_release_supplier(), make its callers take care of triggering a runtime-suspend of the supplier device as needed. No expected functional impact. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2022-07-01x86/kexec: Carry forward IMA measurement log on kexecJonathan McDowell
On kexec file load, the Integrity Measurement Architecture (IMA) subsystem may verify the IMA signature of the kernel and initramfs, and measure it. The command line parameters passed to the kernel in the kexec call may also be measured by IMA. A remote attestation service can verify a TPM quote based on the TPM event log, the IMA measurement list and the TPM PCR data. This can be achieved only if the IMA measurement log is carried over from the current kernel to the next kernel across the kexec call. PowerPC and ARM64 both achieve this using device tree with a "linux,ima-kexec-buffer" node. x86 platforms generally don't make use of device tree, so use the setup_data mechanism to pass the IMA buffer to the new kernel. Signed-off-by: Jonathan McDowell <noodles@fb.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> # IMA function definitions Link: https://lore.kernel.org/r/YmKyvlF3my1yWTvK@noodles-fedora-PC23Y6EG
2022-07-01fanotify: introduce FAN_MARK_IGNOREAmir Goldstein
This flag is a new way to configure ignore mask which allows adding and removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask. The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on directories and would ignore events on children depending on whether the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask. FAN_MARK_IGNORE can be used to ignore events on children without setting FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on directories unconditionally, only when FAN_ONDIR is set in ignore mask. The new behavior is non-downgradable. After calling fanotify_mark() with FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK on the same object will return EEXIST error. Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark has no meaning and will return ENOTDIR error. The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new FAN_MARK_IGNORE flag, but with a few semantic differences: 1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount marks and on an inode mark on a directory. Omitting this flag will return EINVAL or EISDIR error. 2. An ignore mask on a non-directory inode that survives modify could never be downgraded to an ignore mask that does not survive modify. With new FAN_MARK_IGNORE semantics we make that rule explicit - trying to update a surviving ignore mask without the flag FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error. The conveniene macro FAN_MARK_IGNORE_SURV is added for (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY), because the common case should use short constant names. Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-07-01fanotify: cleanups for fanotify_mark() input validationsAmir Goldstein
Create helper fanotify_may_update_existing_mark() for checking for conflicts between existing mark flags and fanotify_mark() flags. Use variable mark_cmd to make the checks for mark command bits cleaner. Link: https://lore.kernel.org/r/20220629144210.2983229-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-07-01fanotify: prepare for setting event flags in ignore maskAmir Goldstein
Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories. Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask. Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories. To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags. This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments. Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-07-01firmware: Hold a reference for of_find_compatible_node()Liang He
In of_register_trusted_foundations(), we need to hold the reference returned by of_find_compatible_node() and then use it to call of_node_put() for refcount balance. Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20220628021640.4015-1-windhl@126.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-01uacce: Handle parent device removal or parent driver module rmmodJean-Philippe Brucker
The uacce driver must deal with a possible removal of the parent device or parent driver module rmmod at any time. Although uacce_remove(), called on device removal and on driver unbind, prevents future use of the uacce fops by removing the cdev, fops that were called before that point may still be running. Serialize uacce_fops_open() and uacce_remove() with uacce->mutex. Serialize other fops against uacce_remove() with q->mutex. Since we need to protect uacce_fops_poll() which gets called on the fast path, replace uacce->queues_lock with q->mutex to improve scalability. The other fops are only used during setup. uacce_queue_is_valid(), checked under q->mutex or uacce->mutex, denotes whether uacce_remove() has disabled all queues. If that is the case, don't go any further since the parent device is being removed and uacce->ops should not be called anymore. Reported-by: Yang Shen <shenyang39@huawei.com> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20220701034843.7502-1-zhangfei.gao@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-01wifi: ieee80211: s1g action frames are not robustPeter Chiu
S1g action frame with code 22 is not protected so update the robust action frame list. Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com> Link: https://lore.kernel.org/r/20220622010820.17522-1-chui-hao.chiu@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01misc: rtsx_usb: use separate command and response buffersShuah Khan
rtsx_usb uses same buffer for command and response. There could be a potential conflict using the same buffer for both especially if retries and timeouts are involved. Use separate command and response buffers to avoid conflicts. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/07e3721804ff07aaab9ef5b39a5691d0718b9ade.1656642167.git.skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-01misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transferShuah Khan
rtsx_usb driver allocates coherent dma buffer for urb transfers. This buffer is passed to usb_bulk_msg() and usb core tries to map already mapped buffer running into a dma mapping error. xhci_hcd 0000:01:00.0: rejecting DMA map of vmalloc memory WARNING: CPU: 1 PID: 279 at include/linux/dma-mapping.h:326 usb_ hcd_map_urb_for_dma+0x7d6/0x820 ... xhci_map_urb_for_dma+0x291/0x4e0 usb_hcd_submit_urb+0x199/0x12b0 ... usb_submit_urb+0x3b8/0x9e0 usb_start_wait_urb+0xe3/0x2d0 usb_bulk_msg+0x115/0x240 rtsx_usb_transfer_data+0x185/0x1a8 [rtsx_usb] rtsx_usb_send_cmd+0xbb/0x123 [rtsx_usb] rtsx_usb_write_register+0x12c/0x143 [rtsx_usb] rtsx_usb_probe+0x226/0x4b2 [rtsx_usb] Fix it to use kmalloc() to get DMA-able memory region instead. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/667d627d502e1ba9ff4f9b94966df3299d2d3c0d.1656642167.git.skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-30time64.h: consolidate uses of PSEC_PER_NSECVladimir Oltean
Time-sensitive networking code needs to work with PTP times expressed in nanoseconds, and with packet transmission times expressed in picoseconds, since those would be fractional at higher than gigabit speed when expressed in nanoseconds. Convert the existing uses in tc-taprio and the ocelot/felix DSA driver to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h because the vDSO library does not (yet) need/use it. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30bitmap: don't assume compiler evaluates small mem*() builtins callsAlexander Lobakin
Intel kernel bot triggered the build bug on ARC architecture that in fact is as follows: DECLARE_BITMAP(bitmap, BITS_PER_LONG); bitmap_clear(bitmap, 0, BITS_PER_LONG); BUILD_BUG_ON(!__builtin_constant_p(*bitmap)); which can be expanded to: unsigned long bitmap[1]; memset(bitmap, 0, sizeof(*bitmap)); BUILD_BUG_ON(!__builtin_constant_p(*bitmap)); In most cases, a compiler is able to expand small/simple mem*() calls to simple assignments or bitops, in this case that would mean: unsigned long bitmap[1] = { 0 }; BUILD_BUG_ON(!__builtin_constant_p(*bitmap)); and on most architectures this works, but not on ARC, despite having -O3 for every build. So, to make this work, in case when the last bit to modify is still within the first long (small_const_nbits()), just use plain assignments for the rest of bitmap_*() functions which still use mem*(), but didn't receive such compile-time optimizations yet. This doesn't have the same coverage as compilers provide, but at least something to start: text: add/remove: 3/7 grow/shrink: 43/78 up/down: 1848/-3370 (-1546) data: add/remove: 1/11 grow/shrink: 0/8 up/down: 4/-356 (-352) notably cpumask_*() family when NR_CPUS <= BITS_PER_LONG: netif_get_num_default_rss_queues 38 4 -34 cpumask_copy 90 - -90 cpumask_clear 146 - -146 and the abovementioned assertion started passing. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30bitops: let optimize out non-atomic bitops on compile-time constantsAlexander Lobakin
Currently, many architecture-specific non-atomic bitop implementations use inline asm or other hacks which are faster or more robust when working with "real" variables (i.e. fields from the structures etc.), but the compilers have no clue how to optimize them out when called on compile-time constants. That said, the following code: DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1]; unsigned long bar = BIT(BAR_BIT); unsigned long baz = 0; __set_bit(FOO_BIT, foo); baz |= BIT(BAZ_BIT); BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)); BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT)); BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT)); triggers the first assertion on x86_64, which means that the compiler is unable to evaluate it to a compile-time initializer when the architecture-specific bitop is used even if it's obvious. In order to let the compiler optimize out such cases, expand the bitop() macro to use the "constant" C non-atomic bitop implementations when all of the arguments passed are compile-time constants, which means that the result will be a compile-time constant as well, so that it produces more efficient and simple code in 100% cases, comparing to the architecture-specific counterparts. The savings are architecture, compiler and compiler flags dependent, for example, on x86_64 -O2: GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) and ARM64 (courtesy of Mark): GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30bitops: wrap non-atomic bitops with a transparent macroAlexander Lobakin
In preparation for altering the non-atomic bitops with a macro, wrap them in a transparent definition. This requires prepending one more '_' to their names in order to be able to do that seamlessly. It is a simple change, given that all the non-prefixed definitions are now in asm-generic. sparc32 already has several triple-underscored functions, so I had to rename them ('___' -> 'sp32_'). Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30bitops: define const_*() versions of the non-atomicsAlexander Lobakin
Define const_*() variants of the non-atomic bitops to be used when the input arguments are compile-time constants, so that the compiler will be always able to resolve those to compile-time constants as well. Those are mostly direct aliases for generic_*() with one exception for const_test_bit(): the original one is declared atomic-safe and thus doesn't discard the `volatile` qualifier, so in order to let optimize code, define it separately disregarding the qualifier. Add them to the compile-time type checks as well just in case. Suggested-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30bitops: unify non-atomic bitops prototypes across architecturesAlexander Lobakin
Currently, there is a mess with the prototypes of the non-atomic bitops across the different architectures: ret bool, int, unsigned long nr int, long, unsigned int, unsigned long addr volatile unsigned long *, volatile void * Thankfully, it doesn't provoke any bugs, but can sometimes make the compiler angry when it's not handy at all. Adjust all the prototypes to the following standard: ret bool retval can be only 0 or 1 nr unsigned long native; signed makes no sense addr volatile unsigned long * bitmaps are arrays of ulongs Next, some architectures don't define 'arch_' versions as they don't support instrumentation, others do. To make sure there is always the same set of callables present and to ease any potential future changes, make them all follow the rule: * architecture-specific files define only 'arch_' versions; * non-prefixed versions can be defined only in asm-generic files; and place the non-prefixed definitions into a new file in asm-generic to be included by non-instrumented architectures. Finally, add some static assertions in order to prevent people from making a mess in this room again. I also used the %__always_inline attribute consistently, so that they always get resolved to the actual operations. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30Merge tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Bit quieter this week, the main thing is it pulls in the fixes for the sysfb resource issue you were seeing. these had been queued for next so should have had some decent testing. Otherwise amdgpu, i915 and msm each have a few fixes, and vc4 has one. fbdev: - sysfb fixes/conflicting fb fixes amdgpu: - GPU recovery fix - Fix integer type usage in fourcc header for AMD modifiers - KFD TLB flush fix for gfx9 APUs - Display fix i915: - Fix ioctl argument error return - Fix d3cold disable to allow PCI upstream bridge D3 transition - Fix setting cache_dirty for dma-buf objects on discrete msm: - Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so that userspace sees the value *after* it is incremented if waiting for vblank events - Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash in probe/bind error paths - Fix to resolve the smatch error of de-referencing before NULL check in dpu_encoder_phys_wb.c - Fix error return to userspace if fence-id allocation fails in submit ioctl vc4: - NULL ptr dereference fix" * tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm: Revert "drm/amdgpu/display: set vblank_disable_immediate for DC" drm/amdgpu: To flush tlb for MMHUB of RAVEN series drm/fourcc: fix integer type usage in uapi header drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover() fbdev: Disable sysfb device registration when removing conflicting FBs firmware: sysfb: Add sysfb_disable() helper function firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer drm/msm/gem: Fix error return on fence id alloc fail drm/i915: tweak the ordering in cpu_write_needs_clflush drm/i915/dgfx: Disable d3cold at gfx root port drm/i915/gem: add missing else drm/vc4: perfmon: Fix variable dereferenced before check drm/msm/dpu: Fix variable dereferenced before check drm/msm/dp: reset drm_dev to NULL at dp_display_unbind() drm/msm/dpu: Increment vsync_cnt before waking up userspace
2022-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 9c5de246c1db ("net: sparx5: mdb add/del handle non-sparx5 devices") fbb89d02e33a ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-01Merge tag 'drm-misc-fixes-2022-06-30' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A NULL pointer dereference fix for vc4, and 3 patches to improve the sysfb device behaviour when removing conflicting framebuffers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220630072404.2fa4z3nk5h5q34ci@houat
2022-06-30Merge tag 'net-5.19-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - new code bugs: - clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user() - mptcp: - invoke MP_FAIL response only when needed - fix shutdown vs fallback race - consistent map handling on failure - octeon_ep: use bitwise AND Previous releases - regressions: - tipc: move bc link creation back to tipc_node_create, fix NPD Previous releases - always broken: - tcp: add a missing nf_reset_ct() in 3WHS handling to prevent socket buffered skbs from keeping refcount on the conntrack module - ipv6: take care of disable_policy when restoring routes - tun: make sure to always disable and unlink NAPI instances - phy: don't trigger state machine while in suspend - netfilter: nf_tables: avoid skb access on nf_stolen - asix: fix "can't send until first packet is send" issue - usb: asix: do not force pause frames support - nxp-nci: don't issue a zero length i2c_master_read() Misc: - ncsi: allow use of proper "mellanox" DT vendor prefix - act_api: add a message for user space if any actions were already flushed before the error was hit" * tag 'net-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) net: dsa: felix: fix race between reading PSFP stats and port stats selftest: tun: add test for NAPI dismantle net: tun: avoid disabling NAPI twice net: sparx5: mdb add/del handle non-sparx5 devices net: sfp: fix memory leak in sfp_probe() mlxsw: spectrum_router: Fix rollback in tunnel next hop init net: rose: fix UAF bugs caused by timer handler net: usb: ax88179_178a: Fix packet receiving net: bonding: fix use-after-free after 802.3ad slave unbind ipv6: fix lockdep splat in in6_dump_addrs() net: phy: ax88772a: fix lost pause advertisement configuration net: phy: Don't trigger state machine while in suspend usbnet: fix memory allocation in helpers selftests net: fix kselftest net fatal error NFC: nxp-nci: don't print header length mismatch on i2c error NFC: nxp-nci: Don't issue a zero length i2c_master_read() net: tipc: fix possible refcount leak in tipc_sk_create() nfc: nfcmrvl: Fix irq_of_parse_and_map() return value net: ipv6: unexport __init-annotated seg6_hmac_net_init() ipv6/sit: fix ipip6_tunnel_get_prl return value ...
2022-06-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Three minor bug fixes: - qedr not setting the QP timeout properly toward userspace - Memory leak on error path in ib_cm - Divide by 0 in RDMA interrupt moderation" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: linux/dim: Fix divide by 0 in RDMA DIM RDMA/cm: Fix memory leak in ib_cm_insert_listen RDMA/qedr: Fix reporting QP timeout attribute
2022-06-30Merge tag 'fsnotify_for_v5.19-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A fix for recently added fanotify API to have stricter checks and refuse some invalid flag combinations to make our life easier in the future" * tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: refine the validation checks on non-dir inode mask
2022-06-30vfio: Split migration ops from main device opsYishai Hadas
vfio core checks whether the driver sets some migration op (e.g. set_state/get_state) and accordingly calls its op. However, currently mlx5 driver sets the above ops without regards to its migration caps. This might lead to unexpected usage/Oops if user space may call to the above ops even if the driver doesn't support migration. As for example, the migration state_mutex is not initialized in that case. The cleanest way to manage that seems to split the migration ops from the main device ops, this will let the driver setting them separately from the main ops when it's applicable. As part of that, validate ops construction on registration and include a check for VFIO_MIGRATION_STOP_COPY since the uAPI claims it must be set in migration_flags. HISI driver was changed as well to match this scheme. This scheme may enable down the road to come with some extra group of ops (e.g. DMA log) that can be set without regards to the other options based on driver caps. Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30serial: 8250: Fix PM usage_count for console handoverIlpo Järvinen
When console is enabled, univ8250_console_setup() calls serial8250_console_setup() before .dev is set to uart_port. Therefore, it will not call pm_runtime_get_sync(). Later, when the actual driver is going to take over univ8250_console_exit() is called. As .dev is already set, serial8250_console_exit() makes pm_runtime_put_sync() call with usage count being zero triggering PM usage count warning (extra debug for univ8250_console_setup(), univ8250_console_exit(), and serial8250_register_ports()): [ 0.068987] univ8250_console_setup ttyS0 nodev [ 0.499670] printk: console [ttyS0] enabled [ 0.717955] printk: console [ttyS0] printing thread started [ 1.960163] serial8250_register_ports assigned dev for ttyS0 [ 1.976830] printk: console [ttyS0] disabled [ 1.976888] printk: console [ttyS0] printing thread stopped [ 1.977073] univ8250_console_exit ttyS0 usage:0 [ 1.977075] serial8250 serial8250: Runtime PM usage count underflow! [ 1.977429] dw-apb-uart.6: ttyS0 at MMIO 0x4010006000 (irq = 33, base_baud = 115200) is a 16550A [ 1.977812] univ8250_console_setup ttyS0 usage:2 [ 1.978167] printk: console [ttyS0] printing thread started [ 1.978203] printk: console [ttyS0] enabled To fix the issue, call pm_runtime_get_sync() in serial8250_register_ports() as soon as .dev is set for an uart_port if it has console enabled. This problem became apparent only recently because 82586a721595 ("PM: runtime: Avoid device usage count underflows") added the warning printout. I confirmed this problem also occurs with v5.18 (w/o the warning printout, obviously). Fixes: bedb404e91bb ("serial: 8250_port: Don't use power management for kernel console") Cc: stable <stable@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/b4f428e9-491f-daf2-2232-819928dc276e@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-30tty: serial: samsung_tty: loopback mode supportChanho Park
Internal loopback mode can be supported by setting UCON register's Loopback Mode bit. The mode & bit can be supported since s3c2410 and later SoCs. The prefix of LOOPBACK / BIT(5) naming should be also changed to S3C2410_ in order to avoid confusion. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Chanho Park <chanho61.park@samsung.com> Link: https://lore.kernel.org/r/20220629004141.51484-1-chanho61.park@samsung.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-30spi: spi.c: Fix comment styleDavid Jander
Capitalize first word in comment where appropriate and add parentheses to function names. Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David Jander <david@protonic.nl> Link: https://lore.kernel.org/r/20220629142519.3985486-3-david@protonic.nl Signed-off-by: Mark Brown <broonie@kernel.org>
2022-06-30sysctl: add proc_dointvec_ms_jiffies_minmaxYuwei Wang
add proc_dointvec_ms_jiffies_minmax to fit read msecs value to jiffies with a limited range of values Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-29net: phy: Don't trigger state machine while in suspendLukas Wunner
Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(), but subsequent interrupts may retrigger it: They may have been left enabled to facilitate wakeup and are not quiesced until the ->suspend_noirq() phase. Unwanted interrupts may hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(), as well as between dpm_resume_noirq() and mdio_bus_phy_resume(). Retriggering the phy_state_machine() through an interrupt is not only undesirable for the reason given in mdio_bus_phy_suspend() (freezing it midway with phydev->lock held), but also because the PHY may be inaccessible after it's suspended: Accesses to USB-attached PHYs are blocked once usb_suspend_both() clears the can_submit flag and PHYs on PCI network cards may become inaccessible upon suspend as well. Amend phy_interrupt() to avoid triggering the state machine if the PHY is suspended. Signal wakeup instead if the attached net_device or its parent has been configured as a wakeup source. (Those conditions are identical to mdio_bus_phy_may_suspend().) Postpone handling of the interrupt until the PHY has resumed. Before stopping the phy_state_machine() in mdio_bus_phy_suspend(), wait for a concurrent phy_interrupt() to run to completion. That is necessary because phy_interrupt() may have checked the PHY's suspend status before the system sleep transition commenced and it may thus retrigger the state machine after it was stopped. Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(), wait for a concurrent phy_interrupt() to complete to ensure that interrupts which it postponed are properly rerun. The issue was exposed by commit 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling"), but has existed since forever. Fixes: 541cd3ee00a4 ("phylib: Fix deadlock on resume") Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org # v2.6.33+ Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/b7f386d04e9b5b0e2738f0125743e30676f309ef.1656410895.git.lukas@wunner.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-29iosys-map: Add per-word writeLucas De Marchi
Like was done for read, provide the equivalent for write. Even if current users are not in the hot path, this should future-proof it. v2: - Remove default from _Generic() - callers wanting to write more than u64 should use iosys_map_memcpy_to() - Add WRITE_ONCE() cases dereferencing the pointer when using system memory v3: - Fix precedence issue when casting inside WRITE_ONCE(). By not using () around vaddr__ the offset was not part of the cast, but rather added to it, producing a wrong address - Remove compiletime_assert() as WRITE_ONCE() already contains it Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Reviewed-by: Christian König <christian.koenig@amd.com> # v1 Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220628191016.3899428-2-lucas.demarchi@intel.com