Age | Commit message (Collapse) | Author |
|
Since commit aa06a9bd8533 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to
report ITC frequency"), gcc 10.1.0 fails to build ia64 with the gnomic:
| ../arch/ia64/kernel/sys_ia64.c: In function 'ia64_clock_getres':
| ../arch/ia64/kernel/sys_ia64.c:189:3: error: a label can only be part of a statement and a declaration is not a statement
| 189 | s64 tick_ns = DIV_ROUND_UP(NSEC_PER_SEC, local_cpu_data->itc_freq);
This line appears immediately after a case label in a switch.
Move the declarations out of the case, to the top of the function.
Link: https://lkml.kernel.org/r/20230117151632.393836-1-james.morse@arm.com
Fixes: aa06a9bd8533 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency")
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sergei Trofimovich <slyich@gmail.com>
Cc: Émeric Maschino <emeric.maschino@gmail.com>
Cc: matoro <matoro_mailinglist_kernel@matoro.tk>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
lru_gen_migrate_mm() assumes lru_gen_add_mm() runs prior to itself. This
isn't true for the following scenario:
CPU 1 CPU 2
clone()
cgroup_can_fork()
cgroup_procs_write()
cgroup_post_fork()
task_lock()
lru_gen_migrate_mm()
task_unlock()
task_lock()
lru_gen_add_mm()
task_unlock()
And when the above happens, kernel crashes because of linked list
corruption (mm_struct->lru_gen.list).
Link: https://lore.kernel.org/r/20230115134651.30028-1-msizanoen@qtmlabs.xyz/
Link: https://lkml.kernel.org/r/20230116034405.2960276-1-yuzhao@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: msizanoen <msizanoen@qtmlabs.xyz>
Tested-by: msizanoen <msizanoen@qtmlabs.xyz>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 12a5d3955227b0d7e04fb793ccceeb2a1dd275c5.
Although it is recognized that a finer grained pro-active reclaim is
something we need and want the semantic of this implementation is really
ambiguous.
In a follow up discussion it became clear that there are two essential
usecases here. One is to use memory.reclaim to pro-actively reclaim
memory and expectation is that the requested and reported amount of memory
is uncharged from the memcg. Another usecase focuses on pro-active
demotion when the memory is merely shuffled around to demotion targets
while the overall charged memory stays unchanged.
The current implementation considers demoted pages as reclaimed and that
break both usecases. [1] has tried to address the reporting part but
there are more issues with that summarized in [2] and follow up emails.
Let's revert the nodemask based extension of the memcg pro-active
reclaim for now until we settle with a more robust semantic.
[1] http://lkml.kernel.org/r/http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com
[2] http://lkml.kernel.org/r/Y5bsmpCyeryu3Zz1@dhcp22.suse.cz
Link: https://lkml.kernel.org/r/Y5xASNe1x8cusiTx@dhcp22.suse.cz
Fixes: 12a5d3955227b0d ("mm: add nodes= arg to memory.reclaim")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, there is a race between zs_free() and zs_reclaim_page():
zs_reclaim_page() finds a handle to an allocated object, but before the
eviction happens, an independent zs_free() call to the same handle could
come in and overwrite the object value stored at the handle with the last
deferred handle. When zs_reclaim_page() finally gets to call the eviction
handler, it will see an invalid object value (i.e the previous deferred
handle instead of the original object value).
This race happens quite infrequently. We only managed to produce it with
out-of-tree developmental code that triggers zsmalloc writeback with a
much higher frequency than usual.
This patch fixes this race by storing the deferred handle in the object
header instead. We differentiate the deferred handle from the other two
cases (handle for allocated object, and linkage for free object) with a
new tag. If zspage reclamation succeeds, we will free these deferred
handles by walking through the zspage objects. On the other hand, if
zspage reclamation fails, we reconstruct the zspage freelist (with the
deferred handle tag and allocated tag) before trying again with the
reclamation.
[arnd@arndb.de: avoid unused-function warning]
Link: https://lkml.kernel.org/r/20230117170507.2651972-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20230110231701.326724-1-nphamcs@gmail.com
Fixes: 9997bc017549 ("zsmalloc: implement writeback mechanism for zsmalloc")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.
Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).
If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.
Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.
Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.
Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
mas_empty_area_rev() was not correctly validating the start of a gap
against the lower limit. This could lead to the range starting lower than
the requested minimum.
Fix the issue by better validating a gap once one is found.
This commit also adds tests to the maple tree test suite for this issue
and tests the mas_empty_area() function for similar bound checking.
Link: https://lkml.kernel.org/r/20230111200136.1851322-1-Liam.Howlett@oracle.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216911
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: <amanieu@gmail.com>
Link: https://lore.kernel.org/linux-mm/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/
Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"cpuset has a bug which can cause an oops after some configuration
operations, introduced during the v6.1 cycle.
This single commit fixes the bug"
* tag 'cgroup-for-6.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()
|
|
This allows us to run some code when the guard is dropped (e.g.,
implicitly when it goes out of scope). We can also prevent the
guard from running by calling its `dismiss()` method.
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
It was found that the check to see if a partition could use up all
the cpus from the parent cpuset in update_parent_subparts_cpumask()
was incorrect. As a result, it is possible to leave parent with no
effective cpu left even if there are tasks in the parent cpuset. This
can lead to system panic as reported in [1].
Fix this probem by updating the check to fail the enabling the partition
if parent's effective_cpus is a subset of the child's cpus_allowed.
Also record the error code when an error happens in update_prstate()
and add a test case where parent partition and child have the same cpu
list and parent has task. Enabling partition in the child will fail in
this case.
[1] https://www.spinics.net/lists/cgroups/msg36254.html
Fixes: f0af1bfc27b5 ("cgroup/cpuset: Relax constraints to partition & cpus changes")
Cc: stable@vger.kernel.org # v6.1
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Add the per-SoC (qcom,sm8350-dsi-ctrl) compatible strings to DSI nodes
to follow the pending DSI bindings changes.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230118032024.1715857-1-dmitry.baryshkov@linaro.org
|
|
Per DT bindings add p1 register blocks to all DP controllers on SC8280XP
platform.
Fixes: 6f299ae7f96d ("arm64: dts: qcom: sc8280xp: add p1 register blocks to DP nodes")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230118031718.1714861-4-dmitry.baryshkov@linaro.org
|
|
The eDP device doesn't provide sound DAI. Drop corresponding property
from the eDP node.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230118031718.1714861-3-dmitry.baryshkov@linaro.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two core fixes.
One simply moves an annotation from put to release to avoid the
warning triggering needlessly in alua, but to keep it in case release
is ever called from that path (which we don't think will happen).
The other reverts a change to the PQ=1 target scanning behaviour
that's under intense discussion at the moment"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
scsi: core: Fix the scsi_device_put() might_sleep annotation
|
|
Syzkaller triggered a WARN in put_pmu_ctx().
WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278
This is because there is no locking around the access of "if
(!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
put_pmu_ctx().
The decrement of the reference count in put_pmu_ctx() also happens
outside of the spinlock, leading to the possibility of this order of
events, and the context being cleared in put_pmu_ctx(), after its
refcount is non zero:
CPU0 CPU1
find_get_pmu_context()
if (!epc->ctx) == false
put_pmu_ctx()
atomic_dec_and_test(&epc->refcount) == true
epc->refcount == 0
atomic_inc(&epc->refcount);
epc->refcount == 1
list_del_init(&epc->pmu_ctx_entry);
epc->ctx = NULL;
Another issue is that WARN_ON for no active PMU events in put_pmu_ctx()
is outside of the lock. If the perf_event_pmu_context is an embedded
one, even after clearing it, it won't be deleted and can be re-used. So
the warning can trigger. For this reason it also needs to be moved
inside the lock.
The above warning is very quick to trigger on Arm by running these two
commands at the same time:
while true; do perf record -- ls; done
while true; do perf record -- ls; done
[peterz: atomic_dec_and_raw_lock*()]
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: syzbot+697196bc0265049822bd@syzkaller.appspotmail.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230127143141.1782804-2-james.clark@arm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A couple of v4l2 core fixes:
- fix a regression on strings control support
- fix a regression for some drivers that depend on an odd streaming
behavior"
* tag 'media/v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videobuf2: set q->streaming later
media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct line
|
|
Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to
be noticeable with the AMD KVM SEV-ES DebugSwap feature enabled. KVM is
going to store host's DR[0-3] and DR[0-3]_ADDR_MASK before switching to
a guest; the hardware is going to swap these on VMRUN and VMEXIT.
Store MSR values passed to set_dr_addr_mask() in percpu variables
(when changed) and return them via new amd_get_dr_addr_mask().
The gain here is about 10x.
As set_dr_addr_mask() uses the array too, change the @dr type to
unsigned to avoid checking for <0. And give it the amd_ prefix to match
the new helper as the whole DR_ADDR_MASK feature is AMD-specific anyway.
While at it, replace deprecated boot_cpu_has() with cpu_feature_enabled()
in set_dr_addr_mask().
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230120031047.628097-2-aik@amd.com
|
|
Commit 2b3f056f72e5 moved a blk_put_queue() call from
blk_mq_destroy_queue() into its callers. Reflect this change in the
documentation block above blk_mq_destroy_queue().
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Keith Busch <kbusch@kernel.org>
Fixes: 2b3f056f72e5 ("blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queue")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230130211233.831613-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support to retrieve VDM attention messages and forward them to the
appropriate alt mode driver.
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Reviewed-by: Benson Leung <bleung@chromium.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230126205620.3714994-2-pmalani@chromium.org
|
|
Incorporate updates to the EC headers to support the retrieval of VDM
Attention messages from port partners. These headers are already present
in the ChromeOS EC codebase. [1]
[1] https://source.chromium.org/chromium/chromiumos/platform/ec/+/main:include/ec_commands.h
Signed-off-by: Prashant Malani <pmalani@chromium.org>
[pmalani: Removed extra tab in header #define]
Reviewed-by: Benson Leung <bleung@chromium.org>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://lore.kernel.org/r/20230126205620.3714994-1-pmalani@chromium.org
|
|
Function __get_mem_detect_block() resets start and end
output parameters in case of invalid mem_detect array
index is provided. That violates the rule of sparing
the output on fail path and leads e.g to a below anomaly:
for_each_mem_detect_block(i, &start, &end)
continue;
One would expect start and end contain addresses of the
last memory block (if available), but in fact the two
will be reset to zeroes. That is not how an iterator is
expected to work.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Unbinding an I/O subchannel with a child-CCW device in disconnected
state sometimes causes a kernel-panic. The race condition was seen
mostly during testing, when setting all the CHPIDs of a device to
offline and at the same time, the unbinding the I/O subchannel driver.
The kernel-panic occurs because of double delete, the I/O subchannel
driver calls device_del on the CCW device while another device_del
invocation for the same device is in-flight. For instance, disabling
all the CHPIDs will trigger the ccw_device_remove function, which will
call a ccw_device_unregister(), which ends up calling the device_del()
which is asynchronous via cdev's todo workqueue. And unbinding the I/O
subchannel driver calls io_subchannel_remove() function which calls the
ccw_device_unregister() and device_del().
This double delete can be prevented by serializing all CCW device
registration/unregistration calls into the driver core. This patch
introduces a mutex which will be used for this purpose.
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
---[ Real Memory Copy Area Start ]---
0x001bfffffffff000-0x001c000000000000 4K PTE I
---[ Kasan Shadow Start ]---
---[ Real Memory Copy Area End ]---
0x001c000000000000-0x001c000200000000 8G PMD RW NX
...
---[ Kasan Shadow End ]---
ptdump does a stable sort of markers. Move kasan markers after
memcpy real to avoid swapping.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
setup_vmem() already calls populate for all online memory regions.
pgtable_populate_end() could be removed.
Also rename pgtable_populate_begin() to pgtable_populate_init().
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Commit bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
doesn't consider online memory holes due to potential memory offlining
and erroneously creates pgtables for stand-by memory, which bear RW+X
attribute and trigger a warning:
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x0000000c3fffffff 49G online yes 0-48
0x0000000c40000000-0x0000000c7fffffff 1G offline 49
0x0000000c80000000-0x0000000fffffffff 14G online yes 50-63
0x0000001000000000-0x00000013ffffffff 16G offline 64-79
s390/mm: Found insecure W+X mapping at address 0xc40000000
WARNING: CPU: 14 PID: 1 at arch/s390/mm/dump_pagetables.c:142 note_page+0x2cc/0x2d8
Map only online memory ranges which fit within identity mapping limit.
Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Historically calls to __decompress() didn't specify "out_len" parameter
on many architectures including s390, expecting that no writes beyond
uncompressed kernel image are performed. This has changed since commit
2aa14b1ab2c4 ("zstd: import usptream v1.5.2") which includes zstd library
commit 6a7ede3dfccb ("Reduce size of dctx by reutilizing dst buffer
(#2751)"). Now zstd decompression code might store literal buffer in
the unwritten portion of the destination buffer. Since "out_len" is
not set, it is considered to be unlimited and hence free to use for
optimization needs. On s390 this might corrupt initrd or ipl report
which are often placed right after the decompressor buffer. Luckily the
size of uncompressed kernel image is already known to the decompressor,
so to avoid the problem simply specify it in the "out_len" parameter.
Link: https://github.com/facebook/zstd/commit/6a7ede3dfccb
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This was originally added for the definition of nth_page(), but we no
longer use nth_page() in this header, so we can drop the heavyweight
mm.h now.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230131050132.2627124-1-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Looks like kunit_test_init_section_suites(...) was messed up in a merge
conflict. This fixes it.
kunit_test_init_section_suites(...) was not updated to avoid the extra
level of indirection when .kunit_test_suites was flattened. Given no-one
was actively using it, this went unnoticed for a long period of time.
Fixes: e5857d396f35 ("kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites")
Signed-off-by: Brendan Higgins <brendan.higgins@linux.dev>
Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Martin Fernandez <martin.fernandez@eclypsium.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Due to the way we use alternatives in the irqflags code, even when
CONFIG_ARM64_PSEUDO_NMI=n, we generate unused alternative code for
pseudo-NMI management. This patch reworks the irqflags code to remove
the redundant code when CONFIG_ARM64_PSEUDO_NMI=n, which benefits the
more common case, and will permit further rework of our DAIF management
(e.g. in preparation for ARMv8.8-A's NMI feature).
Prior to this patch a defconfig kernel has hundreds of redundant
instructions to access ICC_PMR_EL1 (which should only need to be
manipulated in setup code), which this patch removes:
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-before-defconfig | grep icc_pmr_el1 | wc -l
| 885
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-after-defconfig | grep icc_pmr_el1 | wc -l
| 5
Those instructions alone account for more than 3KiB of kernel text, and
will be associated with additional alt_instr entries, padding and
branches, etc.
These redundant instructions exist because we use alternative sequences
for to choose between DAIF / PMR management in irqflags.h, and even when
CONFIG_ARM64_PSEUDO_NMI=n, those alternative sequences will generate the
code for PMR management, along with alt_instr entries. We use
alternatives here as this was necessary to ensure that we never
encounter a mismatched local_irq_save() ... local_irq_restore() sequence
in the middle of patching, which was possible to see if we used static
keys to choose between DAIF and PMR management.
Since commit:
21fb26bfb01ffe0d ("arm64: alternatives: add alternative_has_feature_*()")
... we have a mechanism to use alternatives similarly to static keys,
allowing us to write the bulk of the logic in C code while also being
able to rely on all sites being patched in one go, and avoiding a
mismatched mismatched local_irq_save() ... local_irq_restore() sequence
during patching.
This patch rewrites arm64's local_irq_*() functions to use alternative
branches. This allows for the pseudo-NMI code to be entirely elided when
CONFIG_ARM64_PSEUDO_NMI=n, making a defconfig Image 64KiB smaller, and
not affectint the size of an Image with CONFIG_ARM64_PSEUDO_NMI=y:
| [mark@lakrids:~/src/linux]% ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 137473432 Jan 18 11:11 vmlinux-after-defconfig
| -rwxr-xr-x 1 mark mark 137918776 Jan 18 11:15 vmlinux-after-pnmi
| -rwxr-xr-x 1 mark mark 137380152 Jan 18 11:03 vmlinux-before-defconfig
| -rwxr-xr-x 1 mark mark 137523704 Jan 18 11:08 vmlinux-before-pnmi
| [mark@lakrids:~/src/linux]% ls -al Image-*
| -rw-r--r-- 1 mark mark 38646272 Jan 18 11:11 Image-after-defconfig
| -rw-r--r-- 1 mark mark 38777344 Jan 18 11:14 Image-after-pnmi
| -rw-r--r-- 1 mark mark 38711808 Jan 18 11:03 Image-before-defconfig
| -rw-r--r-- 1 mark mark 38777344 Jan 18 11:08 Image-before-pnmi
Some sensitive code depends on being run with interrupts enabled or with
interrupts disabled, and so when enabling or disabling interrupts we
must ensure that the compiler does not move such code around the actual
enable/disable. Before this patch, that was ensured by the combined asm
volatile blocks having memory clobbers (and any sensitive code either
being asm volatile, or touching memory). This patch consistently uses
explicit barrier() operations before and after the enable/disable, which
allows us to use the usual sysreg accessors (which are asm volatile) to
manipulate the interrupt masks. The use of pmr_sync() is pulled within
this critical section for consistency.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230130145429.903791-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When Priority Mask Hint Enable (PMHE) == 0b1, the GIC may use the PMR
value to determine whether to signal an IRQ to a PE, and consequently
after a change to the PMR value, a DSB SY may be required to ensure that
interrupts are signalled to a CPU in finite time. When PMHE == 0b0,
interrupts are always signalled to the relevant PE, and all masking
occurs locally, without requiring a DSB SY.
Since commit:
f226650494c6aa87 ("arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear")
... we handle this dynamically: in most cases a static key is used to
determine whether to issue a DSB SY, but the entry code must read from
ICC_CTLR_EL1 as static keys aren't accessible from plain assembly.
It would be much nicer to use an alternative instruction sequence for
the DSB, as this would avoid the need to read from ICC_CTLR_EL1 in the
entry code, and for most other code this will result in simpler code
generation with fewer instructions and fewer branches.
This patch adds a new ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap which is
only set when ICC_CTLR_EL1.PMHE == 0b0 (and GIC priority masking is in
use). This allows us to replace the existing users of the
`gic_pmr_sync` static key with alternative sequences which default to a
DSB SY and are relaxed to a NOP when PMHE is not in use.
The entry assembly management of the PMR is slightly restructured to use
a branch (rather than multiple NOPs) when priority masking is not in
use. This is more in keeping with other alternatives in the entry
assembly, and permits the use of a separate alternatives for the
PMHE-dependent DSB SY (and removal of the conditional branch this
currently requires). For consistency I've adjusted both the save and
restore paths.
According to bloat-o-meter, when building defconfig +
CONFIG_ARM64_PSEUDO_NMI=y this shrinks the kernel text by ~4KiB:
| add/remove: 4/2 grow/shrink: 42/310 up/down: 332/-5032 (-4700)
The resulting vmlinux is ~66KiB smaller, though the resulting Image size
is unchanged due to padding and alignment:
| [mark@lakrids:~/src/linux]% ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 137508344 Jan 17 14:11 vmlinux-after
| -rwxr-xr-x 1 mark mark 137575440 Jan 17 13:49 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -al Image-*
| -rw-r--r-- 1 mark mark 38777344 Jan 17 14:11 Image-after
| -rw-r--r-- 1 mark mark 38777344 Jan 17 13:49 Image-before
Prior to this patch we did not verify the state of ICC_CTLR_EL1.PMHE on
secondary CPUs. As of this patch this is verified by the cpufeature code
when using GIC priority masking (i.e. when using pseudo-NMIs).
Note that since commit:
7e3a57fa6ca831fa ("arm64: Document ICC_CTLR_EL3.PMHE setting requirements")
... Documentation/arm64/booting.rst specifies:
| - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across
| all CPUs the kernel is executing on, and must stay constant
| for the lifetime of the kernel.
... so that should not adversely affect any compliant systems, and as
we'll only check for the absense of PMHE when using pseudo-NMIs, this
will only fire when such mismatch will adversely affect the system.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230130145429.903791-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently the arm64_cpu_capabilities structure for
ARM64_HAS_GIC_PRIO_MASKING open-codes the same CPU field definitions as
the arm64_cpu_capabilities structure for ARM64_HAS_GIC_CPUIF_SYSREGS, so
that can_use_gic_priorities() can use has_useable_gicv3_cpuif().
This duplication isn't ideal for the legibility of the code, and sets a
bad example for any ARM64_HAS_GIC_* definitions added by subsequent
patches.
Instead, have ARM64_HAS_GIC_PRIO_MASKING check for the
ARM64_HAS_GIC_CPUIF_SYSREGS cpucap, and add a comment explaining why
this is safe. Subsequent patches will use the same pattern where one
cpucap depends upon another.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230130145429.903791-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Subsequent patches will add more GIC-related cpucaps. When we do so, it
would be nice to give them a consistent HAS_GIC_* prefix.
In preparation for doing so, this patch renames the existing
ARM64_HAS_IRQ_PRIO_MASKING cap to ARM64_HAS_GIC_PRIO_MASKING.
The cpucaps file was hand-modified; all other changes were scripted
with:
find . -type f -name '*.[chS]' -print0 | \
xargs -0 sed -i 's/ARM64_HAS_IRQ_PRIO_MASKING/ARM64_HAS_GIC_PRIO_MASKING/'
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230130145429.903791-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Subsequent patches will add more GIC-related cpucaps. When we do so, it
would be nice to give them a consistent HAS_GIC_* prefix.
In preparation for doing so, this patch renames the existing
ARM64_HAS_SYSREG_GIC_CPUIF cap to ARM64_HAS_GIC_CPUIF_SYSREGS.
The 'CPUIF_SYSREGS' suffix is chosen so that this will be ordered ahead
of other ARM64_HAS_GIC_* definitions in subsequent patches.
The cpucaps file was hand-modified; all other changes were scripted
with:
find . -type f -name '*.[chS]' -print0 | \
xargs -0 sed -i
's/ARM64_HAS_SYSREG_GIC_CPUIF/ARM64_HAS_GIC_CPUIF_SYSREGS/'
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230130145429.903791-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, when CONFIG_ARM64_PTR_AUTH_KERNEL=y (and
CONFIG_UNWIND_PATCH_PAC_INTO_SCS=n), we enable pointer authentication
for all functions, including leaf functions. This isn't necessary, and
is unfortunate for a few reasons:
* Any PACIASP instruction is implicitly a `BTI C` landing pad, and
forcing the addition of a PACIASP in every function introduces a
larger set of BTI gadgets than is necessary.
* The PACIASP and AUTIASP instructions make leaf functions larger than
necessary, bloating the kernel Image. For a defconfig v6.2-rc3 kernel,
this appears to add ~64KiB relative to not signing leaf functions,
which is unfortunate but not entirely onerous.
* The PACIASP and AUTIASP instructions potentially make leaf functions
more expensive in terms of performance and/or power. For many trivial
leaf functions, this is clearly unnecessary, e.g.
| <arch_local_save_flags>:
| d503233f paciasp
| d53b4220 mrs x0, daif
| d50323bf autiasp
| d65f03c0 ret
| <calibration_delay_done>:
| d503233f paciasp
| d50323bf autiasp
| d65f03c0 ret
| d503201f nop
* When CONFIG_UNWIND_PATCH_PAC_INTO_SCS=y we disable pointer
authentication for leaf functions, so clearly this is not functionally
necessary, indicates we have an inconsistent threat model, and
convolutes the Makefile logic.
We've used pointer authentication in leaf functions since the
introduction of in-kernel pointer authentication in commit:
74afda4016a7437e ("arm64: compile the kernel with ptrauth return address signing")
... but at the time we had no rationale for signing leaf functions.
Subsequently, we considered avoiding signing leaf functions:
https://lore.kernel.org/linux-arm-kernel/1586856741-26839-1-git-send-email-amit.kachhap@arm.com/
https://lore.kernel.org/linux-arm-kernel/1588149371-20310-1-git-send-email-amit.kachhap@arm.com/
... however at the time we didn't have an abundance of reasons to avoid
signing leaf functions as above (e.g. the BTI case), we had no hardware
to make performance measurements, and it was reasoned that this gave
some level of protection against a limited set of code-reuse gadgets
which would fall through to a RET. We documented this in commit:
717b938e22f8dbf0 ("arm64: Document why we enable PAC support for leaf functions")
Notably, this was before we supported any forward-edge CFI scheme (e.g.
Arm BTI, or Clang CFI/kCFI), which would prevent jumping into the middle
of a function.
In addition, even with signing forced for leaf functions, AUTIASP may be
placed before a number of instructions which might constitute such a
gadget, e.g.
| <user_regs_reset_single_step>:
| f9400022 ldr x2, [x1]
| d503233f paciasp
| d50323bf autiasp
| f9408401 ldr x1, [x0, #264]
| 720b005f tst w2, #0x200000
| b26b0022 orr x2, x1, #0x200000
| 926af821 and x1, x1, #0xffffffffffdfffff
| 9a820021 csel x1, x1, x2, eq // eq = none
| f9008401 str x1, [x0, #264]
| d65f03c0 ret
| <fpsimd_cpu_dead>:
| 2a0003e3 mov w3, w0
| 9000ff42 adrp x2, ffff800009ffd000 <xen_dynamic_chip+0x48>
| 9120e042 add x2, x2, #0x838
| 52800000 mov w0, #0x0 // #0
| d503233f paciasp
| f000d041 adrp x1, ffff800009a20000 <this_cpu_vector>
| d50323bf autiasp
| 9102c021 add x1, x1, #0xb0
| f8635842 ldr x2, [x2, w3, uxtw #3]
| f821685f str xzr, [x2, x1]
| d65f03c0 ret
| d503201f nop
So generally, trying to use AUTIASP to detect such gadgetization is not
robust, and this is dealt with far better by forward-edge CFI (which is
designed to prevent such cases). We should bite the bullet and stop
pretending that AUTIASP is a mitigation for such forward-edge
gadgetization.
For the above reasons, this patch has the kernel consistently sign
non-leaf functions and avoid signing leaf functions.
Considering a defconfig v6.2-rc3 kernel built with LLVM 15.0.6:
* The vmlinux is ~43KiB smaller:
| [mark@lakrids:~/src/linux]% ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 338547808 Jan 25 17:17 vmlinux-after
| -rwxr-xr-x 1 mark mark 338591472 Jan 25 17:22 vmlinux-before
* The resulting Image is 64KiB smaller:
| [mark@lakrids:~/src/linux]% ls -al Image-*
| -rwxr-xr-x 1 mark mark 32702976 Jan 25 17:17 Image-after
| -rwxr-xr-x 1 mark mark 32768512 Jan 25 17:22 Image-before
* There are ~400 fewer BTI gadgets:
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-before 2> /dev/null | grep -ow 'paciasp\|bti\sc\?' | sort | uniq -c
| 1219 bti c
| 61982 paciasp
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-after 2> /dev/null | grep -ow 'paciasp\|bti\sc\?' | sort | uniq -c
| 10099 bti c
| 52699 paciasp
Which is +8880 BTIs, and -9283 PACIASPs, for -403 unnecessary BTI
gadgets. While this is small relative to the total, distinguishing the
two cases will make it easier to analyse and reduce this set further
in future.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230131105809.991288-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Assemblers will reject instructions not supported by a target
architecture version, and so we must explicitly tell the assembler the
latest architecture version for which we want to assemble instructions
from.
We've added a few AS_HAS_ARMV8_<N> definitions for this, in addition to
an inconsistently named AS_HAS_PAC definition, from which arm64's
top-level Makefile determines the architecture version that we intend to
target, and generates the `asm-arch` variable.
To make this a bit clearer and easier to maintain, this patch reworks
the Makefile to determine asm-arch in a single if-else-endif chain.
AS_HAS_PAC, which is defined when the assembler supports
`-march=armv8.3-a`, is renamed to AS_HAS_ARMV8_3.
As the logic for armv8.3-a is lifted out of the block handling pointer
authentication, `asm-arch` may now be set to armv8.3-a regardless of
whether support for pointer authentication is selected. This means that
it will be possible to assemble armv8.3-a instructions even if we didn't
intend to, but this is consistent with our handling of other
architecture versions, and the compiler won't generate armv8.3-a
instructions regardless.
For the moment there's no need for an CONFIG_AS_HAS_ARMV8_1, as the code
for LSE atomics and LDAPR use individual `.arch_extension` entries and
do not require the baseline asm arch to be bumped to armv8.1-a. The
other armv8.1-a features (e.g. PAN) do not require assembler support.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230131105809.991288-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The newly added zt-test program copied the pattern from the other FP
stress test programs of having a redundant _start label which is
rejected by clang, as we did in a parallel series for the other tests
remove the label so we can build with clang.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230130-arm64-fix-sme2-clang-v1-1-3ce81d99ea8f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add DTs for Qualcomm IDP platforms using the QDU1000 and QRU1000
SoCs.
Signed-off-by: Melody Olvera <quic_molvera@quicinc.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230112210722.6234-3-quic_molvera@quicinc.com
|
|
Add the base DTSI files for QDU1000 and QRU1000 SoCs, including base
descriptions of CPUs, GCC, RPMHCC, QUP, TLMM, and interrupt-controller
to boot to shell with console on these SoCs.
Signed-off-by: Melody Olvera <quic_molvera@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230112210722.6234-2-quic_molvera@quicinc.com
|
|
arm64-for-6.3
Merge DT binding in order to get GCC clock defines.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into HEAD
Merge DT binding to gain interconnect defines.
|
|
Changing pfn on a user page table mapped entry, without first going through
break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
to intercept such changes, via an updated pgattr_change_is_safe(). This new
check happens via __check_racy_pte_update(), which has now been renamed as
__check_safe_pte_update().
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230130121457.1607675-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Correct spelling problems for Documentation/arm64/ as reported
by codespell.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Link: https://lore.kernel.org/r/20230127064005.1558-3-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The mailing list at librecores.org is being shut down due to
infrastructure issues. Update the the newly created list on
vger.kernel.org.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Microcode gets reloaded late only if "1" is written to the reload file.
However, the code silently treats any other unsigned integer as a
successful write even though no actions are performed to load microcode.
Make the loader more strict to accept only "1" as a trigger value and
return an error otherwise.
[ bp: Massage commit message. ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230130213955.6046-3-ashok.raj@intel.com
|
|
When SVE was initially merged we chose to export the maximum VQ in the ABI
as being 512, rather more than the architecturally supported maximum of 16.
For the ptrace tests this results in us generating a lot of test cases and
hence log output which are redundant since a system couldn't possibly
support them. Instead only check values up to the current architectural
limit, plus one more so that we're covering the constraining of higher
vector lengths.
This makes no practical difference to our test coverage, speeds things up
on slower consoles and makes the output much more managable.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230111-arm64-kselftest-ptrace-max-vl-v1-1-8167f41d1ad8@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Owner of one unprivileged ublk device could be one evil user, which
can grant this disk's privilege to other users deliberately, and
this way could be like making one trap and waiting for other users
to be caught.
So only owner to open unprivileged disk even though the owner
grants disk privilege to other user. This way is reasonable too
given anyone can create ublk disk, and no need other's grant.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: 4093cb5a0634 ("ublk_drv: add mechanism for supporting unprivileged ublk device")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230131040446.214583-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When validating drafted SPDK ublk target, in a case that
assigning large queue depth to multiqueue ublk device,
ublk target would run into a weird incorrect state. During
rounds of review and debug, An overflow bug was found
in ublk driver.
In ublk_cmd.h, UBLK_MAX_QUEUE_DEPTH is 4096 which means
each ublk queue depth can be set as large as 4096. But
when setting qd for a ublk device,
sizeof(struct ublk_queue) + depth * sizeof(struct ublk_io)
will be larger than 65535 if qd is larger than 2728.
Then queue_size is overflowed, and ublk_get_queue()
references a wrong pointer position. The wrong content of
ublk_queue elements will lead to out-of-bounds memory
access.
Extend queue_size in ublk_device as "unsigned int".
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230131070552.115067-1-xiaodong.liu@intel.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Improve code clarity and enable earlier use of
tidbuf->npages by moving its assignment to
structure creation time.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/167329104884.1472990.4639750192433251493.stgit@awfm-02.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
After a call to console_unlock() in vcs_read() the vc_data struct can be
freed by vc_deallocate(). Because of that, the struct vc_data pointer
load must be done at the top of while loop in vcs_read() to avoid a UAF
when vcs_size() is called.
Syzkaller reported a UAF in vcs_size().
BUG: KASAN: use-after-free in vcs_size (drivers/tty/vt/vc_screen.c:215)
Read of size 4 at addr ffff8881137479a8 by task 4a005ed81e27e65/1537
CPU: 0 PID: 1537 Comm: 4a005ed81e27e65 Not tainted 6.2.0-rc5 #1
Hardware name: Red Hat KVM, BIOS 1.15.0-2.module
Call Trace:
<TASK>
__asan_report_load4_noabort (mm/kasan/report_generic.c:350)
vcs_size (drivers/tty/vt/vc_screen.c:215)
vcs_read (drivers/tty/vt/vc_screen.c:415)
vfs_read (fs/read_write.c:468 fs/read_write.c:450)
...
</TASK>
Allocated by task 1191:
...
kmalloc_trace (mm/slab_common.c:1069)
vc_allocate (./include/linux/slab.h:580 ./include/linux/slab.h:720
drivers/tty/vt/vt.c:1128 drivers/tty/vt/vt.c:1108)
con_install (drivers/tty/vt/vt.c:3383)
tty_init_dev (drivers/tty/tty_io.c:1301 drivers/tty/tty_io.c:1413
drivers/tty/tty_io.c:1390)
tty_open (drivers/tty/tty_io.c:2080 drivers/tty/tty_io.c:2126)
chrdev_open (fs/char_dev.c:415)
do_dentry_open (fs/open.c:883)
vfs_open (fs/open.c:1014)
...
Freed by task 1548:
...
kfree (mm/slab_common.c:1021)
vc_port_destruct (drivers/tty/vt/vt.c:1094)
tty_port_destructor (drivers/tty/tty_port.c:296)
tty_port_put (drivers/tty/tty_port.c:312)
vt_disallocate_all (drivers/tty/vt/vt_ioctl.c:662 (discriminator 2))
vt_ioctl (drivers/tty/vt/vt_ioctl.c:903)
tty_ioctl (drivers/tty/tty_io.c:2776)
...
The buggy address belongs to the object at ffff888113747800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 424 bytes inside of
1024-byte region [ffff888113747800, ffff888113747c00)
The buggy address belongs to the physical page:
page:00000000b3fe6c7c refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x113740
head:00000000b3fe6c7c order:3 compound_mapcount:0 subpages_mapcount:0
compound_pincount:0
anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0010200 ffff888100042dc0 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888113747880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888113747900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888113747980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888113747a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888113747a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
Fixes: ac751efa6a0d ("console: rename acquire/release_console_sem() to console_lock/unlock()")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: George Kennedy <george.kennedy@oracle.com>
Link: https://lore.kernel.org/r/1674577014-12374-1-git-send-email-george.kennedy@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Running 'make dtbs_check' with schema in net/marvell,orion-mdio.yaml
gives following warnings:
mdio-bus@72004: Unevaluated properties are not allowed
('ethernet-phy' was unexpected)
arch/arm/boot/dts/dove-cubox.dtb
arch/arm/boot/dts/dove-cubox-es.dtb
arch/arm/boot/dts/dove-d2plug.dtb
arch/arm/boot/dts/dove-d2plug.dtb
arch/arm/boot/dts/dove-dove-db.dtb
arch/arm/boot/dts/dove-d3plug.dtb
arch/arm/boot/dts/dove-sbc-a510.dtb
As every subnode of mdio is expected to have an @X, ethernet-phy subnode
in dove.dtsi doesn't have one. Fix these errors by moving ethernet-phy
into relevant .dts files with correct @<reg address>.
Signed-off-by: Michał Grzelak <mig@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
|
|
The cited commit moves umem to call the unlocked versions of dmabuf
unmap/map attachment, but the lock is held while calling to these
functions, hence move back to the locked versions of these APIs.
Fixes: 21c9c5c0784f ("RDMA/umem: Prepare to dynamic dma-buf locking specification")
Link: https://lore.kernel.org/r/311c2cb791f8af75486df446819071357353db1b.1675088709.git.leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|