Age | Commit message (Collapse) | Author |
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.1-rc1:
[airlied - fix sun4i_tv build]
UAPI Changes:
- Hide unregistered connectors from GETCONNECTOR ioctl.
- drm/virtio no longer advertises LINEAR modifier, as it doesn't work.
-
Cross-subsystem Changes:
- Fix GPF in udmabuf failure path.
Core Changes:
- Rework TTM placement to use intersect/compatible functions.
- Drop legacy DP-MST support.
- More DP-MST related fixes, and move all state into atomic.
- Make DRM_MIPI_DBI select DRM_KMS_HELPER.
- Add audio_infoframe packing for DP.
- Add logging when some atomic check functions fail.
- Assorted documentation updates and fixes.
Driver Changes:
- Assorted cleanups and fixes in msm, lcdif, nouveau, virtio,
panel/ilitek, bridge/icn6211, tve200, gma500, bridge/*, panfrost, via,
bochs, qxl, sun4i.
- Add add AUO B133UAN02.1, IVO M133NW4J-R3, Innolux N120ACA-EA1 eDP panels.
- Improve DP-MST modeset state handling in amdgpu, nouveau, i915.
- Drop DP-MST from radeon driver, it was broken and only user of legacy
DP-MST.
- Handle unplugging better in vc4.
- Simplify drm cmdparser tests.
- Add DP support to ti-sn65dsi86.
- Add MT8195 DP support to mediatek.
- Support RGB565, XRGB64, and ARGB64 formats in vkms.
- Convert sun4i tv support to atomic.
- Refactor vc4/vec TV Modesetting, and fix timings.
- Use atomic helpers instead of simple display helpers in ssd130x.
Maintainer changes:
- Add Douglas Anderson as reviewer for panel-edp.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a489485b-3ebc-c734-0f80-aed963d89efe@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Intel VT-d fixes from Lu Baolu:
- Boot kdump kernels with VT-d scalable mode on
- Calculate the right page table levels
- Fix two recursive locking issues
- Fix a lockdep splat issue
- AMD IOMMU fixes:
- Fix for completion-wait command to use full 64 bits of data
- Fix PASID related issue where GPU sound devices failed to
initialize
- Fix for Virtio-IOMMU to report correct caching behavior, needed for
use with VFIO
* tag 'iommu-fixes-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix false ownership failure on AMD systems with PASID activated
iommu/vt-d: Fix possible recursive locking in intel_iommu_init()
iommu/virtio: Fix interaction with VFIO
iommu/vt-d: Fix lockdep splat due to klist iteration in atomic context
iommu/vt-d: Fix recursive lock issue in iommu_flush_dev_iotlb()
iommu/vt-d: Correctly calculate sagaw value of IOMMU
iommu/vt-d: Fix kdump kernels boot failure with scalable mode
iommu/amd: use full 64-bit value in build_completion_wait()
|
|
In order for everyone to understand clearly why we want to use
maintenance charging for batteries, expand the description with two
diagrams and some text.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
The dmar_global_lock used in the intel_iommu_init() might cause recursive
locking issue, for example, intel_iommu_get_resv_regions() is taking the
dmar_global_lock from within a section where intel_iommu_init() already
holds it via probe_acpi_namespace_devices().
Using dmar_global_lock in intel_iommu_init() could be relaxed since it is
unlikely that any IO board must be hot added before the IOMMU subsystem is
initialized. This eliminates the possible recursive locking issue by moving
down DMAR hotplug support after the IOMMU is initialized and removing the
uses of dmar_global_lock in intel_iommu_init().
Fixes: d5692d4af08cd ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com
Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Verifier logic to confirm that a callback function returns 0 or 1 was
added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
At the time, callback return value was only used to continue or stop
iteration.
In order to support callbacks with a broader return value range, such as
those added in rbtree series[0] and others, add a callback_ret_range to
bpf_func_state. Verifier's helpers which set in_callback_fn will also
set the new field, which the verifier will later use to check return
value bounds.
Default to tnum_range(0, 0) instead of using tnum_unknown as a sentinel
value as the latter would prevent the valid range (0, U64_MAX) being
used. Previous global default tnum_range(0, 1) is explicitly set for
extant callback helpers. The change to global default was made after
discussion around this patch in rbtree series [1], goal here is to make
it more obvious that callback_ret_range should be explicitly set.
[0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com/
[1]: lore.kernel.org/bpf/20220830172759.4069786-2-davemarchevsky@fb.com/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220908230716.2751723-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add corresponding unimplemented stub for when CONFIG_BPF_SYSCALL=n
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/4021398e884433b1fef57a4d28361bb9fcf1bd05.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Wrapper function that finds all memory type resources by
using acpi_dev_get_resources(). It removes the need for the
drivers to check the resource data type separately.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- revert a panic on swiotlb initialization failure (Yu Zhao)
- fix the lookup for partial syncs in dma-debug (Robin Murphy)
- fix a shift overflow in swiotlb (Chao Gao)
- fix a comment typo in swiotlb (Chao Gao)
- mark a function static now that all abusers are gone (Christoph
Hellwig)
* tag 'dma-mapping-6.0-2022-09-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: mark dma_supported static
swiotlb: fix a typo
swiotlb: avoid potential left shift overflow
dma-debug: improve search for partial syncs
Revert "swiotlb: panic if nslabs is too small"
|
|
Linux 6.0-rc4 so we can test on BeagleBone again.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some small driver core and debugfs fixes for 6.0-rc5.
Included in here are:
- multiple attempts to get the arch_topology code to work properly on
non-cluster SMT systems. First attempt caused build breakages in
linux-next and 0-day, second try worked.
- debugfs fixes for a long-suffering memory leak. The pattern of
debugfs_remove(debugfs_lookup(...)) turns out to leak dentries, so
add debugfs_lookup_and_remove() to fix this problem. Also fix up
the scheduler debug code that highlighted this problem. Fixes for
other subsystems will be trickling in over the next few months for
this same issue once the debugfs function is merged.
All of these have been in linux-next since Wednesday with no reported
problems"
* tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
arch_topology: Make cluster topology span at least SMT CPUs
sched/debug: fix dentry leak in update_sched_domain_debugfs
debugfs: add debugfs_lookup_and_remove()
driver core: fix driver_set_override() issue with empty strings
Revert "arch_topology: Make cluster topology span at least SMT CPUs"
arch_topology: Make cluster topology span at least SMT CPUs
|
|
Pull rdma fixes from Jason Gunthorpe:
"Many bug fixes in several drivers:
- Fix misuse of the DMA API in rtrs
- Several irdma issues: hung task due to SQ flushing, incorrect
capability reporting to userspace, improper error handling for MW
corners, touching an uninitialized SGL for during invalidation.
- hns was using the wrong page size limits for the HW, an incorrect
calculation of wqe_shift causing WQE corruption, and mis computed a
timer id.
- Fix a crash in SRP triggered by blktests
- Fix compiler errors by calling virt_to_page() with the proper type
in siw
- Userspace triggerable deadlock in ODP
- mlx5 could use the wrong profile due to some driver loading races,
counters were not working in some device configurations, and a
crash on error unwind"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Report RNR NAK generation in device caps
RDMA/irdma: Use s/g array in post send only when its valid
RDMA/irdma: Return correct WC error for bind operation failure
RDMA/irdma: Return error on MR deregister CQP failure
RDMA/irdma: Report the correct max cqes from query device
MAINTAINERS: Update maintainers of HiSilicon RoCE
RDMA/mlx5: Fix UMR cleanup on error flow of driver init
RDMA/mlx5: Set local port to one when accessing counters
RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile
IB/core: Fix a nested dead lock as part of ODP flow
RDMA/siw: Pass a pointer to virt_to_page()
RDMA/srp: Set scmnd->result only when scmnd is not NULL
RDMA/hns: Remove the num_qpc_timer variable
RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift
RDMA/hns: Fix supported page size
RDMA/cma: Fix arguments order in net device validation
RDMA/irdma: Fix drain SQ hang with no completion
RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL
RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
|
|
On some platforms it is found that Linux more aggressively enters s2idle
than Windows enters Modern Standby and this uncovers some synchronization
issues for the platform. To aid in debugging this class of problems in
the future, add support for an extra optional callback intended for
drivers to emit extra debugging.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220829162953.5947-2-mario.limonciello@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
As PAGE_SIZE is unsigned long, -1 > PAGE_SIZE when NR_CPUS <= 3.
This leads to very large file sizes:
topology$ ls -l
total 0
-r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 core_cpus
-r--r--r-- 1 root root 4096 Sep 5 11:59 core_cpus_list
-r--r--r-- 1 root root 4096 Sep 5 10:58 core_id
-r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 core_siblings
-r--r--r-- 1 root root 4096 Sep 5 11:59 core_siblings_list
-r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 die_cpus
-r--r--r-- 1 root root 4096 Sep 5 11:59 die_cpus_list
-r--r--r-- 1 root root 4096 Sep 5 11:59 die_id
-r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 package_cpus
-r--r--r-- 1 root root 4096 Sep 5 11:59 package_cpus_list
-r--r--r-- 1 root root 4096 Sep 5 10:58 physical_package_id
-r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 thread_siblings
-r--r--r-- 1 root root 4096 Sep 5 11:59 thread_siblings_list
Adjust the inequality to catch the case when NR_CPUS is configured
to a small value.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: stable@vger.kernel.org
Cc: feng xiangjun <fengxj325@gmail.com>
Fixes: 7ee951acd31a ("drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist")
Reported-by: feng xiangjun <fengxj325@gmail.com>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
|
PSI accounts stalls for each cgroup separately and aggregates it
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads when under deep level of the hierarchy.
commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
make PSI to skip per-cgroup stall accounting, only account system-wide
to avoid this each level overhead.
But for our use case, we also want leaf cgroup PSI stats accounted for
userspace adjustment on that cgroup, apart from only system-wide adjustment.
So this patch introduce a per-cgroup PSI accounting disable/re-enable
interface "cgroup.pressure", which is a read-write single value file that
allowed values are "0" and "1", the defaults is "1" so per-cgroup
PSI stats is enabled by default.
Implementation details:
It should be relatively straight-forward to disable and re-enable
state aggregation, time tracking, averaging on a per-cgroup level,
if we can live with losing history from while it was disabled.
I.e. the avgs will restart from 0, total= will have gaps.
But it's hard or complex to stop/restart groupc->tasks[] updates,
which is not implemented in this patch. So we always update
groupc->tasks[] and PSI_ONCPU bit in psi_group_change() even when
the cgroup PSI stats is disabled.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com
|
|
We use iterate_groups() to iterate each level psi_group to update
PSI stats, which is a very hot path.
In current code, iterate_groups() have to use multiple branches and
cgroup_parent() to get parent psi_group for each level, which is not
very efficient.
This patch cache parent psi_group in struct psi_group, only need to get
psi_group of task itself first, then just use group->parent to iterate.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-10-zhouchengming@bytedance.com
|
|
cgroup_psi() can't return psi_group for root cgroup, so we have many
open code "psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi".
This patch move cgroup_psi() definition to <linux/psi.h>, in which
we can return psi_system for root cgroup, so can handle all cgroups.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-9-zhouchengming@bytedance.com
|
|
Now PSI already tracked workload pressure stall information for
CPU, memory and IO. Apart from these, IRQ/SOFTIRQ could have
obvious impact on some workload productivity, such as web service
workload.
When CONFIG_IRQ_TIME_ACCOUNTING, we can get IRQ/SOFTIRQ delta time
from update_rq_clock_task(), in which we can record that delta
to CPU curr task's cgroups as PSI_IRQ_FULL status.
Note we don't use PSI_IRQ_SOME since IRQ/SOFTIRQ always happen in
the current task on the CPU, make nothing productive could run
even if it were runnable, so we only use PSI_IRQ_FULL.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-8-zhouchengming@bytedance.com
|
|
We put all fields updated by the scheduler in the first cacheline of
struct psi_group_cpu for performance.
Since we want add another PSI_IRQ_FULL to track IRQ/SOFTIRQ pressure,
we need to reclaim space first. This patch remove NR_ONCPU task accounting
in struct psi_group_cpu, use one bit in state_mask to track instead.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20220825164111.29534-7-zhouchengming@bytedance.com
|
|
This patch move psi_task_change/psi_task_switch declarations out of
PSI public header, since they are only needed for implementing the
PSI stats tracking in sched/stats.h
psi_task_switch is obvious, psi_task_change can't be public helper
since it doesn't check psi_disabled static key. And there is no
any user now, so put it in sched/stats.h too.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-5-zhouchengming@bytedance.com
|
|
Pull in dependent cgroup patches
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
All non-UAPI asm/termios.h consist of include of UAPI counterpart
and, possibly, include of linux/uaccess.h
The latter can't be simply removed, even though nothing in
linux/termios.h doesn't depend upon it anymore - there are several
places that rely upon that indirect chain of includes to pull
linux/uaccess.h. So the include needs to be lifted out of there -
we lift into tty_driver.h, serdev.h and places that pull asm/termios.h,
but none of
* linux/uaccess.h (obvious)
* net/sock.h (pulls uaccess.h)
* linux/{tty,tty_driver,serdev}.h (tty.h pulls tty_driver.h)
That leaves us just with the include of UAPI asm/termios.h, which is
what <asm/termios.h> will resolve to if we simply remove non-UAPI header.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDnKvYCHn/ogBUv@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDnDCR2VRTA3Etp@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
turn it into an array initializer; then alpha, mips and powerpc
variants fold into it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDm7M6M91gC2RPL@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On old systems it used to be ^O. Linux had never actually used
the value, but INIT_C_CC (on i386) did initialize it to ^O;
unfortunately, it had a typo in the comment claiming that to be
^U. Most of the architectures copied the (correct) definition
along with mistaken comment. alpha, powerpc and sparc tried
to make the definition match comment.
However, util-linux still resets it to ^O on any architecture,
^O is the historical value, kernel ignores it anyway and finally,
Linus said "Just change everybody to do the same, nobody cares
about VDISCARD".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDmy//MKzs3ye7l@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
* new header (linut/termios_internal.h), pulled by the users of those
suckers
* defaults for INIT_C_CC and externs for conversion helpers moved over
there
* remove termios-base.h (empty now)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/YxDmptU7dNGZ+/Hn@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now that dma-iommu.h only contains internal interfaces, make it
private to the IOMMU subsytem.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b237e06c56a101f77af142a54b629b27aa179d22.1660668998.git.robin.murphy@arm.com
[ joro : re-add stub for iommu_dma_get_resv_regions ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
DEFINE_RES_ macros have been created for the commonly used resource types,
but not IORESOURCE_REG. Add the macro so it can be used in a similar manner
to all other resource types.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20220905162132.2943088-7-colin.foster@in-advantage.com
|
|
Several ocelot-related modules are designed for MMIO / regmaps. As such,
they often use a combination of devm_platform_get_and_ioremap_resource()
and devm_regmap_init_mmio().
Operating in an MFD might be different, in that it could be memory mapped,
or it could be SPI, I2C... In these cases a fallback to use IORESOURCE_REG
instead of IORESOURCE_MEM becomes necessary.
When this happens, there's redundant logic that needs to be implemented in
every driver. In order to avoid this redundancy, utilize a single function
that, if the MFD scenario is enabled, will perform this fallback logic.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20220905162132.2943088-2-colin.foster@in-advantage.com
|
|
SBI firmware may not provide information for some counters in response
to SBI_EXT_PMU_COUNTER_GET_INFO call. Exclude such counters from the
subsequent SBI requests. For this purpose use global mask to keep track
of fully specified counters.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220830155306.301714-3-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Introduce the DMA logging feature support in the vfio core layer.
It includes the processing of the device start/stop/report DMA logging
UAPIs and calling the relevant driver 'op' to do the work.
Specifically,
Upon start, the core translates the given input ranges into an interval
tree, checks for unexpected overlapping, non aligned ranges and then
pass the translated input to the driver for start tracking the given
ranges.
Upon report, the core translates the given input user space bitmap and
page size into an IOVA kernel bitmap iterator. Then it iterates it and
call the driver to set the corresponding bits for the dirtied pages in a
specific IOVA range.
Upon stop, the driver is called to stop the previous started tracking.
The next patches from the series will introduce the mlx5 driver
implementation for the logging ops.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The new facility adds a bunch of wrappers that abstract how an IOVA range
is represented in a bitmap that is granulated by a given page_size. So it
translates all the lifting of dealing with user pointers into its
corresponding kernel addresses backing said user memory into doing finally
the (non-atomic) bitmap ops to change various bits.
The formula for the bitmap is:
data[(iova / page_size) / 64] & (1ULL << (iova % 64))
Where 64 is the number of bits in a unsigned long (depending on arch)
It introduces an IOVA iterator that uses a windowing scheme to minimize the
pinning overhead, as opposed to pinning it on demand 4K at a time. Assuming
a 4K kernel page and 4K requested page size, we can use a single kernel
page to hold 512 page pointers, mapping 2M of bitmap, representing 64G of
IOVA space.
An example usage of these helpers for a given @base_iova, @page_size,
@length and __user @data:
bitmap = iova_bitmap_alloc(base_iova, page_size, length, data);
if (IS_ERR(bitmap))
return -ENOMEM;
ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn);
iova_bitmap_free(bitmap);
Each iteration of the @dirty_reporter_fn is called with a unique @iova
and @length argument, indicating the current range available through the
iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty
areas (@iova_length) within that provided range, as following:
iova_bitmap_set(bitmap, iova, iova_length);
The facility is intended to be used for user bitmaps representing dirtied
IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220908183448.195262-5-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"Several fixes that came in since the merge window, the major one being
a fix for the spi-mux driver which was broken by the performance
optimisations due to it peering inside the core's data structures more
than it should"
* tag 'spi-fix-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi: Fix queue hang if previous transfer failed
spi: mux: Fix mux interaction with fast path optimisations
spi: cadence-quadspi: Disable irqs during indirect reads
spi: bitbang: Fix lsb-first Rx
|
|
Merge net/mlx5 depedencies for device DMA logging and mlx5 variant
driver suppport.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
drivers/net/ethernet/freescale/fec.h
7d650df99d52 ("net: fec: add pm_qos support on imx6q platform")
40c79ce13b03 ("net: fec: add stop mode support for imx8 platform")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The commit 5e0531f6b90a ("spi: Add capability to perform some transfer
with chipselect off") added a new flag but squeezed it into a wrong
group of struct spi_transfer members (note that SPI_NBITS_* are macros
for easier interpretation of the tx_nbits and rx_nbits bitfields).
Group cs_change and cs_off flags together and their doc strings.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220908130518.32186-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes
Arm SCMI fixes for v6.0
Few fixes addressing possible out of bound access violations by
hardening them, incorrect asynchronous resets by restricting them,
incorrect SCMI tracing message format by harmonizing them, missing
kernel-doc in optee transport, missing SCMI PM driver remove
routine by adding it to avoid warning when scmi driver is unloaded
and finally improve checks in the info_get operations.
* tag 'scmi-fixes-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Harmonize SCMI tracing message format
firmware: arm_scmi: Add SCMI PM driver remove routine
firmware: arm_scmi: Fix the asynchronous reset requests
firmware: arm_scmi: Harden accesses to the reset domains
firmware: arm_scmi: Harden accesses to the sensor domains
firmware: arm_scmi: Improve checks in the info_get operations
firmware: arm_scmi: Fix missing kernel-doc in optee
Link: https://lore.kernel.org/r/20220829174435.207911-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc, netfilter, wireless and bluetooth
subtrees.
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for
of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after
enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in
rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N"
* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
sch_sfb: Also store skb len before calling child enqueue
net: phy: lan87xx: change interrupt src of link_up to comm_ready
net/smc: Fix possible access to freed memory in link clear
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
net: usb: qmi_wwan: add Quectel RM520N
net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
stmmac: intel: Simplify intel_eth_pci_remove()
net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
ipv6: sr: fix out-of-bounds read when setting HMAC data.
bonding: accept unsolicited NA message
bonding: add all node mcast address when slave up
bonding: use unspecified address if no available link local address
wifi: use struct_group to copy addresses
wifi: mac80211_hwsim: check length for virtio packets
...
|
|
Commit d4252071b97d ("add barriers to buffer_uptodate and
set_buffer_uptodate") added proper memory barriers to the buffer head
BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
will be guaranteed to actually see initialized state.
However, that commit didn't _just_ add the memory barrier, it also ended
up dropping the "was it already set" logic that the BUFFER_FNS() macro
had.
That's conceptually the right thing for a generic "this is a memory
barrier" operation, but in the case of the buffer contents, we really
only care about the memory barrier for the _first_ time we set the bit,
in that the only memory ordering protection we need is to avoid anybody
seeing uninitialized memory contents.
Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
and would require some other proper lock (typically BH_Lock or the folio
lock). A reader that races with somebody invalidating the buffer head
isn't an issue wrt the memory ordering, it's a serialization issue.
Now, you'd think that the buffer head operations don't matter in this
day and age (and I certainly thought so), but apparently some loads
still end up being heavy users of buffer heads. In particular, the
kernel test robot reported that not having this bit access optimization
in place caused a noticeable direct IO performance regression on ext4:
fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression
although you presumably need a fast disk and a lot of cores to actually
notice.
Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Fengwei Yin <fengwei.yin@intel.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In preparation to make memory operations accessible for a non
ffa_driver/device, it is better to split the ffa_ops into different
categories of operations: info, message and memory. The info and memory
are ffa_device independent and can be used without any associated
ffa_device from a non ffa_driver.
However, we don't export these info and memory APIs yet without the user.
The first users of these APIs can export them.
Link: https://lore.kernel.org/r/20220907145240.1683088-11-sudeep.holla@arm.com
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
FF-A v1.1 adds a flag in the partition properties to indicate if the
partition runs in the AArch32 or AArch64 execution state. Use the same
to set-up the 32-bit execution flag mode in the ffa_dev automatically
if the detected firmware version is above v1.0 and ignore any requests
to do the same from the ffa_driver.
Link: https://lore.kernel.org/r/20220907145240.1683088-10-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
FF-A v1.1 adds support to discovery the UUIDs of the partitions that was
missing in v1.0 and which the driver workarounds by using UUIDs supplied
by the ffa_drivers.
Add the v1.1 get_partition_info support and disable the workaround if
the detected FF-A version is greater than v1.0.
Link: https://lore.kernel.org/r/20220907145240.1683088-9-sudeep.holla@arm.com
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
Except the message APIs, all other APIs are ffa_device independent and can
be used without any associated ffa_device from a non ffa_driver.
In order to reflect the same, just rename ffa_dev_ops as ffa_ops to
avoid any confusion or to keep it simple.
Link: https://lore.kernel.org/r/20220907145240.1683088-8-sudeep.holla@arm.com
Suggested-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
There is a requirement to make memory APIs independent of the ffa_device.
One of the use-case is to have a common memory driver that manages the
memory for all the ffa_devices. That common memory driver won't be a
ffa_driver or won't have any ffa_device associated with it. So having
these memory APIs accessible without a ffa_device is needed and should
be possible as most of these are handled by the partition manager(SPM
or hypervisor).
Drop the ffa_device argument to the memory APIs and make them ffa_device
independent.
Link: https://lore.kernel.org/r/20220907145240.1683088-7-sudeep.holla@arm.com
Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
The only user of this exported ffa_dev_ops_get() was OPTEE driver which
now uses ffa_dev->ops directly, there are no other users for this.
Also, since any ffa driver can use ffa_dev->ops directly, there will be
no need for ffa_dev_ops_get(), so just remove ffa_dev_ops_get().
Link: https://lore.kernel.org/r/20220907145240.1683088-4-sudeep.holla@arm.com
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
Currently ffa_dev_ops_get() is the way to fetch the ffa_dev_ops pointer.
It checks if the ffa_dev structure pointer is valid before returning the
ffa_dev_ops pointer.
Instead, the pointer can be made part of the ffa_dev structure and since
the core driver is incharge of creating ffa_device for each identified
partition, there is no need to check for the validity explicitly if the
pointer is embedded in the structure.
Add the pointer to the ffa_dev_ops in the ffa_dev structure itself and
initialise the same as part of creation of the device.
Link: https://lore.kernel.org/r/20220907145240.1683088-2-sudeep.holla@arm.com
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
For a lot of use cases in future patches, we will want to modify the
state of registers part of some same 'group' (e.g. same ref_obj_id). It
won't just be limited to releasing reference state, but setting a type
flag dynamically based on certain actions, etc.
Hence, we need a way to easily pass a callback to the function that
iterates over all registers in current bpf_verifier_state in all frames
upto (and including) the curframe.
While in C++ we would be able to easily use a lambda to pass state and
the callback together, sadly we aren't using C++ in the kernel. The next
best thing to avoid defining a function for each case seems like
statement expressions in GNU C. The kernel already uses them heavily,
hence they can passed to the macro in the style of a lambda. The
statement expression will then be substituted in the for loop bodies.
Variables __state and __reg are set to current bpf_func_state and reg
for each invocation of the expression inside the passed in verifier
state.
Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers,
release_reference, find_good_pkt_pointers, find_equal_scalars to
use bpf_for_each_reg_in_vstate.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Enable run-time checking of dynamic memcpy() and memmove() lengths,
issuing a WARN when a write would exceed the size of the target struct
member, when built with CONFIG_FORTIFY_SOURCE=y. This would have
caught all of the memcpy()-based buffer overflows in the last 3 years,
specifically covering all the cases where the destination buffer size
is known at compile time.
This change ONLY adds a run-time warning. As false positives are currently
still expected, this will not block the overflow. The new warnings will
look like this:
memcpy: detected field-spanning write (size N) of single field "var->dest" (size M)
WARNING: CPU: n PID: pppp at source/file/path.c:nr function+0xXX/0xXX [module]
There may be false positives in the kernel where intentional
field-spanning writes are happening. These need to be addressed
similarly to how the compile-time cases were addressed: add a
struct_group(), split the memcpy(), or some other refactoring.
In order to make counting/investigating instances of added runtime checks
easier, each instance includes the destination variable name as a WARN
argument, prefixed with 'field "'. Therefore, on an x86_64 defconfig
build, it is trivial to inspect the build artifacts to find instances.
For example on an x86_64 defconfig build, there are 78 new run-time
memcpy() bounds checks added:
$ for i in vmlinux $(find . -name '*.ko'); do \
strings "$i" | grep '^field "'; done | wc -l
78
Simple cases where a destination buffer is known to be a dynamic size
do not generate a WARN. For example:
struct normal_flex_array {
void *a;
int b;
u32 c;
size_t array_size;
u8 flex_array[];
};
struct normal_flex_array *instance;
...
/* These will be ignored for run-time bounds checking. */
memcpy(instance, src, len);
memcpy(instance->flex_array, src, len);
However, one of the dynamic-sized destination cases is irritatingly
unable to be detected by the compiler: when using memcpy() to target
a composite struct member which contains a trailing flexible array
struct. For example:
struct wrapper {
int foo;
char bar;
struct normal_flex_array embedded;
};
struct wrapper *instance;
...
/* This will incorrectly WARN when len > sizeof(instance->embedded) */
memcpy(&instance->embedded, src, len);
These cases end up appearing to the compiler to be sized as if the
flexible array had 0 elements. :( For more details see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
https://godbolt.org/z/vW6x8vh4P
These "composite flexible array structure destination" cases will be
need to be flushed out and addressed on a case-by-case basis.
Regardless, for the general case of using memcpy() on flexible array
destinations, future APIs will be created to handle common cases. Those
can be used to migrate away from open-coded memcpy() so that proper
error handling (instead of trapping) can be used.
As mentioned, none of these bounds checks block any overflows
currently. For users that have tested their workloads, do not encounter
any warnings, and wish to make these checks stop any overflows, they
can use a big hammer and set the sysctl panic_on_warn=1.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Clean up uses of "(size_t)-1" in favor of SIZE_MAX.
Cc: linux-hardening@vger.kernel.org
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
With CONFIG_FORTIFY=y and CONFIG_UBSAN_LOCAL_BOUNDS=y enabled, we observe
a runtime panic while running Android's Compatibility Test Suite's (CTS)
android.hardware.input.cts.tests. This is stemming from a strlen()
call in hidinput_allocate().
__compiletime_strlen() is implemented in terms of __builtin_object_size(),
then does an array access to check for NUL-termination. A quirk of
__builtin_object_size() is that for strings whose values are runtime
dependent, __builtin_object_size(str, 1 or 0) returns the maximum size
of possible values when those sizes are determinable at compile time.
Example:
static const char *v = "FOO BAR";
static const char *y = "FOO BA";
unsigned long x (int z) {
// Returns 8, which is:
// max(__builtin_object_size(v, 1), __builtin_object_size(y, 1))
return __builtin_object_size(z ? v : y, 1);
}
So when FORTIFY_SOURCE is enabled, the current implementation of
__compiletime_strlen() will try to access beyond the end of y at runtime
using the size of v. Mixed with UBSAN_LOCAL_BOUNDS we get a fault.
hidinput_allocate() has a local C string whose value is control flow
dependent on a switch statement, so __builtin_object_size(str, 1)
evaluates to the maximum string length, making all other cases fault on
the last character check. hidinput_allocate() could be cleaned up to
avoid runtime calls to strlen() since the local variable can only have
literal values, so there's no benefit to trying to fortify the strlen
call site there.
Perform a __builtin_constant_p() check against index 0 earlier in the
macro to filter out the control-flow-dependant case. Add a KUnit test
for checking the expected behavioral characteristics of FORTIFY_SOURCE
internals.
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Cc: David Gow <davidgow@google.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Sander Vanheule <sander@svanheule.net>
Cc: linux-hardening@vger.kernel.org
Cc: llvm@lists.linux.dev
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Android Treehugger Robot
Link: https://android-review.googlesource.com/c/kernel/common/+/2206839
Co-developed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
One of the "legitimate" uses of strncpy() is copying a NUL-terminated
string into a fixed-size non-NUL-terminated character array. To avoid
the weaknesses and ambiguity of intent when using strncpy(), provide
replacement functions that explicitly distinguish between trailing
padding and not, and require the destination buffer size be discoverable
by the compiler.
For example:
struct obj {
int foo;
char small[4] __nonstring;
char big[8] __nonstring;
int bar;
};
struct obj p;
/* This will truncate to 4 chars with no trailing NUL */
strncpy(p.small, "hello", sizeof(p.small));
/* p.small contains 'h', 'e', 'l', 'l' */
/* This will NUL pad to 8 chars. */
strncpy(p.big, "hello", sizeof(p.big));
/* p.big contains 'h', 'e', 'l', 'l', 'o', '\0', '\0', '\0' */
When the "__nonstring" attributes are missing, the intent of the
programmer becomes ambiguous for whether the lack of a trailing NUL
in the p.small copy is a bug. Additionally, it's not clear whether
the trailing padding in the p.big copy is _needed_. Both cases
become unambiguous with:
strtomem(p.small, "hello");
strtomem_pad(p.big, "hello", 0);
See also https://github.com/KSPP/linux/issues/90
Expand the memcpy KUnit tests to include these functions.
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
|